DETAILED ACTION

Continued Examination Under 37 CFR 1.114

A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 06/17/2020 has been entered.


Response to Arguments/Remarks

Applicant's amendments and remarks filed on 12/14/2020 have been fully considered but were not found to be persuasive. 
No claims have been amended. Claims 1, 11, and 12 are independent claims as previously presented, and claims 9, 10, 20, and 21 were previously cancelled.

With respect to Applicant’s argument on page 7, lines 7-28:
“Claim 1, as amended recites: normalizing a dataset including compressed data, wherein normalizing the dataset further comprises: determining a compression technique used to compress the data of the dataset; determining, based on the compression technique, a decompression technique; and decompressing the compressed data of the dataset using the decompression technique; [emphasis added]

The Office Action admits that Singhai does not teach these features and instead points to Raizen. Applicant respectfully disagrees and submits that Raizen does not teach normalizing a dataset as claimed.
In response to Applicant's prior arguments, the Office Action disagrees and asserts that column 12 of Raizen teaches a compression engine operable to perform data compression on a deduplication block device, thereby concluding that data compression is "applied on the deduplicated data block." More specifically, the Office Action relies on the assertion that deduplication is equivalent to the claimed normalization in order to rebut Applicant's arguments about the requirements flowing from the order and language of the claimed features. Applicant respectfully disagrees and submits that the deduplication of Raizen is clearly not normalization as claimed. Consequently, Raizen is at best silent on the order at issue, and the Office Action does not prove otherwise.”

Examiner reads the above-recited paragraph of claim 1, together with Applicant’s arguments, in light of the specification of the pending application, which clarifies “normalizing a dataset” in paragraph [0024] and FIG. 2:
                
Hallak [0024] “At optional S110, data to be reduced is normalized prior to being globally compressed. The normalization may allow for comparing data that originally existed in different file formats. The data to be normalized may include, for example, newly received data to be compared and reduced with respect to previously stored data. In some embodiments, normalizing the data may include decompressing compressed data. An example normalization process is described further herein below with respect to FIG. 2.”

        
[Figure: media_image1.png]


As Examiner indicated in the previous Office action, Raizen discloses using metadata map 82B to include:
the type or format of the compression algorithm used to compress the chunk (e.g., equivalent to steps S210 and S220 in FIG. 2 of Hallak);
a pointer to the beginning of the compressed chunk, together with the algorithm used to compress it (e.g., the physical location of the chunk, used for decompression as needed in step S230); and
the size of the compressed data.

In other words, Raizen discloses that metadata map 82B holds information about the format of the compression algorithm, or the file type extension such as “.gzip”, “.tar”, “.rar”, “.zip”, or “.gz”, along with the location and size of the data chunk.  This implies that the metadata 40B of FIG. 4B in Raizen has all of the information necessary to determine the compression technique used to compress the data, such as “.gzip”, “.tar”, “.rar”, “.zip”, or “.gz”, and also to determine how to decompress the data based on the file type extension or data type specified in the metadata. This completes steps S210 and S220 of FIG. 2 of the pending application.
To be even more specific, one of ordinary skill in the art would understand how to decompress a data chunk or file with the extension “.gzip” using a readily available application such as “GNU gzip” or “WinZip”, or how to identify the technique used to compress the data.
As for step S230, Raizen teaches that the compressed metadata map 82B includes “a pointer to the beginning of the compressed chunk”, so that decompression may actually be executed by obtaining the chunk of data from the location specified in the metadata.  Thus, Raizen completes FIG. 2 of Hallak.
 (Raizen Col 25, Lines 43-53: “The compression metadata map 82B provides information that includes any one or more of information indicating where a compressed chunk is stored, (i.e., the physical location or offset), the size of the compressed chunk (e.g., its length), and the type or format of compression used to compress the chunk. For example, in one embodiment, the compressed metadata map 82B includes a pointer to the beginning of the compressed chunk, the algorithm used to compress it, and the size of the compressed data (so that the compressed chunk can be decompressed when needed)”)
Accordingly, Raizen clearly teaches the claimed normalization, namely “determining a compression technique and decompressing the compressed data”, and therefore Examiner respectfully disagrees with Applicant.
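
For illustration only, and without characterizing the actual implementation of either reference, the sequence discussed above (read the algorithm from a metadata entry, select the matching decompression routine, fetch the chunk by its recorded location and size, and decompress it) may be sketched as follows. The names ChunkMetadata, DECOMPRESSORS, and decompress_chunk, and the algorithm labels, are hypothetical and appear in neither Raizen nor Hallak.

```python
import bz2
import gzip
import lzma
import zlib
from dataclasses import dataclass

# Hypothetical mapping from an algorithm label recorded in metadata to a
# standard-library decompression routine (the labels are assumptions).
DECOMPRESSORS = {
    "gzip": gzip.decompress,
    "bz2": bz2.decompress,
    "xz": lzma.decompress,
    "zlib": zlib.decompress,
}


@dataclass
class ChunkMetadata:
    """Illustrative metadata entry for one compressed chunk."""
    offset: int     # pointer to the beginning of the compressed chunk
    size: int       # size of the compressed data
    algorithm: str  # type or format of compression used


def decompress_chunk(store: bytes, meta: ChunkMetadata) -> bytes:
    # Determine the compression technique from the metadata entry.
    technique = meta.algorithm
    # Determine, based on that technique, the decompression technique.
    try:
        decompress = DECOMPRESSORS[technique]
    except KeyError:
        raise ValueError(f"no decompressor known for {technique!r}") from None
    # Fetch the chunk from its recorded location and decompress it.
    chunk = store[meta.offset:meta.offset + meta.size]
    return decompress(chunk)


# Usage: a gzip-compressed chunk stored at offset 16 of a byte store.
payload = b"example block " * 8
compressed = gzip.compress(payload)
store = b"\x00" * 16 + compressed
meta = ChunkMetadata(offset=16, size=len(compressed), algorithm="gzip")
restored = decompress_chunk(store, meta)
```

In this sketch the metadata entry alone is sufficient to locate and decompress the chunk, which is the point made in the discussion of steps S210 through S230 above.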

Regarding Applicant’s argument on page 10:
“Further, Applicant submits that claims 8 and 19 are independently allowable. Representative claim 8 recites: 
wherein each similar block is a reference block selected from a respective set of blocks that are similar to each other, wherein each reference block has a largest size among blocks of the respective set of blocks that are similar to each other. [emphasis added] 
The Office Action points to paragraph 37 of Singhai as allegedly teaching these claim features. Applicant respectfully disagrees. Singhai teaches: 
Furthermore, similarity-based deduplication algorithms operate by deducing an abstract representation of content associated with reference data blocks. Thus, reference data blocks can be used as templates for deduplicating other (i.e., future) incoming data blocks, leading to a reduction in total volume of data being stored. When deduplicated data blocks are recalled from storage, the reduced (e.g., deduplicated) representation can be retrieved from the storage and combined with information supplied by the reference data block(s) to reproduce the original data block. Singhai, para. [0037] (emphasis added). 
Singhai does not even remotely suggest the sizes of the reference blocks being largest relative to the other blocks or, more specifically, that the reference blocks are the largest blocks among blocks that are similar to each other.”
Examiner reads “each reference block has a largest size among blocks of the respective set of blocks that are similar to each other” in light of the specification of the pending application, paragraph [0030], together with Applicant’s argument recited above:

Hallak [0030] “At S140, reference blocks are selected from among each set of similar blocks. In an embodiment, one reference block is selected for each set of similar blocks. Each non-selected block is a redundant block with respect to one of the selected reference blocks, i.e., each redundant block is similar to a respective reference block as described above. In an example implementation, each reference block is the first block written to a storage from the group of similar blocks, i.e., a block that was written before other blocks of the set. In another example implementation, each reference block may be the longest block, or the block having the largest size, among the respective set of similar blocks. In other implementations, the reference blocks may be selected based on different criteria.”
Accordingly, for claims 8 and 19, Examiner interprets the claim language “each reference block has a largest size among blocks of the respective set of blocks” as one of several possible selection criteria, because the specification states that the reference blocks may be selected based on different criteria.  In other words, a reference block must have a particular property that satisfies a criterion for selection as a reference block, such as “largest”, “smallest”, “oldest”, or “newest”.  More specifically, selecting the reference block as the “longest block” or “the block having the largest size” is one means, among many others, of grouping similar blocks in order to manage them more effectively.
Similarly, Singhai discloses a method of identifying an association between the retrieved data blocks and one or more reference data sets stored in the data store, wherein the association reflects a common dependency of the retrieved data blocks to the one or more reference data sets. 
For instance, a common dependency of data blocks may be a key characteristic that qualifies a block as a reference among many other data blocks, such as “largest”, “smallest”, “newest”, or “oldest”.
(Singhai [0006] In general, another innovative aspect of the subject matter described in this disclosure may be implemented in methods that include: retrieving data blocks from a data store; identifying an association between the retrieved data blocks and one or more reference data sets stored in the data store, wherein the association reflects a common dependency of the retrieved data blocks to the one or more reference data sets; generating a segment including the data blocks that depend on the common reference data set; generating a first identifier for the segment; and tracking the segment using the first identifier.
Singhai [0043] In some embodiments, depending upon resource constraints of the system, data blocks of reference data set can be customized to include a predefined number of data blocks in the reference data set as well as a maximum number of reference data sets. In further embodiments, the system can comprise a clustered system, in which multiple different reference data sets are shared across the cluster to get a wider coverage.)
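
For illustration only, two of the selection criteria named in Hallak [0030] (the first block written, and the block having the largest size) may be sketched as interchangeable selection functions. The function names are hypothetical, and the sketch is not asserted to reflect the implementation of Hallak or Singhai.

```python
def select_reference_first(similar_blocks: list[bytes]) -> bytes:
    """Criterion: the block written before the other blocks of the set."""
    return similar_blocks[0]


def select_reference_largest(similar_blocks: list[bytes]) -> bytes:
    """Criterion: the block having the largest size in the set.

    On a tie, max() returns the earliest such block in the list.
    """
    return max(similar_blocks, key=len)


# Usage: one set of similar blocks, listed in the order they were written.
blocks = [b"abc", b"abcdef", b"ab"]
first_written = select_reference_first(blocks)
largest = select_reference_largest(blocks)
```

Either function yields exactly one reference block per set of similar blocks; only the property used for selection differs.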


Claim Rejections - 35 USC §103

In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-8, 11-19 are rejected under 35 U.S.C. 103 as being unpatentable over Singhai et al. Pub. No.: US 2017/0123677 A1, hereinafter Singhai in view of Raizen et al., Patent No.: US 8156306 B1, hereinafter Raizen.

As per claim 1, (Previously Amended) A method for global data compression, comprising: normalizing a dataset including compressed data, wherein normalizing the dataset further comprises: determining a compression technique used to compress the data of the dataset; determining, based on the compression technique, a decompression technique; and decompressing the compressed data of the dataset using the decompression technique;
(Singhai does not explicitly disclose including information about the compression/decompression technique used for each data block.
However, Raizen discloses a method of using metadata map 82B to include, the type or format of compression algorithm used to compress: Raizen Col 25, Lines 43-53: “The compression metadata map 82B provides information that includes any one or more of information indicating where a compressed chunk is stored, (i.e., the physical location or offset), the size of the compressed chunk (e.g., its length), and the type or format of compression used to compress the chunk. For example, in one embodiment, the compressed metadata map 82B includes a pointer to the beginning of the compressed chunk, the algorithm used to compress it, and the size of the compressed data (so that the compressed chunk can be decompressed when needed)”)

splitting [[a]] the normalized dataset into a plurality of blocks; for each block of the plurality of blocks: 
(Singhai teaches a method of determining similar data blocks with matching engine 308, which uses similarity-based algorithm to detect the resemblance hashes: par. [0123], lines 1-9: “In some embodiments, the signature fingerprint computation engine 306 and/or matching engine 308 can user a similarity-based algorithm to detect resemblance hashes (e.g. sketches) which have the property that similar data blocks and reference data sets have similar resemblance hashes (e.g. sketches)”)

computing at least one similarity hash for the block; determining, based on the at least one similarity hash, whether a similar block is found for the block, 
(Singhai teaches a method of determining similar data blocks with matching engine 308, which uses similarity-based algorithm to detect the resemblance hashes: par. [0123], lines 1-9: “In some embodiments, the signature fingerprint computation engine 306 and/or matching engine 308 can user a similarity-based algorithm to detect resemblance hashes (e.g. sketches) which have the property that similar data blocks and reference data sets have similar resemblance hashes (e.g. sketches)”)

wherein a similar block for a block has a similarity hash that is similar to one of the computed at least one similarity hash for the block; compressing the block by replacing data of the block with a reference to the similar block 
(Singhai discloses a method of discovering resemblance hashes of the data blocks (0-7), followed by performing deduplication and self-compression on the corresponding data blocks: par. [0197], lines 6-15: “The encoding engine 310 may perform block level deduplication that includes comparing resemblance hashes and/or digital signatures/fingerprints of the data blocks (0-7) to stored resemblance hashes of corresponding reference data set 1604 as illustrated in FIG. 16. If similar-based resemblance hashes exist between data blocks of the data set 1602 and the reference data set 1604, the encoding engine 310 may then encode the corresponding data blocks associated with the similar-based resemblance hashes, as depicted in FIG. 16. The encoding engine 310 may perform deduplication and self-compression on the corresponding data blocks associated with the similar-based resemblance hashes”)

and a delta when a similar block is found, wherein the delta is a difference in data between the block and the similar block; and compressing the block independently when a similar block is not found.  
(Singhai discloses the use of delta-encoding algorithms to identify only the changed portion of a data set among similar data blocks and to encode (e.g., compress) a delta: par. [0135], lines 17-21: “In another embodiment, if the new set of data blocks are similar to an existing reference data set, the encoding engine 310 may store a delta showing the difference between the reference data set from which the new set of data blocks are encoded.”  par. [0196], lines 1-4: “The encoding engine 310 can be performed by a delta-encoding algorithm. Delta encoding algorithms identify similar resemblance hashes between data blocks and a reference data set and stores only the changed data.”)
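
For illustration only, the per-block flow recited in claim 1 (compute similarity hashes, look for a similar block, store a reference plus a delta when one is found, and otherwise compress the block independently) may be sketched as follows. The min-hash style sketch, the use of difflib to compute the delta, and all names and constants are assumptions chosen for brevity; they are not asserted to be the technique of Singhai, Raizen, or the pending application.

```python
import difflib
import hashlib
import zlib

SHINGLE = 8     # hypothetical shingle width in bytes
NUM_HASHES = 4  # hypothetical number of similarity hashes per block


def similarity_hashes(block: bytes) -> list[int]:
    """Min-hash style sketch: per salt, the minimum hash over all shingles."""
    count = max(1, len(block) - SHINGLE + 1)
    shingles = {block[i:i + SHINGLE] for i in range(count)}
    return [
        min(int.from_bytes(hashlib.sha1(bytes([salt]) + s).digest()[:8], "big")
            for s in shingles)
        for salt in range(NUM_HASHES)
    ]


def make_delta(reference: bytes, block: bytes) -> list[tuple]:
    """Encode block as copies from reference plus literal changed data."""
    ops = []
    matcher = difflib.SequenceMatcher(None, reference, block, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            ops.append(("data", block[j1:j2]))
        # "delete": nothing from the reference is emitted
    return ops


def apply_delta(reference: bytes, ops: list[tuple]) -> bytes:
    out = bytearray()
    for op in ops:
        out += reference[op[1]:op[2]] if op[0] == "copy" else op[1]
    return bytes(out)


def compress_block(block: bytes, index: dict, store: list) -> tuple:
    hashes = similarity_hashes(block)
    for h in hashes:
        ref_id = index.get(h)
        if ref_id is not None:
            # Similar block found: keep a reference to it and a delta.
            return ("delta", ref_id, make_delta(store[ref_id], block))
    # No similar block: compress independently and index it for later blocks.
    block_id = len(store)
    store.append(block)
    for h in hashes:
        index.setdefault(h, block_id)
    return ("independent", block_id, zlib.compress(block))


def restore(record: tuple, store: list) -> bytes:
    kind, ref_id, payload = record
    if kind == "delta":
        return apply_delta(store[ref_id], payload)
    return zlib.decompress(payload)


# Usage: the second block differs from the first by one word.
index, store = {}, []
first = b"The quick brown fox jumps over the lazy dog. " * 4
second = first.replace(b"lazy", b"sleepy", 1)
record_first = compress_block(first, index, store)
record_second = compress_block(second, index, store)
```

Whether the second block is matched to the first depends on the hash values, so the sketch makes no claim about which branch it takes; in either branch, restore recovers the original block.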

As per claim 2, (Original) The method of claim 1, further comprising: storing each compressed block, wherein each independently compressed block is stored with metadata, wherein the metadata includes a compression algorithm used to compress the data. 
(Singhai does not explicitly disclose storing compressed block information in metadata.
However, Raizen discloses a compressed metadata map, which includes a pointer to the beginning of the compressed chunk, the compression algorithm used, and the size of the compressed data: Raizen, col. 25, lines 49-53: “For example, in one embodiment, the compressed metadata map 82B includes a pointer to the beginning of the compressed chunk, the algorithm used to compress it, and the size of the compressed data (so that the compressed chunk can be decompressed when needed).”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Raizen with the system of Singhai because they are analogous art directed to the same field of endeavor, namely systems and methods that use deduplication, compression, and data reduction techniques. 
One of ordinary skill in the art would have been motivated to use Raizen’s method of storing property information of a compressed chunk in metadata in Singhai’s Data Reduction Unit 210 (see FIG. 3B) to improve the maintenance of data blocks by storing information about the properties of each data block in the metadata, for example, <meta compression = ‘tar’>, <meta size = ‘57kb’>, <meta hash = ‘4b57’>, etc.

As per claim 3, (Original) The method of claim 2, wherein the metadata further includes a reference count, further comprising, for each stored block: determining, based on the reference count for the stored block, whether to delete at least one reference to the stored block, wherein it is determined to delete the at least one reference when the stored block is not being used; and deleting the at least one reference to one of the stored blocks when it is determined to delete the at least one reference. 
(Singhai teaches a method of using a reference count associated with reference data blocks by tracking the number of times data blocks rely on a reference data block: par. [0039], lines 1-8: “A method for retiring old reference data blocks that are no longer useful needs to be applied. The method may include a reference count associated with reference data blocks by tracking the number of times data blocks rely on a reference data block and/or set of reference data blocks such that it can be determined when a reference data block is no longer relied upon by a data block and can therefrom be retired from the set.”)
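
For illustration only, reference counting of the kind discussed above (tracking how many data blocks rely on a reference block and retiring a reference block once nothing relies on it) may be sketched as follows. The class and method names are hypothetical and are not taken from Singhai.

```python
from collections import Counter


class ReferenceTracker:
    """Tracks how many stored blocks rely on each reference block."""

    def __init__(self) -> None:
        self._counts: Counter = Counter()

    def add_reference(self, block_id: str) -> None:
        self._counts[block_id] += 1

    def release(self, block_id: str) -> None:
        if self._counts[block_id] <= 0:
            raise ValueError(f"{block_id!r} has no outstanding references")
        self._counts[block_id] -= 1

    def retire_unused(self) -> list[str]:
        """Remove and return reference blocks that are no longer relied upon."""
        unused = sorted(b for b, n in self._counts.items() if n == 0)
        for block_id in unused:
            del self._counts[block_id]
        return unused


# Usage: block "B" loses its only dependent and becomes eligible for retirement.
tracker = ReferenceTracker()
tracker.add_reference("A")
tracker.add_reference("A")
tracker.add_reference("B")
tracker.release("B")
retired = tracker.retire_unused()
```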
  
As per claim 4, (Original) The method of claim 1, wherein the dataset is split using variable-sized chunking. 
(Singhai does not explicitly disclose using variable-sized data chunks.  
However, Raizen discloses the implementation of a thin provisioning layer providing variable length chunk sizes: Raizen, col. 15, lines 32-41: “Variable length chunk sizes can, in some instances, result in more complicated mappings and remappings than with fixed chunks; however, variable length chunk sizes are more readily implemented if the space reclamation related layers 28, 31, 32, and/or optionally 35, and the thin provisioning layer 54, are all provided/implemented entirely within a storage appliance 14, as is illustrated in FIG. 1C. In at least some embodiments, furthermore, the thin provisioning layer 54 can even provide data storage extents of a variable size.”)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine Raizen with the system of Singhai to improve utilization of storage space where storage space is a limited resource. As an example, since variable length chunks fit more flexibly into available space than fixed size chunks, variable sized chunks may have the advantage of minimizing unused storage space in the system.
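
For illustration only, one common way to split a dataset into variable-sized chunks is content-defined chunking, in which a chunk boundary is declared where a rolling value over the most recent bytes meets a condition. The sketch below uses a simple rolling byte sum, which is a deliberately weak stand-in for a production rolling hash; the function name and all parameters are hypothetical, and neither Raizen nor Singhai is asserted to use this approach.

```python
def chunk_variable(data: bytes, min_size: int = 64, avg_size: int = 256,
                   max_size: int = 1024, window: int = 16) -> list[bytes]:
    """Split data into variable-sized chunks at content-defined boundaries."""
    chunks = []
    start = 0    # index where the current chunk begins
    rolling = 0  # sum of the last `window` bytes within the current chunk
    for i, byte in enumerate(data):
        rolling += byte
        if i - start >= window:
            rolling -= data[i - window]  # drop the byte leaving the window
        length = i - start + 1
        at_boundary = length >= min_size and rolling % avg_size == avg_size - 1
        if at_boundary or length >= max_size:
            chunks.append(data[start:i + 1])
            start = i + 1
            rolling = 0
    if start < len(data):
        chunks.append(data[start:])  # trailing chunk may be shorter than min_size
    return chunks


# Usage: chunk a 5120-byte buffer; concatenating the chunks restores the input.
data = bytes(range(256)) * 20
chunks = chunk_variable(data)
```

Because each boundary depends on content rather than on a fixed offset, chunk lengths vary between min_size and max_size, except for the final chunk, which may be shorter.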

As per claim 5, (Original) The method of claim 1, wherein each similar block is a reference block selected from a respective set of blocks that are similar to each other. 
(Singhai discloses a method of identifying similar data blocks by using a similarity-based algorithm to detect resemblance hashes: par. [0123], lines 1-6: “In some embodiments, the signature fingerprint computation engine 306 and/or matching engine 308 can user a similarity-based algorithm to detect resemblance hashes (e.g. sketches) which have the property that similar data blocks and reference data sets have similar resemblance hashes (e.g. sketches).”)

As per claim 6, (Original) The method of claim 5, further comprising; storing, in an index, the similarity hash computed for each of the plurality of blocks, wherein whether a similar block is found is determined based on the indexed similarity hashes. 
(Singhai discloses that resemblance hashes of reference data sets and/or segments of reference data sets may be stored in a data store (e.g., an index): par. [0131], lines 8-13: “For instance, the matching engine 308 may compare resemblance hashes of one or more reference data sets and/or segments of reference data sets stored in a data store such as, data storage repository 110, to resemblance hashes associated with the new set of data blocks.”)

As per claim 7, (Original) The method of claim 5, wherein each reference block was received before each other block of the respective set of blocks that are similar to each other. 
(Singhai teaches the steps of (1) receiving data blocks, (2) identifying whether similarity already exists, (3) determining similarity, and finally (4) encoding (compressing) blocks by associating them with the reference data block: Figure 6A, element 602: receive data block; element 606: identify whether similarity exists; element 608: determine similarity; element 610: encode data block (e.g., compress); element 612: update the records table.)

Regarding claim 8, (Original) The method of claim 5, wherein each reference block has a largest size among blocks of the respective set of blocks that are similar to each other.
(Singhai discloses a method of using reference data blocks as templates for deduplicating other (i.e., future) incoming data blocks, leading to a reduction in total volume of data being stored: par. [0037] “Furthermore, similarity-based deduplication algorithms operate by deducing an abstract representation of content associated with reference data blocks. Thus, reference data blocks can be used as templates for deduplicating other (i.e., future) incoming data blocks, leading to a reduction in total volume of data being stored. When deduplicated data blocks are recalled from storage, the reduced (e.g., deduplicated) representation can be retrieved from the storage and combined with information supplied by the reference data block(s) to reproduce the original data block.”)

As per claim 9, (Cancelled)  
As per claim 10, (Cancelled)
  
As per claim 11, (Previously Amended) A non-transitory computer readable medium having stored thereon instructions for causing a processing circuitry to execute a process, the process comprising: normalizing a dataset including compressed data, wherein normalizing the dataset further comprises: determining a compression technique used to compress the data of the dataset; determining, based on the compression technique, a decompression technique; and decompressing the compressed data of the dataset using the decompression technique; splitting a dataset into a plurality of blocks; for each block of the plurality of blocks: computing at least one similarity hash for the block; determining, based on the at least one similarity hash, whether a similar block is found for the block, wherein a similar block for a block has a similarity hash that is similar to one of the computed at least one similarity hash for the block; compressing the block by replacing data of the block with a reference to the similar block and a delta when a similar block is found, wherein the delta is a difference in data between the block and the similar block; and compressing the block independently when a similar block is not found.  

Claim 11 is analogous to claim 1 and is rejected under the same rationale as indicated above.

As per claim 12, (Previously Amended) A system for global data compression, comprising: a processing circuitry; and a memory, the memory containing instructions that, when executed by the processing circuitry, configure the system to: normalize a dataset including compressed data, wherein the system is further configured to: determine a compression technique used to compress the data of the dataset; determine, based on the compression technique, a decompression technique; and decompress the compressed data of the dataset using the decompression technique; split a dataset into a plurality of blocks; for each block of the plurality of blocks: compute at least one similarity hash for the block; determine, based on the at least one similarity hash, whether a similar block is found for the block, wherein a similar block for a block has a similarity hash that is similar to one of the computed at least one similarity hash for the block; compress the block by replacing data of the block with a reference to the similar block and a delta when a similar block is found, wherein the delta is a difference in data between the block and the similar block; and compress the block independently when a similar block is not found.  

Claim 12 is analogous to claim 1 and is rejected under the same rationale as indicated above.

As per claim 13, (Original) The method of claim 12, further comprising: storing each compressed block, wherein each independently compressed block is stored with metadata, wherein the metadata includes a compression algorithm used to compress the data. 

Claim 13 is analogous to claim 2 and is rejected under the same rationale as indicated above.
 
As per claim 14,  (Original) The method of claim 13, wherein the metadata further includes a reference count, further comprising, for each stored block: determining, based on the reference count for the stored block, whether to delete at least one reference to the stored block, wherein it is determined to delete the at least one reference when the stored block is not being used; and deleting the at least one reference to one of the stored blocks when it is determined to delete the at least one reference.  

Claim 14 is analogous to claim 3 and is rejected under the same rationale as indicated above.

As per claim 15, (Original) The method of claim 12, wherein the dataset is split using variable-sized chunking.  

Claim 15 is analogous to claim 4 and is rejected under the same rationale as indicated above.

As per claim 16, (Original) The method of claim 12, wherein each similar block is a reference block selected from a respective set of blocks that are similar to each other.  

Claim 16 is analogous to claim 5 and is rejected under the same rationale as indicated above.

As per claim 17, (Original) The method of claim 16, further comprising: storing, in an index, the similarity hash computed for each of the plurality of blocks, wherein whether a similar block is found is determined based on the indexed similarity hashes.  

Claim 17 is analogous to claim 6 and is rejected under the same rationale as indicated above.

As per claim 18,  (Original) The method of claim 16, wherein each reference block was received before each other block of the respective set of blocks that are similar to each other.  

Claim 18 is analogous to claim 7 and is rejected under the same rationale as indicated above.

As per claim 19,   (Original) The method of claim 16, wherein each reference block has a largest size among blocks of the respective set of blocks that are similar to each other. 

Claim 19 is analogous to claim 8 and is rejected under the same rationale as indicated above.
 
As per claim 20 (Cancelled)  
As per claim 21 (Cancelled)  


Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHONGSUH PARK whose e-mail is chongsuh.park@uspto.gov or telephone number is (408) 918-7574. The examiner can normally be reached on Monday - Friday 8:00-5:30 PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hosain Alam can be reached on (571)272-3978 EST.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CHONGSUH PARK/Examiner, Art Unit 2154  

/HOSAIN T ALAM/Supervisory Patent Examiner, Art Unit 2154