DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 4-6, 8-9, 11-13, 15-16 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Jayaraman et al. US patent application 2013/0018851 [herein “Jayaraman”], and further in view of Lillibridge et al. US patent application 2010/0223441 [herein “Lillibridge”].
Claim 1 recites “A method of deduplicating a file, the file comprising a first chunk, the method comprising: determining if a hash of a content of the first chunk is in a cache of a chunk hash data structure, wherein the chunk hash data structure comprises a first plurality of key-value mappings between a first plurality of keys and first plurality of values, the first plurality of keys each being a hash of a content of a corresponding chunk, and the first plurality of values each being a chunk ID of the corresponding chunk; and”.

Jayaraman uses dictionaries (i.e., chunk hash data structure) to maintain pairings (i.e., key-value mappings) of data chunk identifier (i.e., hash) and location (i.e., chunk ID) in a deduplication system [0017]. A hash algorithm is used to generate an identifier for each chunk [0020]. The dictionary is cached in memory to reduce disk access [0025].
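For illustration only, the mapping described above can be sketched as follows. This is a minimal sketch under stated assumptions, not code from Jayaraman or from the application: the names chunk_hash, chunk_hash_store, cache and in_cache are hypothetical, and SHA-256 is assumed merely as an example of a hash algorithm.

```python
import hashlib

# Hypothetical names throughout; nothing here is taken from the cited references.

def chunk_hash(content: bytes) -> str:
    """Identifier of a chunk: a hash of its content (SHA-256 assumed for illustration)."""
    return hashlib.sha256(content).hexdigest()

# Chunk hash data structure (conceptually on disk): hash of content -> chunk ID.
chunk_hash_store = {chunk_hash(b"alpha"): 1, chunk_hash(b"beta"): 2}

# In-memory cache holding a subset of the same key-value mappings.
cache = {chunk_hash(b"alpha"): 1}

def in_cache(content: bytes) -> bool:
    """Determine whether the hash of the chunk content is in the cache."""
    return chunk_hash(content) in cache
```

Here in_cache(b"alpha") is a cache hit, while in_cache(b"beta") is a miss even though the full data structure holds an entry for that chunk.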
Claim 1 further recites “if the hash is not in the cache: determining a chunk ID of the first chunk; determining a plurality of chunk IDs that are subsequent and contiguous to the chunk ID;”
Jayaraman’s dictionary is cached in memory [0025]. If a chunk identifier (i.e., hash) is not found in the cache, then it is looked up in the dictionary and fetched to get (i.e., determine) its location (i.e., chunk ID). Dictionary entries are also prefetched for chunks in datastores following (i.e., subsequent and contiguous) the fetched chunk [0017].
Claim 1 further recites “for each chunk ID of the plurality of chunk IDs, obtaining a set of information corresponding to the chunk ID of the plurality of chunk IDs based on a chunk ID data structure, wherein the chunk ID data structure comprises a second plurality of key-value mappings between a second plurality of keys and a second plurality of values, the second plurality of keys being the chunk IDs of the chunk hash data structure, and the second plurality of values each being a corresponding set of information about the corresponding chunk, wherein each set of information comprises the hash of the content of the corresponding chunk;”

Jayaraman does not disclose this limitation; however, Lillibridge stores chunks in a container, which is implemented as a file or object (Lillibridge: [0011]). Metadata (i.e., set of information) about chunks in a container, e.g., chunk hash and chunk location (i.e., chunk ID), can be stored in a separate metadata file (i.e., chunk ID data structure) logically associated with the container file (Lillibridge: [0038]). Chunk location is used to identify its container, which is in turn used to locate its metadata.
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Jayaraman with Lillibridge. One having ordinary skill in the art would have found motivation to utilize Lillibridge to prefetch neighboring chunk hashes to improve Jayaraman’s cache hit ratio.
Claim 1 further recites “generating first plurality of key-value mappings for each of the plurality of chunk IDs using the obtained sets of information; and storing the generated first plurality of key-value mappings as entries in the cache.”
In Jayaraman, a chunk identifier (i.e., hash) that is not found in the cache is looked up in the dictionary and fetched. Dictionary entries are also prefetched for chunks in datastores following (i.e., subsequent and contiguous) the fetched chunk [0017].
Claims 8 and 15 are analogous to claim 1, and are similarly rejected.
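For illustration only, the cache-miss path recited in claim 1 can be sketched as follows. This is a minimal sketch under stated assumptions, not an implementation from the application or from either cited reference: all names are hypothetical, chunk IDs are assumed to be consecutive integers, and the prefetch count of two is arbitrary.

```python
import hashlib

def h(content: bytes) -> str:
    """Hash of chunk content (SHA-256 assumed for illustration)."""
    return hashlib.sha256(content).hexdigest()

contents = [b"c0", b"c1", b"c2", b"c3", b"c4"]

# Chunk ID data structure: chunk ID -> set of information, which includes the hash.
chunk_id_store = {cid: {"hash": h(c)} for cid, c in enumerate(contents)}
# Chunk hash data structure: hash -> chunk ID.
chunk_hash_store = {info["hash"]: cid for cid, info in chunk_id_store.items()}
# Cache of the chunk hash data structure, initially empty.
cache = {}

def lookup(content: bytes, prefetch: int = 2):
    """Return the chunk ID for content, prefetching neighbors on a cache miss."""
    key = h(content)
    if key in cache:
        return cache[key]
    cid = chunk_hash_store.get(key)      # determine the chunk ID on a miss
    if cid is None:
        return None                      # chunk has not been seen before
    # Chunk IDs subsequent and contiguous to cid (assumed to be cid+1, cid+2, ...).
    for next_id in range(cid + 1, cid + 1 + prefetch):
        info = chunk_id_store.get(next_id)
        if info is not None:
            # Generate a hash -> chunk ID mapping from the obtained information
            # and store it as an entry in the cache.
            cache[info["hash"]] = next_id
    return cid
```

Under these assumptions, a miss on chunk 1 leaves entries for chunks 2 and 3 in the cache, so that lookups for those chunks are subsequently served from the cache.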

Claim 2 recites “The method of claim 1, wherein the determining the chunk ID of the first chunk is performed by accessing the chunk hash data structure.”
Jayaraman’s dictionary is cached in memory [0025]. If a chunk identifier (i.e., hash) is not found in the cache, then it is looked up in the dictionary and fetched to get its location (i.e., chunk ID) [0017].
Claims 9 and 16 are analogous to claim 2, and are similarly rejected.

Claim 4 recites “The method of claim 1, further comprising, if the hash is not in the cache: determining if the hash is in the chunk hash data structure; and if the hash is in the chunk hash data structure: determining a first set of information of a second chunk by mapping the chunk ID to the first set of information using the chunk ID data structure, the first set of information including a pointer to content of the second chunk; and modifying a pointer in a first file corresponding to the first chunk to point to the content of the second chunk.”
Jayaraman’s dictionary is cached in memory [0025]. If a chunk identifier (i.e., hash) is not found in the cache, then it is looked up in the dictionary and fetched to get its location (i.e., chunk ID) [0017]. A request to store a chunk (i.e., first chunk) is intercepted to determine whether it is already deduplicated, that is, whether its fingerprint (i.e., hash) is found in the cache or in the dictionary for another chunk (i.e., second chunk). If so, the associated metadata (i.e., first set of information), e.g., reference to the first chunk, is updated to point to the second chunk, without actually storing the first chunk again [0033].
Claims 11 and 18 are analogous to claim 4, and are similarly rejected.

Claim 5 recites “The method of claim 1, further comprising, if the hash is in the cache: determining, from the cache, the chunk ID of the first chunk; reading, from the chunk ID data structure, a first set of information of a second chunk by mapping the chunk ID to the first set of information using the chunk ID data structure, the first set of information including a pointer to content of the second chunk; and modifying a pointer in a first file corresponding to the first chunk to point to the content of the second chunk.”
Jayaraman’s dictionary is cached in memory [0025]. If a chunk identifier (i.e., hash) is found in the cache, then its associated location (i.e., chunk ID) is obtained. A request to store a chunk (i.e., first chunk) is intercepted to determine whether it is already deduplicated, that is, whether its fingerprint (i.e., hash) is found in the cache or in the dictionary for another chunk (i.e., second chunk). If so, the associated metadata (i.e., first set of information), e.g., reference to the first chunk, is updated to point to the second chunk, without actually storing the first chunk again [0033].
Claims 12 and 19 are analogous to claim 5, and are similarly rejected.
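For illustration only, the pointer modification recited in claims 4 and 5 can be sketched as follows. This is a minimal sketch under stated assumptions: the names, the string-valued pointers, and the reference-count increment are hypothetical and are not taken from the application or from Jayaraman.

```python
import hashlib

def h(content: bytes) -> str:
    """Hash of chunk content (SHA-256 assumed for illustration)."""
    return hashlib.sha256(content).hexdigest()

# Chunk ID data structure: chunk ID -> set of information about the stored chunk.
chunk_id_store = {7: {"hash": h(b"payload"), "pointer": "block-0042", "refcount": 1}}
# Chunk hash data structure and its cache: hash -> chunk ID.
chunk_hash_store = {h(b"payload"): 7}
cache = {}

def dedup(file_pointers: dict, slot: str, content: bytes) -> bool:
    """Point the file's slot at an already stored chunk with identical content."""
    key = h(content)
    cid = cache.get(key)                 # claim 5 path: the hash is in the cache
    if cid is None:
        cid = chunk_hash_store.get(key)  # claim 4 path: fall back to the full structure
    if cid is None:
        return False                     # no duplicate; the chunk would be stored as new
    info = chunk_id_store[cid]           # map the chunk ID to its set of information
    file_pointers[slot] = info["pointer"]  # modify the pointer in the first file
    info["refcount"] += 1                # assumption: track one more reference
    return True
```

Under these assumptions, a file whose first chunk has the content b"payload" ends up pointing at "block-0042" rather than storing a second copy of the content.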

Claim 6 recites “The method of claim 1, wherein each set of information comprises the corresponding hash of the corresponding chunk, and at least one of: (a) a pointer to the content of the corresponding chunk, or (b) a reference count of the corresponding chunk.”
Jayaraman teaches claim 1, but does not disclose this claim; however, Lillibridge stores chunks in a container, which is implemented as a file or object (Lillibridge: [0011]). Metadata (i.e., set of information) about chunks in a container, e.g., chunk hash and chunk location (i.e., pointer), can be stored in a separate metadata file (i.e., chunk ID data structure) logically associated with the container file (Lillibridge: [0038]).
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Jayaraman with Lillibridge. One having ordinary skill in the art would have found motivation to utilize Lillibridge’s metadata file to capture pointers to content of the corresponding chunks in Jayaraman, to avoid storing duplicate chunks redundantly.
Claims 13 and 20 are analogous to claim 6, and are similarly rejected.

Claims 3, 7, 10, 14 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Jayaraman as applied to claim 1 above, in view of Lillibridge, and further in view of Sorting a HashMap according to keys in Java. https://www.geeksforgeeks.org, 2018, pp. 1-7 [herein “SortHashMap”].
Claim 3 recites “The method of claim 1, wherein at least a portion of the sets of information of the chunk ID data structure are stored in order of corresponding chunk IDs in a storage block of a storage, wherein the storage block is cached in a second cache, and wherein obtaining the sets of information corresponding to the chunk IDs of the plurality of chunk IDs comprises obtaining the portion of the sets of information from the second cache.”
Jayaraman and Lillibridge teach claim 1, where Lillibridge maps chunk location (i.e., chunk ID) to chunk metadata (i.e., set of information) via the container, and stores the mapping (i.e., chunk ID data structure) in a metadata file (i.e., storage block) (Lillibridge: [0038]). Jayaraman and Lillibridge do not disclose this claim; however, SortHashMap sorts a key-value mapping by keys (i.e., chunk IDs) (SortHashMap: p. 1/7).
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Jayaraman and Lillibridge with SortHashMap. One having ordinary skill in the art would have found motivation to sort Lillibridge’s mapping by keys, and to store it in Jayaraman’s local memory (i.e., second cache), to enhance access performance.
Claims 10 and 17 are analogous to claim 3, and are similarly rejected.
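For illustration only, the arrangement recited in claim 3 can be sketched as follows. This is a minimal sketch under stated assumptions, not drawn from any cited reference: the names are hypothetical, chunk IDs are assumed to be dense integers starting at zero, and a storage block is assumed to hold four sets of information.

```python
BLOCK_SIZE = 4  # hypothetical number of sets of information per storage block

# Sets of information stored in order of chunk ID (dense integer IDs assumed).
records = [(cid, {"hash": f"hash-{cid}"}) for cid in range(10)]

reads = []  # records which storage blocks were read, for demonstration

def read_block(block_no: int) -> dict:
    """Simulate reading one storage block from storage."""
    reads.append(block_no)
    start = block_no * BLOCK_SIZE
    return dict(records[start:start + BLOCK_SIZE])

block_cache = {}  # second cache: block number -> contents of that storage block

def get_info(cid: int) -> dict:
    """Obtain the set of information for a chunk ID via the second cache."""
    block_no = cid // BLOCK_SIZE
    if block_no not in block_cache:
        block_cache[block_no] = read_block(block_no)
    return block_cache[block_no][cid]

# Contiguous chunk IDs 5, 6 and 7 fall in the same block, so storage is read once.
infos = [get_info(cid) for cid in (5, 6, 7)]
```

Because the sets of information are stored in chunk ID order, contiguous chunk IDs tend to share a storage block, and after the first read the remaining sets of information are obtained from the second cache.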

Claim 7 recites “The method of claim 1, wherein the chunk hash data structure is stored in storage with entries in order of the first plurality of keys.”
Jayaraman and Lillibridge teach claim 1, where Jayaraman’s dictionary maps chunk hash to chunk location (i.e., chunk hash data structure) [0017], and is stored in a cache or local memory [0025]. Jayaraman and Lillibridge do not disclose this claim; however, SortHashMap sorts a key-value mapping by keys (i.e., chunk hash) (SortHashMap: p. 1/7).
Therefore, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine Jayaraman and Lillibridge with SortHashMap. One having ordinary skill in the art would have found motivation to sort Lillibridge’s mapping by keys, and to store it in Jayaraman’s local memory, to enhance access performance.
Claim 14 is analogous to claim 7, and is similarly rejected.
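For illustration only, sorting a key-value mapping by its keys can be sketched as follows. SortHashMap is directed to Java; this minimal sketch shows the analogous operation in Python under stated assumptions, with hypothetical names and placeholder keys standing in for chunk hashes.

```python
from bisect import bisect_left

# Hypothetical hash -> chunk ID mapping in arbitrary insertion order.
unsorted_map = {"c3": 30, "a1": 10, "b2": 20}

# Entries in order of keys, as they might be laid out in storage.
entries = sorted(unsorted_map.items())
keys = [k for k, _ in entries]

def find(key: str):
    """Binary search over the key-ordered entries; returns None if the key is absent."""
    i = bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return entries[i][1]
    return None
```

With the entries in key order, a lookup can use binary search instead of scanning every entry.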

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHELLY X. QIAN whose telephone number is (408)918-7599. The examiner can normally be reached on Monday - Friday 8-5 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tony Mahmoudi, can be reached at (571)272-4078. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-

/SHELLY X QIAN/Examiner, Art Unit 2163


/ALEX GOFMAN/Primary Examiner, Art Unit 2163