Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION

1.	This action is responsive to the communication filed on 8/22/2022.  Claims 1-6, 8-16 and 18-20 have been amended. Claims 7 and 17 have been cancelled. Claims 21-22 have been added. Claims 1-6, 8-16 and 18-22 are pending.
2.	Applicants' arguments filed 8/22/2022 have been fully considered but they are not deemed to be persuasive.  Rejections and/or objections not reiterated from previous office actions are hereby withdrawn.  The following rejections and/or objections are either reiterated or newly applied.  They constitute the complete set presently being applied to the instant application.

Claim Rejections - 35 USC § 103
3.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
4.	This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
5.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

6.	The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
7.	Claims 1, 3-4, 13, 15 and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Passey in view of Auchmoody et al (U.S. Patent 8825971 B1 hereinafter, “Auchmoody”).
8.	With respect to claim 1,
	Passey discloses a method comprising:
generating, by a system comprising a hardware processor, a database for a set of entities (Passey [0089] e.g. files) associated with a plurality of sampled data units (blocks - Passey [0089] e.g. [0089] The facility chooses the fraction of samples taken from files in each directory of the file system tree in such a way as to achieve a sample population that is appropriately distributed across the file system tree. In some embodiments, the facility determines the overall number of samples taken to satisfy a desired confidence level in the results) of a storage system, wherein the set of entities comprises a parent entity (e.g. folder/directory - Passey [0065] e.g. [0065] For example, if an attribute of a file (e.g., file_size) is updated in response to changes made to that file, then the updated file_size attribute will also need to be reflected in the metric corresponding to that file_size attribute in the parent directory and in any ancestor directory of that file. To indicate when the updated attribute is not yet reflected in the corresponding metric of the parent directory, a value of the metric unreconciled in the parent directory indicates a different value than the metric reconciled to parent directory. Each attribute of the file for which a metric value is stored in a parent directory can maintain these unreconciled to parent and reconciled to parent values. When these values differ, it indicates to the facility that the parent directory of that file needs to be updated by the difference between the values during a subsequent I/O operation), and at least one of a snapshot of the parent entity (Passey [0044] e.g. [0044] For example, a process for de-duplicating files and folders uses the checksum aggregates to determine whether two or more folders contain (or are likely to contain) identical files rather than separately scanning each folder. 
As another example, a replication system (such as a cloud backup storage system), the checksum aggregates allow for quicker identification of changes between a folder and a previously-stored version (i.e., snapshot) of that folder or between two or more snapshots [as snapshots of the parent entity (e.g. folder)]), a clone of the parent entity, a snapshot of the clone of the parent entity, or a clone of the snapshot of the parent entity, and wherein the database comprises one or more data structures;
in response to an input/output (I/O) operation, accessing, by the system, a first data structure in the database that maps (map, bloom filter - Passey [0040], [0059] e.g. [0040] Accordingly, the address abstraction layer can include a data storage map 312 that links the referenced location to a particular node and associated disc (e.g., see FIG. 2) on which the requested data is stored. [0059] In some embodiments, for file system objects that are directories, the directory data 512 contains a directory map data structure identifying each file system object contained by the directory by its name and inode number. For file system objects that are files, the file data 512 contains an extent list data structure identifying the disc blocks containing the file contents by the file system addresses usable to locate those blocks. In some embodiments, the facility also maintains a global data structure for mapping inode numbers from each inode in the file system tree to the file system location at which the corresponding inode data is stored. In some embodiments, the facility may store a copy of this global data structure on each node (e.g., in FIG. 2)), in a bit vector of the first data structure (Passey [0045] – [0049] e.g. [0046] In some embodiments, the facility maintains a bloom filter aggregate for one or more attributes. A bloom filter is a probabilistic data structure used to determine whether a particular element is likely to belong to a particular set. While a bloom filter can produce false positives, it does not produce false negatives. For example, a bloom filter can be used to determine whether a particular folder may contain any files having a particular attribute, such as a particular owner, creator, mtime, and so on. If a user is searching for all files owned by USER1, a computing system can apply a bloom filter configured to determine whether a particular folder may contain or does not contain any files owned by any particular user. 
In some embodiments, a bloom filter is represented as a bit string and generated by applying multiple hash functions to an attribute value for a particular file or folder, each hash function generating a placement in the bit string, and setting corresponding bits in the bit string to 1), an entity (e.g. file) in the set of entities (e.g. files) with respective sampled signatures (e.g. checksum, hash) of a set of sampled signatures (e.g. checksum, hash).
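For illustration only, the bloom filter described in the quoted passage of Passey [0046] (a bit string whose bits are set at placements produced by multiple hash functions) can be sketched as follows. The class name, the bit-string length, the number of hash functions, and the use of a salted SHA-256 digest are assumptions made for this sketch and are not taken from Passey.

```python
import hashlib


class BloomFilter:
    """Bit-string Bloom filter; sizes and hashing scheme are illustrative assumptions."""

    def __init__(self, num_bits=64, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int stands in for the bit string

    def _positions(self, value):
        # Derive several placements by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{value}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        # Set the bit at each placement to 1.
        for pos in self._positions(value):
            self.bits |= 1 << pos

    def might_contain(self, value):
        # True means "possibly present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(value))


# Hypothetical folder-level aggregate over the owner attribute.
folder_owners = BloomFilter()
for owner in ("USER1", "USER7"):
    folder_owners.add(owner)
```

A query such as `folder_owners.might_contain("USER1")` returns True for every value that was added, consistent with the statement that a bloom filter does not produce false negatives. A True result for a value that was never added remains possible and would be a false positive.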
Although Passey substantially teaches the claimed invention, Passey does not explicitly indicate 
wherein each respective entry of the bit vector indicates whether the entity refers to a respective sampled signature of the set of sampled signatures, and wherein corresponding sampled signatures of the set of sampled signatures are computed based on applying a function on corresponding sampled data units of the plurality of sampled data units;
comparing, by the system, the bit vector of the first data structure with a prior version of the bit vector; and
controlling, by the system, a backup of the entity based on the comparing.
Auchmoody teaches the limitations by stating wherein each respective entry of the bit vector indicates whether the entity refers to a respective sampled signature of the set of sampled signatures, and wherein corresponding sampled signatures of the set of sampled signatures are computed based on applying a function on corresponding sampled data units of the plurality of sampled data units;
comparing, by the system, the bit vector of the first data structure with a prior version of the bit vector; and
controlling, by the system, a backup of the entity based on the comparing (Auchmoody col. 2 line 61 – col. 3 line 38, col. 7 line 21 – col. 8 line 63, and Figs. 6C-D e.g. [col. 2 line 61 – col. 3 line 38] (13) Embodiments of the invention relate to methods and systems for generating de-duplicated backup data sets for archiving in a CAS-based storage system or other storage system. To initiate generation of a backup data set, a backup client ages out and selects a root tag vector entry j for re-use. The new root tag vector entry j will include a root hash representative of the new backup data set when it is complete. [col. 7 line 21 – col. 8 line 63] (41) One embodiment of the filename cache 620 is illustrated in FIG. 6C. The filename cache 620 includes a plurality of entries, each entry corresponding to a different file. Each entry in the filename cache 620 may include a hash 622 of the metadata of the file (including, for example, the path, modify time, size, attributes, and other metadata for the file), a hash 624 of the contents of the file, an age (not shown) or last access time of the file represented by each hash 622, 624, a size (not shown) of the file represented by each hash 622, 624, and a tag field 626 indicating which root hashes protect the file. (42) More specifically, the tag field 626 for each entry includes 1 to N tag bits that identify one or more root hashes (e.g., R1, R2, . . . RN) the file is protected by. Each tag bit Ti corresponds to the root tag vector entry Ri. As already mentioned above, the existence of a root hash implicates the existence of all the data (including files) and composites beneath the root hash somewhere in the CAS archive 144. For instance, in the illustrated embodiment, the existence of root hash 1 (R1) implies that the files represented by the hash of File 1 and the hash of File Y have previously been backed up to and are stored in the CAS archive 144. (43) One embodiment of the hash cache 630 is illustrated in FIG. 6D. 
Similar to the filename cache 620, the hash cache 630 includes a plurality of entries, each entry corresponding to a different composite data or chunk of data. Each entry in the hash cache 630 includes a hash 632 of the data (e.g., a piece of data or a composite data), the age or last access time 634 of the data, the size 636 of the data, and a tag field 638 indicating which root hashes protect the data. The "age" or "last access time" of the data refers to the last time the data was backed up (e.g., the last time the hash of the data was included anywhere beneath a root hash). (48) For instance, if the response from the backup server indicates that root hash 2 (R2) no longer exists on the backup server, the tag bit T2 would be zeroed out for each hash in the filename cache and hash cache [as
wherein each respective entry (e.g. entry) of the bit vector (e.g. Fig. 6C-D) indicates whether (e.g. existence – 1 or 0) the entity (e.g. file/chunk) refers (e.g. the existence of root hash 1 (R1) implies that the files represented by the hash of File 1 and the hash of File Y have previously been backed up to and are stored in the CAS archive 144; referring to the instant applicant’s specification [0033]) to a respective sampled signature (e.g. hash) of the set of sampled signatures (e.g. hashes), and wherein corresponding sampled signatures of the set of sampled signatures are computed based on applying a function on corresponding sampled data units (e.g. file/chunk) of the plurality of sampled data units (e.g. files/chunks);
comparing (e.g. a backup client ages out and selects a root tag vector entry j for re-use. The new root tag vector entry j will include a root hash representative of the new backup data set when it is complete – comparing root tag vector entry with different access time in Fig. 6C to age out root tag vector entry for backup; referring to the instant applicant’s specification [0091]), by the system, the bit vector of the first data structure with a prior version (e.g. earlier root tag vector entry) of the bit vector; and
controlling, by the system, a backup (e.g. a backup client ages out and selects a root tag vector entry j for re-use. The new root tag vector entry j will include a root hash representative of the new backup data set when it is complete) of the entity based on the comparing
]. Thus, a nonzero value for any bit in the tag mask indicates that a particular hash in the cache is protected by a particular root hash that has been validated by the backup server. Hashes in the filename and/or hash caches that are not protected by any root hashes, as indicated by the tag mask, can be deleted from the filename and/or hash caches to make room for new hashes.).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the invention, in view of the teachings of Passey and Auchmoody, to address the disadvantage that, although the client may add newly encountered hash values to the cache to aid in the identification of redundant data for future backup data sets, the cache can grow to an unmanageable size if not controlled (Auchmoody col. 2 lines 5-19).
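For illustration only, the tag-field behavior quoted from Auchmoody above (one tag bit Ti per root tag vector entry Ri, zeroing a tag bit when its root hash no longer exists on the backup server, and deleting cache entries whose tag mask is all zero) can be sketched as follows. All function names and cache keys are hypothetical. The XOR comparison in `changed_bits` is an assumption added to illustrate comparing a bit vector against a prior version; it is not quoted from either reference.

```python
def tag_bit(i):
    """Mask for tag bit T_i (1-indexed), corresponding to root tag vector entry R_i."""
    return 1 << (i - 1)


def invalidate_root(cache, i):
    """Zero out tag bit T_i for every cached hash (root R_i no longer exists)."""
    mask = ~tag_bit(i)
    return {h: tags & mask for h, tags in cache.items()}


def prune_unprotected(cache):
    """Drop hashes whose tag mask is all zero, i.e. protected by no root hash."""
    return {h: tags for h, tags in cache.items() if tags != 0}


def changed_bits(current, prior):
    """Assumed comparison: bits that differ between a tag field and its prior version."""
    return current ^ prior


# Hypothetical cache: hash identifier -> tag field stored as an int bit mask.
cache = {
    "hash_file_1": tag_bit(1) | tag_bit(2),  # protected by R1 and R2
    "hash_file_y": tag_bit(1),               # protected by R1 only
    "hash_chunk_a": tag_bit(2),              # protected by R2 only
}

after = invalidate_root(cache, 2)   # the server reports that R2 no longer exists
pruned = prune_unprotected(after)   # hash_chunk_a is now unprotected and is dropped
```

In this sketch a nonzero result from `changed_bits` identifies which root hashes stopped or started protecting an entry between two versions of its tag field, which is one way a backup decision could be keyed to such a comparison.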
9.	With respect to claim 3,
	Auchmoody further discloses
detecting, by the system, a generation of a new entity in the storage system (Auchmoody col. 2 line 61 – col. 3 line 38, col. 7 line 21 – col. 8 line 63, and Figs. 6C-D e.g. new backup data set); and
updating, by the system, the first data structure with a further bit vector mapping the new entity to respective sampled signatures of the set of sampled signatures (Auchmoody col. 2 line 61 – col. 3 line 38, col. 7 line 21 – col. 8 line 63, and Figs. 6C-D e.g. The new root tag vector entry j will include a root hash representative of the new backup data set when it is complete).
10.	With respect to claim 4,
	Passey further discloses wherein the parent entity is one of a file, a filesystem (Passey [0068] and Fig. 4 e.g. filesystem tree in Fig. 4), or a volume.
11.	Claims 13 and 15 are the same as claims 1 and 3 and are rejected for the same reasons as applied hereinabove.
12.	Claim 19 is the same as claim 1 and is rejected for the same reasons as applied hereinabove.

13.	Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Passey in view of Auchmoody, and further in view of MENEZES et al (U.S. Patent 10303662 B1 hereinafter, “MENEZES”).
14.	With respect to claim 11,
Although the combination of Passey and Auchmoody substantially teaches the claimed invention, the combination does not explicitly indicate storing, by the system, bit vectors in the first data structure in compressed format.
MENEZES teaches the limitations by stating storing, by the system, bit vectors in the first data structure in compressed format (MENEZES claim 1 e.g. 1. A method, comprising: measuring an amount of physical storage space used, or expected to be used, by a dataset that comprises an ad-hoc group of size ‘n’ of files F.sub.1. . . F.sub.n, wherein measuring the amount of physical storage space comprises: receiving information that identifies the ad-hoc group of size ‘n’ of files F.sub.1. . . F.sub.n, each file F including a respective segment set S; creating a bloom filter with bitmap B.sub.F for each of the files F, and sampling a representation of each unique segment in the segment set S by sampling a fingerprint of each unique segment in the segment set S, wherein each sampled fingerprint corresponds to a unique segment physical size, and adding the unique segment physical sizes to the corresponding bloom filter to obtain a sampled unique segment count for each file F; compressing and caching each of the bloom filters, wherein the bloom filters are compressed such that the respective compressed sizes are proportional to the size of an associated file; obtaining a unique segment count for each file F by applying a sampling ratio R to each sampled unique segment count; determining an average segment size for each file F; generating a physical space measurement for each file F based on the average segment size and the unique segment count; and generating a total physical space measurement p based on the individual physical space measurements for each file F by combining the compressed and cached bloom filters).
Therefore, it would have been obvious to one of ordinary skill in the art at the time of the effective filing date of the invention, in view of the teachings of Passey, Auchmoody and MENEZES, to efficiently measure the physical storage space consumed by an ad-hoc subset of files in a data protection system (MENEZES col. 2 lines 4-12). 
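For illustration only, storing a bit vector in compressed format, as recited in claim 11, can be sketched as follows. The helper names, the 8192-bit vector length, and the use of zlib are assumptions made for this sketch; MENEZES claim 1 recites compressing and caching bloom filters but the quoted text does not name a compression scheme.

```python
import zlib


def pack_bits(set_positions, num_bits):
    """Build a bit vector as bytes with the given bit positions set to 1."""
    buf = bytearray((num_bits + 7) // 8)
    for pos in set_positions:
        buf[pos // 8] |= 1 << (pos % 8)
    return bytes(buf)


def store_compressed(bit_vector):
    """Compress the bit vector before caching it (zlib is a stand-in choice)."""
    return zlib.compress(bit_vector)


def load_compressed(blob):
    """Recover the original bit vector from its compressed form."""
    return zlib.decompress(blob)


# Hypothetical sparse bit vector: 8192 bits with three bits set.
raw = pack_bits({3, 500, 8000}, num_bits=8192)
blob = store_compressed(raw)
```

A sparse vector such as `raw` consists mostly of zero bytes, so `blob` is much smaller than the 1024-byte original, and `load_compressed(blob)` returns the original bytes unchanged.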

Allowable Subject Matter
15.	Claims 2, 5-6, 8-10, 12, 14, 16, 18, 20-22 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Response to Arguments
16.	Applicant’s remarks and arguments presented on 8/22/2022 have been fully considered but they are moot in view of the new grounds of rejection presented in this office action.

Conclusion
17.	Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
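For illustration only, the date arithmetic described in the preceding paragraph can be sketched as follows. The mailing date and the other dates used are hypothetical, and the function names are assumptions. The sketch does not model extensions of time under 37 CFR 1.136(a) or any adjustment for weekends and holidays, and it is not a substitute for the rules themselves.

```python
import calendar
from datetime import date


def add_months(d, months):
    """Add calendar months, clamping the day to the end of the target month."""
    idx = d.month - 1 + months
    year = d.year + idx // 12
    month = idx % 12 + 1
    day = min(d.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def reply_deadline(mailing, first_reply=None, advisory_mailed=None):
    """End of the shortened statutory period under the paragraph above."""
    three_months = add_months(mailing, 3)
    six_months = add_months(mailing, 6)
    if (
        first_reply is not None
        and advisory_mailed is not None
        and first_reply <= add_months(mailing, 2)   # first reply within two months
        and advisory_mailed > three_months          # advisory mailed after period end
    ):
        # The period expires when the advisory action is mailed, capped at six months.
        return min(advisory_mailed, six_months)
    return three_months


mailing = date(2022, 11, 30)  # hypothetical mailing date, not taken from this action
default_deadline = reply_deadline(mailing)
late_advisory = reply_deadline(mailing, date(2023, 1, 15), date(2023, 3, 10))
```

With the hypothetical mailing date above, `default_deadline` is 2023-02-28 (the day is clamped because February has no 30th) and `late_advisory` is 2023-03-10.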
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SyLing Yen whose telephone number is 571-270-1306.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Mark Featherstone, can be reached at 571-270-3750.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Any inquiry of a general nature or relating to the status of this application or proceeding should be directed to the receptionist whose telephone number is 571-272-2100. 

/SYLING YEN/
Primary Examiner, Art Unit 2166
September 9, 2022