DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
The action is responsive to the communication filed on 3/8/2022. Claims 1-2 and 10-11 have been amended. Claims 1-18 are pending in this office action, of which claims 1 and 10 are independent claims.

Response to Arguments
Applicant’s arguments, see page 6, filed 3/8/2022, with respect to the rejection(s) of claim(s) 1 and 10 under 35 USC 112 have been fully considered and are persuasive.  Therefore, the rejection has been withdrawn.  
Applicant’s arguments, see pages 6-9, filed 3/8/2022, with respect to the rejection(s) of claim(s) 1-18 under 35 USC 102 have been fully considered but are not persuasive.  

Examiner respectfully disagrees with all of the allegations as argued. Examiner, in her previous office action, gave a detailed explanation of the claimed limitations and pointed out their exact locations in the cited prior art.
Examiner is entitled to give claim limitations their broadest reasonable interpretation in light of the specification. See MPEP 2111 [R-1].
	Interpretation of Claims-Broadest Reasonable Interpretation
	During patent examination, the pending claims must be ‘given the broadest reasonable interpretation consistent with the specification.’  Applicant always has the opportunity to amend the claims during prosecution and broad interpretation by the examiner reduces the possibility that the claim, once issued, will be interpreted more broadly than is justified. In re Prater, 162 USPQ 541,550-51 (CCPA 1969).
 
Applicant argues:
a.	There is no disclosure of "collaborating by the extension engine and the deduplication engine, to identify third files from the second files that have not been deduplicated by the deduplication engine based on a global catalog accessible to the deduplication engine". (pages 7-8). 
In response to applicant's argument a:  The argument is that Tofano does not disclose both a local catalog for the extension engine at the edge node and a global catalog for the deduplication engine at the datacenter.

Applicant’s argument is not persuasive. In addition to the global catalog, Tofano also teaches local indexes, as described in para 0004 and 0124: some implementations herein include multiple local deduplication indexes (i.e., distributed index portions), with a different or secondary key that can be used to find an entry, and which may provide multiple lookup paths to a single index item.

With regards to the limitation in the argument, Tofano teaches in para 0110 with reference to Fig. 5 that each shard 502 may be divided into as many as 16 slices 504. All slices 504 defined for the same shard 502 exist on the same service computing device 102. Because slices 504 are an on-node segmentation of the deduplication index 314 that is internal and transparent, no remote routing is needed to access a slice 504. Once a request is present on a service computing device 102 (whether generated locally or received from another service computing device 102), a portion of the data-portion identifier bytes may be used for routing by the index API 508 for selecting the proper slice 504. Accordingly, the slice routing range may be a sub-portion of the shard routing range, i.e., a first portion of the data-portion identifier may indicate the shard range and a second portion of the data-portion identifier may indicate the slice range. In addition, each slice 504 may be supported by one or more files that constitute the slice. Slices may be split across storage devices on the storage 108 for improved performance, but the use of the stripes 506 generally makes this unnecessary. See also para 0120. 
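For illustration only, the two-level routing described in para 0110 can be sketched as follows. This is a minimal sketch and not Tofano's actual implementation: the use of SHA-256 as the data-portion identifier, the shard count, the byte positions used for routing, and the function names data_portion_id and route are all assumptions made here for clarity. Only the limit of 16 slices per shard and the use of a first and a second portion of the identifier come from the cited paragraph.

```python
import hashlib
from typing import Tuple

NUM_SHARDS = 4          # assumption: one shard per service computing device
SLICES_PER_SHARD = 16   # para 0110: a shard may be divided into as many as 16 slices


def data_portion_id(data: bytes) -> bytes:
    """Hypothetical data-portion identifier: a SHA-256 digest of the content."""
    return hashlib.sha256(data).digest()


def route(identifier: bytes) -> Tuple[int, int]:
    """Select (shard, slice) from two different portions of the identifier.

    Assumption: byte 0 carries the shard range and byte 1 the slice range.
    """
    shard = identifier[0] % NUM_SHARDS          # first portion: which node holds the shard
    slice_ = identifier[1] % SLICES_PER_SHARD   # second portion: on-node slice, no remote hop
    return shard, slice_
```

Because the slice is chosen only after the request has reached the node holding the shard, slice selection in this sketch requires no further remote routing, which is the point the cited paragraph makes.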
In view of the above, the examiner contends that all limitations as recited in the claims have been addressed in this Action. For the above reasons, Examiner believes that the rejection in the last Office action was proper.


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-18 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Tofano, US 20200057752 A1 (hereinafter “Tofano”).

As to claims 1 and 10,
Tofano teaches a method for collaboratively deduplicating data (Tofano, para 0083, the deduplication components 126 executing on individual nodes may depend on a coordinator, such as a coordination service 324, to ensure data integrity at all times.  Thus, the deduplication components 126 may be a client of the coordination service 324.  As one example, the coordination service 324 may be a centralized service, such as a DLM, or the like, that functions as one of the parts of the overall coordination services), the method comprising: 
receiving data from an edge device (Tofano, Fig. 1, element 114) at an extension engine operating on an edge node (Tofano, Fig. 1, element 102 corresponds to the edge node; para 0079, as in Fig. 3, the deduplication components 126 may execute on each service computing device 102 configured for deduplicating data. In this example, suppose that the deduplication components 126 receive incoming file data 308. The parser 302 divides the incoming file data into a plurality of deduplication data portions 310; with para 0055 for, in the illustrated example, the cluster 106 and storage 108 are configured to act as a data storage system 150 for the client devices 114. The service application 122 on each service computing device 102 may be executed to receive and store data from the client devices 114 and/or subsequently retrieve and provide the data to the client devices 114);
checking the data using a local catalog to determine which files in the data have been previously transmitted to a deduplication engine operating in a datacenter (Tofano, para 0073-0074 for an incoming stream of data is broken into multiple deduplication data portions; each incoming deduplication data portion may be compared to the existing deduplication data portions already in the system to identify duplicates. The deduplication processing herein may employ an efficient data portion ID and related indexing methodology to efficiently search for matches),
wherein the local catalog includes metadata configured to determine that first files in the data have been previously sent to the deduplication engine and that second files in the data have not been sent to the deduplication engine based on the local catalog (Tofano, para 0074-0075, as new deduplication data portions arrive in the system, each deduplication data portion may be looked up by data-portion identifier via a lookup call. If the data-portion identifier is not present in the index, the data-portion identifier for the deduplication data portion may be added via an add call, along with corresponding referential data. The referential data may contain at least a pointer to the existing duplicate instance of the deduplication data portion along with a reference count indicating how many instances of data refer to the particular deduplication data portion. If the data-portion identifier is present, the index returns the corresponding referential data so that metadata to the shared deduplication data portion may be persisted; with para 0075 for when a data portion is unique, it is stored in full along with the necessary metadata updates. When a data portion is a duplicate, the storage system avoids writing the data, but instead tracks references to the existing duplicate data portion, typically by metadata updates. Further, some deduplication systems may include delta data portions, which may be former duplicate data portions that have been updated, and that include referential metadata that points to overlay changes that can be used to build the updated versions of the data portions when desired; with para 0043 for some or all of the service computing devices 102 may maintain the index API information 127 and index components 130. In some examples, the global deduplication index may be divided into multiple index shards, and the index shards may be distributed as index components 130 (i.e., local catalog) across some or all of the service computing devices 102 based on one or more configuration rules);
collaborating, by the extension engine and the deduplication engine (Tofano, para 0120, the deduplication system includes a plurality of lookup components, of which the global deduplication index is one), to identify third files from the second files that have not been deduplicated by the deduplication engine based on a global catalog accessible to the deduplication engine (Tofano, para 0110 and 0120, In addition, in some examples, the internal block structure of the storage may include a table of contents (TOC), which is a per-block lookup structure that may be maintained in memory. The TOC may also be loosely temporally ordered by file. Further, the TOC is outside of the global deduplication index proper, but may be used as a fourth layer of lookup that allows lookup in some cases without accessing the global deduplication index. This also means that global deduplication index lookups that resolve to blocks with TOC's can be directed to a computing device that may have these blocks cached in memory, thereby avoiding unnecessary access to the storage 108. Accordingly, the deduplication system includes a plurality of lookup components, of which the global deduplication index is one. However, implementations herein also may enable alternative local cache lookup-schemes to avoid unnecessary use of global index. With para 0077 for if the source system can learn what data is already on the target system, the source system can transfer far less data and avoid huge transfer costs. This may be accomplished with a deduplication negotiation protocol); 
transmitting the third files to the deduplication engine (Tofano, para 0094, In the illustrated example, the classifier 304(1)-304(4) on each service computing device 102(1)-102(4), respectively, may use an index API 508 to route requests across the shards 502, slices 504, and stripes 506 located on the respective service computing devices 102, essentially uniting the plurality of distributed index components into a coherent global deduplication index 314); 
deduplicating, by the deduplication engine, the third files (Tofano, para 0090, when a file including a duplicate data portion shared with another file is updated, the duplicate data portion may be split in two, with a resulting unique version and the shared version. The global deduplication index 314 and referential data may also be updated to reflect the data portion split); and 
updating the local catalog such that the local catalog reflects that the third files have been deduplicated by the deduplication engine (Tofano, para 0094 for the classifier 304(1)-304(4) on each service computing device 102(1)-102(4), respectively, may use an index API 508 to route requests across the shards 502, slices 504, and stripes 506 located on the respective service computing devices 102, essentially uniting the plurality of distributed index components into a coherent global deduplication index 314).  
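For reference, the sequence of steps recited in claims 1 and 10 can be sketched as follows. This is an illustrative sketch of the claim language only; it does not represent the applicant's implementation or Tofano's system. The class names ExtensionEngine and DeduplicationEngine mirror the claim terms, while the use of SHA-256 file hashes as catalog keys, the method names, and the choice to record every second file in the local catalog after the exchange are assumptions.

```python
import hashlib
from typing import Dict, List, Set


def file_hash(content: bytes) -> str:
    """Assumed catalog key: SHA-256 of the whole file."""
    return hashlib.sha256(content).hexdigest()


class DeduplicationEngine:
    """Datacenter side; has access to the global catalog."""

    def __init__(self) -> None:
        self.global_catalog: Set[str] = set()

    def filter_unknown(self, hashes: List[str]) -> List[str]:
        """Return the hashes absent from the global catalog (the 'third files')."""
        return [h for h in hashes if h not in self.global_catalog]

    def deduplicate(self, files: Dict[str, bytes]) -> None:
        """Record the received files in the global catalog."""
        self.global_catalog.update(files.keys())


class ExtensionEngine:
    """Edge-node side; maintains the local catalog."""

    def __init__(self, engine: DeduplicationEngine) -> None:
        self.engine = engine
        self.local_catalog: Set[str] = set()

    def process(self, files: List[bytes]) -> List[str]:
        by_hash = {file_hash(f): f for f in files}
        # Second files: not recorded locally as previously sent.
        second = [h for h in by_hash if h not in self.local_catalog]
        # Collaboration step: ask the datacenter which of those it lacks.
        third = self.engine.filter_unknown(second)
        # Transmit only the third files for deduplication.
        self.engine.deduplicate({h: by_hash[h] for h in third})
        # Assumption: all second files are now known to be at the datacenter
        # (third files were just sent; the rest were already there).
        self.local_catalog.update(second)
        return third
```

In this sketch, a file already sent by one edge node is not retransmitted by a second edge node, because the second node's query against the global catalog excludes it from the third files.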
As to claims 2 and 11, 
Tofano teaches wherein the global catalog associates data from the source with hashes of deduplicated files (Tofano, para 0076 and 0078 for The deduplication processing herein may be inline deduplication or post-process deduplication.  For instance, inline deduplication may be performed as the data is received and before the data is stored to the storage 108.  On the other hand, post-process deduplication may be performed after the data has been received and placed into what is typically a temporary storage location.  Inline deduplication may reduce the performance requirements for the storage and, accordingly, implementations herein are described in the environment of inline deduplication, but are not limited to such, and may be similarly applied to post-process deduplication, such as post-process hash-based deduplication or byte-differential deduplication with para 0019 for some or all of the computing devices in the cluster may have a local deduplication index that is managed by the respective computing device, and the local deduplication indexes on the respective computing devices in the cluster may collectively comprise the overall global deduplication index for the deduplication system).  
As to claims 3 and 12,
Tofano teaches generating a list of the second files and transmitting the list to the deduplication engine (Tofano, para 0079 for as in Fig. 3, the deduplication components 126 include a parser 302, a classifier 304, and a persistence engine 306. The deduplication components 126 may execute on each service computing device 102 configured for deduplicating data. In this example, suppose that the deduplication components 126 receive incoming file data 308. The parser 302 divides the incoming file data into a plurality of deduplication data portions 310, which may be of a fixed size or may be of a variable size with para 0058 for The data storage system 150 may be configured to perform deduplication of data, such as for any data received from a client device 114. Further, the data storage system 150 may be configured to perform deduplication of data that is replicated or otherwise transferred to another storage system, storage location, or the like. As mentioned above, the deduplication components 126 may be executed to perform deduplication and, during the deduplication, may access a global deduplication index maintained by the storage system 150).  
As to claims 4 and 13,
Tofano teaches determining the third files from the list and the global catalog (Tofano, para 0073 for Since comparing full data portions is expensive, most deduplication technologies generate a "fingerprint" or other data-portion identifier for each deduplication data portion that is far smaller than the actual full data portion of bytes, but which, at least in part, represents the content of the respective deduplication data portion. Schemes for calculating data portion fingerprints or other data-portion identifiers can vary significantly. For example, the data-portion identifiers may be generated using a hashing algorithm, data portion content/stream location information, or by other techniques. Generating of the deduplication index may be referred to herein as deduplication indexing, and the process of calculating, storing, and matching data-portion identifiers may be referred to herein as "deduplication data portion classification").
As to claims 5 and 14,
Tofano teaches instructing the extension engine to transmit the third files to the deduplication engine (Tofano, para 0022 and 0082, the deduplication index may be a global index and may be used for both inline deduplication processing and post-process deduplication processing with para 0082 for the persistence engine 306 may include a metadata handling service to create and store referential data as metadata 320 which points to the matching data portions already stored in the storage 108. The persistence engine 306 also creates metadata 320 for the unique data portions 318. Further, the persistence engine 306 (i.e., extension engine and deduplication engine) may add the data-portion identifiers 312 for the unique data portions 318 to the global deduplication index 314).  
As to claims 6 and 15,
Tofano teaches deduplicating the third files by chunking the files, comparing hashes of the chunks with hashes stored in the global catalog, and storing new chunks in storage of the cloud (Tofano, para 0079 and 0073 for Since comparing full data portions is expensive, most deduplication technologies generate a "fingerprint" or other data-portion identifier for each deduplication data portion that is far smaller than the actual full data portion of bytes, but which, at least in part, represents the content of the respective deduplication data portion. Schemes for calculating data portion fingerprints or other data-portion identifiers can vary significantly. For example, the data-portion identifiers may be generated using a hashing algorithm, data portion content/stream location information, or by other techniques. Generating of the deduplication index may be referred to herein as deduplication indexing, and the process of calculating, storing, and matching data-portion identifiers may be referred to herein as "deduplication data portion classification").
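The chunk, hash, compare, and store sequence recited in claims 6 and 15 can be illustrated with the following minimal sketch. It is offered as an example of hash-based deduplication in general, not as the method of Tofano or of the applicant. The fixed chunk size, the use of SHA-256, the reference-count catalog layout, and the names chunk and deduplicate are assumptions.

```python
import hashlib
from typing import Dict, List

CHUNK_SIZE = 4  # assumption: tiny fixed-size chunks, chosen only for readability


def chunk(data: bytes, size: int = CHUNK_SIZE) -> List[bytes]:
    """Split data into fixed-size chunks (the last chunk may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def deduplicate(data: bytes,
                catalog: Dict[str, int],
                storage: Dict[str, bytes]) -> List[str]:
    """Store only chunks whose hash is new; return the ordered chunk hashes.

    catalog maps chunk hash -> reference count; storage maps chunk hash -> bytes.
    """
    recipe = []
    for piece in chunk(data):
        digest = hashlib.sha256(piece).hexdigest()
        if digest not in catalog:
            storage[digest] = piece   # new chunk: write it once
            catalog[digest] = 0
        catalog[digest] += 1          # duplicate or new: count the reference
        recipe.append(digest)
    return recipe
```

The returned list of hashes is sufficient to rebuild the file from storage, so a duplicate chunk costs one catalog update rather than a second write.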
As to claim 7,
Tofano teaches checking the data using a local catalog includes deduplicating based on chunks having a larger size than chunks used by the deduplication engine (Tofano, para 0091 for FIG. 5 illustrates an example 500 of the storage system 150 including a distributed deduplication index according to some implementations. Conventionally, an index may become too large to store in a single memory. For instance, if the storage system 150 stores one petabyte or more of data, and if each deduplication data portion is one megabyte in size (which is larger than a typical deduplication data portion), then there are billions of potential data portions, and if most of the data portions are unique, the deduplication index may have billions of entries).
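The index-size estimate in para 0091 can be checked with simple arithmetic, sketched below. Decimal units (one petabyte as 10^15 bytes, one megabyte as 10^6 bytes) and the helper name index_entries are assumptions; the cited paragraph does not state which convention it uses.

```python
PETABYTE = 10**15  # assumption: decimal units
MEGABYTE = 10**6


def index_entries(stored_bytes: int, portion_bytes: int,
                  unique_fraction: float = 1.0) -> int:
    """Estimate index entries: one entry per unique data portion."""
    return int(stored_bytes // portion_bytes * unique_fraction)


# One petabyte in one-megabyte portions, all unique: one billion entries.
one_pb = index_entries(PETABYTE, MEGABYTE)
# Four petabytes with half the portions unique: two billion entries.
four_pb = index_entries(4 * PETABYTE, MEGABYTE, unique_fraction=0.5)
```

Under these assumptions, each petabyte stored in one-megabyte portions contributes up to one billion entries, so a system holding more than one petabyte reaches the "billions" stated in the paragraph, and a smaller portion size raises the count proportionally.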
As to claims 8 and 17,
Tofano teaches the deduplication engine receives a list from multiple extension mechanisms at multiple edge nodes and each extension mechanism identifies third files, further comprising deduplicating all of the third files (Tofano, para 0079 for as in Fig. 3, the deduplication components 126 include a parser 302, a classifier 304, and a persistence engine 306. The deduplication components 126 may execute on each service computing device 102 configured for deduplicating data. In this example, suppose that the deduplication components 126 receive incoming file data 308. The parser 302 divides the incoming file data into a plurality of deduplication data portions 310 (i.e., the third file), which may be of a fixed size or may be of a variable size with para 0058 for The data storage system 150 may be configured to perform deduplication of data, such as for any data received from a client device 114. Further, the data storage system 150 may be configured to perform deduplication of data that is replicated or otherwise transferred to another storage system, storage location, or the like. As mentioned above, the deduplication components 126 may be executed to perform deduplication and, during the deduplication, may access a global deduplication index maintained by the storage system 150).  
As to claims 9 and 18,
Tofano teaches updating each of the extension engines based on their corresponding lists (Tofano, para 0094, the classifier 304(1)-304(4) on each service computing device 102(1)-102(4), respectively, may use an index API 508 to route requests across the shards 502, slices 504, and stripes 506 located on the respective service computing devices 102, essentially uniting the plurality of distributed index components into a coherent global deduplication index 314. For instance, each deduplication index shard 502 on a service computing device 102 that is organized into slices 504 and stripes 506 may be arranged to have at least two classification layers).    
As to claim 16,
Tofano teaches providing the deduplication engine with pointers to the first files and the second files that are not transmitted to the deduplicating engine (Tofano, para 0129 and 0133 for The index log 802 may be the final location destination of all index items 602 (index entries including a data-portion identifier 604 and corresponding reference information 618) that exist in the index shard 502. Accordingly, the sum of the index logs 802 on each service computing device 102 may be the primary persistence for all data-portion identifiers and related referential data).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
The reference Esam et al. (US 9882985 B1) discloses systems, methods, and articles of manufacture comprising processor-readable storage media for data storage path processing. For example, in one method, a data block is received from a first device over a communications network, wherein the data block is specified to be sent to a second device located on the communications network. A distributed data storage system is accessed to store the data block in a first datastore associated with the first device, and to store a copy of the data block in a second datastore associated with the second device. A notification message is sent to the second device over the communications network to notify the second device that the data block is stored in the second datastore. The method may be performed by an application server that is implemented in an IoT (Internet of Things) cloud computing system.

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 


Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to NARGIS SULTANA whose telephone number is (571)272-6350. The examiner can normally be reached Monday to Thursday 8:30am to 4:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ashish Thomas, can be reached on 571-272-0631. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



6/9/2022

/NARGIS SULTANA/Examiner, Art Unit 2164           

/ASHISH THOMAS/Supervisory Patent Examiner, Art Unit 2164