DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-4 and 9-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Johnston et al. (US Patent No. 9152333 B1, hereinafter Johnston).

Regarding Claim 1, Johnston discloses an apparatus comprising: at least one processing device comprising a processor coupled to a memory ([Col. 4, lines 29-31]: FIG. 2 is a block diagram of one embodiment of a node 102, that includes multiple processors 202A and 202B, a memory 204); the at least one processing device being configured to perform steps of: 
collecting, from a plurality of storage systems, data patterns for data stored in the plurality of storage systems (Fig. 2, fingerprint data store 207; [Abstract]: The method includes steps of selecting randomly a plurality of data blocks from a data set as a sample of the data set, collecting fingerprints of the plurality of data blocks of the sample [the fingerprints correspond to the data patterns]. See also [Col. 1, lines 64-66], [Col. 2, lines 36-39], [Col. 5, lines 54-57]); 
clustering the plurality of storage systems into one or more data pattern sharing clusters based at least in part on the collected data patterns, a given one of the one or more data pattern sharing clusters comprising two or more of the plurality of storage systems ([Col. 2, lines 65-67]: FIG. 1 is a schematic block diagram showing a plurality of storage system nodes interconnected as a storage cluster for servicing data requests; [Col. 4, lines 58-59]: Ethernet may be used as the clustering protocol and interconnect media. See also [Col. 3, lines 50-52], [Col. 7, lines 13-32]); 
identifying, for the given data pattern sharing cluster, a subset of the collected data patterns (Fig. 4A; [Col. 7, lines 21-32]: the storage system determines a sampling portion of the data set at step 415A… The size of the sampling portion can be a parameter (as a percentage number) supplied in the request, a predetermined percentage value, or a percentage value calculated by examining the data pattern of a small portion of the data set. See also [Col. 2, lines 13-16], [Col. 6, lines 54-67]); and 
providing, to the two or more storage systems of the given data pattern sharing cluster, the identified subset of the data patterns (Fig. 4A; [Col. 7, lines 33-59]: At step 420A, the deduplication potential estimator of the storage system retrieves a fingerprint of a block in the sampling portion of the data set… If the block fingerprint is not in the fingerprint data store, at step 430A the deduplication potential estimator adds the block fingerprint into the fingerprint data store… If the block fingerprint is in the fingerprint data store, at step 435A the deduplication potential estimator increments the duplicate counter number of that fingerprint by one in the fingerprint data store), wherein the identified subset of the collected data patterns are utilized by the two or more storage systems in performing data deduplication ([Col. 6, lines 64-66]: The fingerprints uniquely identify the data stored in the data blocks, and therefore are used for data deduplication purposes).
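For illustration only, the counting flow quoted above from Fig. 4A (steps 420A through 435A) can be sketched in Python as follows. This is a minimal sketch and not code from Johnston: the function names are hypothetical, and SHA-256 is an assumed stand-in for the fingerprint function, which the quoted passages do not specify.

```python
import hashlib


def fingerprint(block: bytes) -> str:
    # Assumption: SHA-256 stands in for the unspecified fingerprint function.
    return hashlib.sha256(block).hexdigest()


def build_fingerprint_store(sample_blocks):
    """Map each unique fingerprint to its duplicate counter number."""
    store = {}
    for block in sample_blocks:
        fp = fingerprint(block)          # cf. step 420A: retrieve the fingerprint
        if fp not in store:
            store[fp] = 1                # cf. step 430A: add with a counter of one
        else:
            store[fp] += 1               # cf. step 435A: increment the counter
    return store
```

Under these assumptions, calling build_fingerprint_store on three blocks of which two are identical yields two entries, one with a counter of 2 and one with a counter of 1.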

Regarding Claim 2, Johnston discloses the apparatus of claim 1 wherein the two or more storage systems implement inline pattern detection for performing data deduplication ([Col. 6, lines 11-16]: The storage operating system 206, at least a portion of which is typically resident in the memory of the node 102 invokes operations in support of the storage service implemented by the node 102. For instance, the operations can include data deduplication process or deduplication potential estimation process [The storage system implementing ILPD detects the predefined data patterns in memory]), 
the inline pattern detection utilizing the identified subset of the collected data patterns (Fig. 2; [Col. 5, lines 54-57]: The memory 204 can store a fingerprint data store 207; [Col. 6, lines 64-66]: The fingerprints uniquely identify the data stored in the data blocks, and therefore are used for data deduplication purposes).

Regarding Claim 3, Johnston discloses the apparatus of claim 2 wherein the inline pattern detection of a given one of the two or more storage systems utilizes a set of predefined data patterns (Fig. 4A, fingerprint data store (425A)), the identified subset of the collected data patterns comprising at least one data pattern not in the set of predefined data patterns (Fig. 4A; [Col. lines]: If the block fingerprint is not in the fingerprint data store, at step 430A the deduplication potential estimator adds the block fingerprint into the fingerprint data store and sets a duplicate counter number of that fingerprint to a value of one (1)).

Regarding Claim 4, Johnston discloses the apparatus of claim 1 wherein collecting the data patterns comprises collecting, from each of the plurality of storage systems, a designated number of most frequently occurring data patterns for data stored in that storage system ([Col. 7, lines 42-48]: The fingerprint data store records block fingerprints as well as numbers of duplicates (also referred to as duplicate counter number or frequency). Each unique fingerprint has a corresponding duplicate counter number stored in the fingerprint data store. A duplicate counter number of a unique fingerprint is the number of blocks in the sampling portion which have that unique fingerprint. See also [Col. 6, lines 66-67]-[Col. 7, lines 1-7]).
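As a hedged illustration of the limitation addressed above, selecting a designated number of most frequently occurring patterns from a table of duplicate counter numbers can be sketched as follows. The function name, the dictionary layout, and the parameter n are assumptions made for this sketch; they are not taken from Johnston or from the claims.

```python
from collections import Counter


def top_patterns(counts, n):
    """Return the n patterns with the largest duplicate counter numbers.

    counts: hypothetical mapping of pattern (or fingerprint) -> counter.
    """
    # most_common sorts by count, highest first.
    return [pattern for pattern, _ in Counter(counts).most_common(n)]
```

With counters {"a": 5, "b": 1, "c": 3} and n = 2, the sketch returns the patterns "a" and "c".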

Regarding Claim 9, Johnston discloses the apparatus of claim 1 wherein collecting the data patterns comprises generating a first data structure with entries denoting a frequency at which each of the collected data patterns is observed on each of the plurality of storage systems over a given time period (Fig. 5; [Col. 10, lines 1-8]: In one embodiment, at step 505 the deduplication potential estimator determines a cut-off value for separating the entries in the fingerprint data store into a higher frequency section and a lower frequency section. The higher frequency section includes fingerprints having duplicate counter numbers larger than the cut-off value. The lower frequency section includes fingerprints having duplicate counter numbers less than or equal to the cut-off value).
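The cut-off separation quoted above from step 505 can be sketched as follows, for illustration only. The function name and the dictionary representation of the fingerprint data store are assumptions; only the comparison rule (larger than the cut-off versus less than or equal to the cut-off) comes from the quoted passage.

```python
def split_by_cutoff(store, cutoff):
    """Split a fingerprint -> counter mapping into two frequency sections."""
    # Higher frequency section: counters strictly larger than the cut-off.
    higher = {fp: c for fp, c in store.items() if c > cutoff}
    # Lower frequency section: counters less than or equal to the cut-off.
    lower = {fp: c for fp, c in store.items() if c <= cutoff}
    return higher, lower
```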

Regarding Claim 10, Johnston discloses the apparatus of claim 9 wherein clustering the plurality of storage systems takes as input the first data structure ([Col. 4, lines 34-35]: The local storage 213 comprises one or more physical storage devices) and produces a second data structure that tags the entries of the first data structure for each of the plurality of storage systems with labels corresponding to ones of the one or more data pattern sharing clusters to which the plurality of storage systems belong ([Col. 4, lines 38-41]: The local storage 213 can also be utilized by the node to locally store configuration information (e.g., in a configuration data structure 214)).

Regarding Claim 11, Johnston discloses the apparatus of claim 10 wherein identifying the subset of the collected data patterns for the given data pattern sharing cluster comprises sorting the collected data patterns based at least in part on mean frequency of occurrence across the two or more storage systems in the given data pattern sharing cluster (Fig. 4B; [Col. 8, lines 21-25]: After receiving the fingerprints, at step 425B, the deduplication potential estimator sorts the received fingerprints by an order. For example, the deduplication potential estimator can sort the received fingerprints by an order of the numerical values of the fingerprints) and 
selecting a designated number of the collected data patterns having a highest mean frequency of occurrence across the two or more storage systems in the given data pattern sharing cluster as the subset of the collected data patterns for the given data pattern sharing cluster (Figs 4A, 4B; [Col. 8, lines 8-11]: After the sorting, duplicate fingerprints are grouped together in the sorted list of fingerprints. At step 430B, the deduplication potential estimator iterates through the sorted list of fingerprints to generate the duplicate counter numbers for all unique fingerprints from the sampling portion… [Col. 9, lines 6-10]: In other words, for fingerprints having higher frequencies, S sets of unique fingerprints having a counter of C in sampling portion will be extrapolated as S sets of unique fingerprints having a counter of C/p, wherein p is the sampling percentage).
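For illustration only, the sort-then-count flow quoted above from Fig. 4B (steps 425B and 430B) and the quoted C/p extrapolation can be sketched as follows. The function names are hypothetical, and the sketch assumes that the sampling percentage p is expressed as a fraction greater than 0 and at most 1.

```python
from itertools import groupby


def count_sorted(fingerprints):
    """Sort fingerprints so duplicates are adjacent, then count each run."""
    ordered = sorted(fingerprints)                      # cf. step 425B
    return {fp: sum(1 for _ in run)                     # cf. step 430B
            for fp, run in groupby(ordered)}


def extrapolate(counter, p):
    """Scale a sample counter C to the full data set as C / p."""
    # Assumption: p is a fraction in (0, 1], e.g. 0.25 for a 25% sample.
    return counter / p
```

Under these assumptions, a fingerprint seen 3 times in a 25% sample extrapolates to a counter of 12 for the full data set.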

Regarding Claim 12, Johnston discloses the apparatus of claim 1 wherein identifying the subset of the collected data patterns for the given data pattern sharing cluster is based at least in part on frequencies of occurrence of the collected data patterns in each of the two or more storage systems of the given data pattern sharing cluster ([Col. 8, lines 61-65]: when estimating deduplication potential of a data set based on the duplication information of a sampling portion of the data set, different approaches can be taken for larger and smaller counter numbers (also referred to as higher and lower frequencies). See also [Col. 9, lines 6-10]).

Regarding Claim 13, Johnston discloses the apparatus of claim 1 wherein the at least one processing device is part of a monitoring and analytics platform external to the plurality of storage systems ([Col. 4, lines 13-16]: FIG. 2 is a high-level block diagram showing an example of the architecture of a node, which can represent any of the storage cluster nodes; [Col. 4, lines 42-47]: Application 115 can be, for example, a database application, a financial management system, an electronic mail application or any other application type. See also [Col. 3, lines 28-32]).

Regarding Claim 14, Johnston discloses the apparatus of claim 13 wherein the monitoring and analytics platform comprises a cloud-based monitoring and analytics platform (Fig. 1; [Col. 4, lines 42-47]: Application 115 can be, for example, a database application, a financial management system, an electronic mail application or any other application type; [Col. 5, lines 1-5]: The network adapter 210 can further comprise one or more ports adapted to couple the node 102 to one or more clients 114 over point-to-point links, wide area networks, virtual private networks implemented over a public network (e.g., Internet) or a shared local area network).

Regarding Claim 15, Johnston discloses a computer program product comprising a non-transitory processor-readable storage medium having stored therein program code of one or more software programs (Fig. 1; [Col. 13, lines 6-9]: Software or firmware for use in implementing the techniques introduced here may be stored on a machine-readable storage medium and may be executed by one or more general-purpose or special-purpose programmable microprocessors), wherein the program code when executed by at least one processing device causes the at least one processing device to perform steps of: 
collecting, from a plurality of storage systems, data patterns for data stored in the plurality of storage systems (Fig. 2, fingerprint data store 207; [Abstract]: The method includes steps of selecting randomly a plurality of data blocks from a data set as a sample of the data set, collecting fingerprints of the plurality of data blocks of the sample [the fingerprints correspond to the data patterns]. See also [Col. 1, lines 64-66], [Col. 2, lines 36-39], [Col. 5, lines 54-57]); 
clustering the plurality of storage systems into one or more data pattern sharing clusters based at least in part on the collected data patterns, a given one of the one or more data pattern sharing clusters comprising two or more of the plurality of storage systems ([Col. 2, lines 65-67]: FIG. 1 is a schematic block diagram showing a plurality of storage system nodes interconnected as a storage cluster for servicing data requests; [Col. 4, lines 58-59]: Ethernet may be used as the clustering protocol and interconnect media. See also [Col. 3, lines 50-52], [Col. 7, lines 13-32]); 
identifying, for the given data pattern sharing cluster, a subset of the collected data patterns (Fig. 4A; [Col. 7, lines 21-32]: the storage system determines a sampling portion of the data set at step 415A… The size of the sampling portion can be a parameter (as a percentage number) supplied in the request, a predetermined percentage value, or a percentage value calculated by examining the data pattern of a small portion of the data set. See also [Col. 2, lines 13-16], [Col. 6, lines 54-67]); and 
providing, to the two or more storage systems of the given data pattern sharing cluster, the identified subset of the data patterns (Fig. 4A; [Col. 7, lines 33-59]: At step 420A, the deduplication potential estimator of the storage system retrieves a fingerprint of a block in the sampling portion of the data set… If the block fingerprint is not in the fingerprint data store, at step 430A the deduplication potential estimator adds the block fingerprint into the fingerprint data store… If the block fingerprint is in the fingerprint data store, at step 435A the deduplication potential estimator increments the duplicate counter number of that fingerprint by one in the fingerprint data store), wherein the identified subset of the collected data patterns are utilized by the two or more storage systems in performing data deduplication ([Col. 6, lines 64-66]: The fingerprints uniquely identify the data stored in the data blocks, and therefore are used for data deduplication purposes).

Regarding Claim 16, Johnston discloses the computer program product of claim 15 wherein the two or more storage systems implement inline pattern detection for performing data deduplication, the inline pattern detection utilizing the identified subset of the collected data patterns ([Col. 6, lines 11-16]: The storage operating system 206, at least a portion of which is typically resident in the memory of the node 102 invokes operations in support of the storage service implemented by the node 102. For instance, the operations can include data deduplication process or deduplication potential estimation process [The storage system implementing ILPD detects the predefined data patterns in memory]).

Regarding Claim 17, Johnston discloses the computer program product of claim 16 wherein the inline pattern detection of a given one of the two or more storage systems utilizes a set of predefined data patterns (Fig. 4A, fingerprint data store (425A)), the identified subset of the collected data patterns comprising at least one data pattern not in the set of predefined data patterns (Fig. 4A; [Col. lines]: If the block fingerprint is not in the fingerprint data store, at step 430A the deduplication potential estimator adds the block fingerprint into the fingerprint data store and sets a duplicate counter number of that fingerprint to a value of one (1)).

Regarding Claim 18, Johnston discloses a method comprising: 
collecting, from a plurality of storage systems, data patterns for data stored in the plurality of storage systems (Fig. 2, fingerprint data store 207; [Abstract]: The method includes steps of selecting randomly a plurality of data blocks from a data set as a sample of the data set, collecting fingerprints of the plurality of data blocks of the sample [the fingerprints correspond to the data patterns]. See also [Col. 1, lines 64-66], [Col. 2, lines 36-39], [Col. 5, lines 54-57]); 
clustering the plurality of storage systems into one or more data pattern sharing clusters based at least in part on the collected data patterns, a given one of the one or more data pattern sharing clusters comprising two or more of the plurality of storage systems ([Col. 2, lines 65-67]: FIG. 1 is a schematic block diagram showing a plurality of storage system nodes interconnected as a storage cluster for servicing data requests; [Col. 4, lines 58-59]: Ethernet may be used as the clustering protocol and interconnect media. See also [Col. 3, lines 50-52], [Col. 7, lines 13-32]); 
identifying, for the given data pattern sharing cluster, a subset of the collected data patterns (Fig. 4A; [Col. 7, lines 21-32]: the storage system determines a sampling portion of the data set at step 415A… The size of the sampling portion can be a parameter (as a percentage number) supplied in the request, a predetermined percentage value, or a percentage value calculated by examining the data pattern of a small portion of the data set. See also [Col. 2, lines 13-16], [Col. 6, lines 54-67]); and 
providing, to the two or more storage systems of the given data pattern sharing cluster, the identified subset of the data patterns (Fig. 4A; [Col. 7, lines 33-59]: At step 420A, the deduplication potential estimator of the storage system retrieves a fingerprint of a block in the sampling portion of the data set… If the block fingerprint is not in the fingerprint data store, at step 430A the deduplication potential estimator adds the block fingerprint into the fingerprint data store… If the block fingerprint is in the fingerprint data store, at step 435A the deduplication potential estimator increments the duplicate counter number of that fingerprint by one in the fingerprint data store), wherein the identified subset of the collected data patterns are utilized by the two or more storage systems in performing data deduplication ([Col. 6, lines 64-66]: The fingerprints uniquely identify the data stored in the data blocks, and therefore are used for data deduplication purposes);
wherein the method is performed by at least one processing device comprising a processor coupled to a memory ([Col. 3, lines 63-64]: The main memory may be coupled to the CPU via a system bus or a local memory bus).

Regarding Claim 19, Johnston discloses the method of claim 18 wherein the two or more storage systems implement inline pattern detection for performing data deduplication ([Col. 6, lines 11-16]: The storage operating system 206, at least a portion of which is typically resident in the memory of the node 102 invokes operations in support of the storage service implemented by the node 102. For instance, the operations can include data deduplication process or deduplication potential estimation process [The storage system implementing ILPD detects the predefined data patterns in memory]), the inline pattern detection utilizing the identified subset of the collected data patterns (Fig. 2; [Col. 5, lines 54-57]: The memory 204 can store a fingerprint data store 207; [Col. 6, lines 64-66]: The fingerprints uniquely identify the data stored in the data blocks, and therefore are used for data deduplication purposes).

Regarding Claim 20, Johnston discloses the method of claim 19 wherein the inline pattern detection of a given one of the two or more storage systems utilizes a set of predefined data patterns (Fig. 4A, fingerprint data store (425A)), the identified subset of the collected data patterns comprising at least one data pattern not in the set of predefined data patterns (Fig. 4A; [Col. lines]: If the block fingerprint is not in the fingerprint data store, at step 430A the deduplication potential estimator adds the block fingerprint into the fingerprint data store and sets a duplicate counter number of that fingerprint to a value of one (1)).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 5-8 are rejected under 35 U.S.C. 103 as being unpatentable over Johnston et al. (US Patent No. 9152333 B1, hereinafter Johnston) in view of Kim et al. (US Patent No. 11062228 B2, hereinafter Kim).

Regarding Claim 5, Johnston discloses the apparatus of claim 1.
However, Johnston does not explicitly teach “wherein clustering the plurality of storage systems into the one or more data pattern sharing clusters comprises utilizing a mean-shift clustering algorithm.”
On the other hand, in the same field of endeavor, Kim teaches wherein clustering the plurality of storage systems into the one or more data pattern sharing clusters comprises utilizing a mean-shift clustering algorithm ([Col. 6, lines 28-32]: The clusters may represent abstracted or generalized labels and may be generated using calculations or algorithms, such as the k-means clustering, spectral clustering, affinity propagation, mean-shift).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the apparatus of Johnston to incorporate the teachings of Kim to include “wherein clustering the plurality of storage systems into the one or more data pattern sharing clusters comprises utilizing a mean-shift clustering algorithm.”
The motivation for doing so would be to reduce the number of labels, as recognized by Kim ([Col. 6, lines 25-28] of Kim: For example, given a domain and the labels that occur in that domain, label mapping component 208 may reduce the number of labels by clustering the vector representations).
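For context on the recited technique, a mean-shift clustering pass can be sketched as follows. This is a minimal, standard-library sketch of a flat-kernel mean shift and is not taken from Kim, which names the algorithm without detailing it in the quoted passage. The function name, the bandwidth parameter, and the use of per-system frequency vectors as input points are assumptions made for illustration.

```python
import math


def mean_shift(points, bandwidth, max_iter=100, tol=1e-6):
    """Flat-kernel mean shift; returns one cluster label per input point."""
    modes = [list(p) for p in points]
    for _ in range(max_iter):
        moved = 0.0
        new_modes = []
        for m in modes:
            # Points within the bandwidth of the current mode estimate.
            near = [p for p in points if math.dist(m, p) <= bandwidth]
            if not near:                 # guard against rounding at the boundary
                new_modes.append(m)
                continue
            mean = [sum(c) / len(near) for c in zip(*near)]
            moved = max(moved, math.dist(m, mean))
            new_modes.append(mean)
        modes = new_modes
        if moved < tol:                  # every mode has stopped shifting
            break
    # Merge modes that converged to (nearly) the same location.
    centers, labels = [], []
    for m in modes:
        for i, c in enumerate(centers):
            if math.dist(m, c) <= bandwidth / 2:
                labels.append(i)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return labels
```

In this sketch each input point would represent one storage system, and systems that receive the same label would fall in the same cluster. The number of clusters is not supplied in advance; it follows from the bandwidth.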


Regarding Claim 6, the combined teachings of Johnston and Kim disclose the apparatus of claim 5.
Kim further teaches wherein the mean-shift clustering algorithm utilizes multidimensional scaling to achieve dimensionality reduction for the collected data patterns ([Abstract]: The embedded labels may be represented by multi-dimensional vectors that correspond to particular labels; [Col. 6, lines 25-31]: For example, given a domain and the labels that occur in that domain, label mapping component 208 may reduce the number of labels by clustering the vector representations. The clusters… may be generated using calculations or algorithms, such as… mean-shift).

Regarding Claim 7, the combined teachings of Johnston and Kim disclose the apparatus of claim 6.
Kim further teaches wherein the multidimensional scaling takes as input a first data structure with entries characterizing a frequency of observation of each of the collected data patterns on each of the plurality of storage systems and provides as output a second data structure that projects the frequency of observation of each of the collected data patterns from a first dimension to a second dimension lower than the first dimension ([Col. 5, lines 43-45]: The vectorization component 206 may be configured to transform the query data within data store 204 into low-dimensional vector representations. See also [Col. 5, line 55 – Col. 6, line 10]: TABLE-US-00001 CCA-Label, Input).
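For context on the recited technique, a projection from a higher dimension to a lower one by multidimensional scaling can be sketched as follows. This is a minimal sketch of classical (Torgerson) multidimensional scaling restricted to a single output dimension, with the leading eigenvector found by power iteration. It is not taken from Kim or from the application; the function name, the one-dimensional output, and the treatment of each row as one storage system's frequency vector are assumptions.

```python
import math


def classical_mds_1d(rows, iters=200):
    """Project each row (a frequency vector) to one coordinate."""
    n = len(rows)
    # Squared Euclidean distances between every pair of rows.
    d2 = [[sum((a - b) ** 2 for a, b in zip(rows[i], rows[j]))
           for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in d2]
    total_mean = sum(row_mean) / n
    # Double centering turns squared distances into an inner-product matrix.
    b = [[-0.5 * (d2[i][j] - row_mean[i] - row_mean[j] + total_mean)
          for j in range(n)] for i in range(n)]
    # Power iteration for the leading eigenvector of b.
    v = [float(i + 1) for i in range(n)]
    for _ in range(iters):
        w = [sum(b[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return [0.0] * n             # all rows coincide
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(b[i][j] * v[j] for j in range(n))
                     for i in range(n))
    scale = math.sqrt(max(eigenvalue, 0.0))
    return [scale * x for x in v]
```

When the input rows happen to lie on a straight line, the one-dimensional output preserves their pairwise distances; in general the projection only approximates them.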

Regarding Claim 8, the combined teachings of Johnston and Kim disclose the apparatus of claim 6.
Kim further teaches wherein the mean-shift clustering algorithm produces a data structure that tags ones of the plurality of storage systems with labels corresponding to ones of the one or more data pattern sharing clusters to which the plurality of storage systems belong ([Col. 1, lines 39-45]: The data set may comprise labels and word sets associated with the labels. The server device may induce label embedding within the data set. The embedded labels may be represented by multi-dimensional vectors that correspond to particular labels. The vectors may be used to construct label mappings for the data set).




Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHIRLEY D. HICKS whose telephone number is (571) 272-3304. The examiner can normally be reached on Mon - Fri 7:30 - 4:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Fred Ehichioya, can be reached at (571) 272-4034. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/S.D.H./Examiner, Art Unit 2168  

/IRETE F EHICHIOYA/Supervisory Patent Examiner, Art Unit 2168