DETAILED ACTION

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. Claims 1-18 have been examined and are pending.


Double Patenting

The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. [...] et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1-18 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-18 of U.S. Patent No. 10,579,663. Although the claims at issue are not identical, they are not patentably distinct from each other because the claims of the pending application would be identical to the claims of U.S. Patent No. 10,579,663 if the limitation of claims 6-8, “selecting, based on a selection criterion, a subset of the set of subgroups,” were read into claim 1.


Claim Rejections - 35 USC § 103

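In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.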

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 2, 5-7, and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over SUAREZ et al. (20080082531, hereinafter SUAREZ) in view of FAYYAD et al. (6374251, hereinafter FAYYAD), and further in view of SHYR et al. (20150286704, hereinafter SHYR) and BRUBAKER et al. (20090299857, hereinafter BRUBAKER).

As per claim 1, A computer-implemented method for data insight discovery using a clustering technique, the method comprising: 

Suarez teaches a computer-implemented method for data insight discovery using a clustering technique:
(Suarez [0016] An example embodiment can provide the advantages of a Hierarchical Clustering algorithm (i.e. no requirement for setting the number of clusters a priori), with the speed and scalability of other clustering algorithms, where the number of clusters has to be specified at the start. An example embodiment can provide a fast method of clustering wherein each document effectively is deemed to belong to only one cluster and where a mapping algorithm is used to minimize the number of required document comparisons.)

compressing, to assemble a set of sub-clusters, 

Suarez teaches a method of comparing cluster vectors and deciding whether to merge the clusters (e.g., “compressing”) or to add one cluster as a subcluster (e.g., “assemble a set of sub-clusters”) of another cluster.
(Suarez [0061] A further, optional, step 112 represented in FIG. 2 is to attempt to group clusters together. Grouping clusters together requires iterating through the K clusters, comparing the cluster vector of cluster Ci with the cluster vector of cluster Ci+1 and deciding whether to merge the clusters or add Ci+1 as a subcluster of cluster Ci.)
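Suarez [0061] describes the optional grouping step only at the level quoted above. Purely to illustrate that kind of pass, the following is a minimal Python sketch that compares each cluster vector Ci with Ci+1 and records a merge, subcluster, or keep-separate decision. It assumes cosine similarity as the comparison and two hypothetical cut-offs (`merge_threshold`, `subcluster_threshold`); the similarity measure, the threshold values, and the function names are assumptions, not taken from Suarez.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length cluster vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def group_clusters(vectors, merge_threshold=0.9, subcluster_threshold=0.6):
    """One pass over K cluster vectors, comparing C_i with C_{i+1}.

    Hypothetical rule: merge when similarity >= merge_threshold, nest C_{i+1}
    as a subcluster of C_i when similarity >= subcluster_threshold, otherwise
    keep the two clusters separate. Returns (i, action, i + 1) decisions.
    """
    decisions = []
    for i in range(len(vectors) - 1):
        sim = cosine(vectors[i], vectors[i + 1])
        if sim >= merge_threshold:
            action = "merge"
        elif sim >= subcluster_threshold:
            action = "subcluster"
        else:
            action = "separate"
        decisions.append((i, action, i + 1))
    return decisions
```

For example, the vectors [1, 0], [0.99, 0.05], [0.6, 0.8], and [-0.8, 0.6] yield merge, subcluster, and separate for the three consecutive pairs.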

	With respect to claim 1, Suarez does not explicitly teach a set of data based on a set of proximity values with respect to a set of predictors as claimed:

a set of data based on a set of proximity values with respect to a set of predictors and a set of targets, wherein the set of predictors is mapped to a set of points in a predictor space, and wherein the set of targets is mapped to a set of points in a target space, and wherein the predictor space and the target space are independent models that each comprise a graphical representation that plots a projection of the set of predictors or the set of targets, respectively, in a three-dimensional space; 

 
[Image: excerpt from the cited reference]


However, FAYYAD discloses approaching clustering as a density estimation problem. FAYYAD supposes that there is an unobserved variable (e.g., “a set of predictors and a set of targets”) indicating the “cluster membership” of a given data item, that the data are assumed to arrive from a mixture model, and that the mixing labels (cluster identifiers) are hidden; a mixture model M having K clusters Ci, i=1, ..., K, then assigns a probability to a data point x (e.g., variables of two or more data points).
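The mixture-model statement above corresponds to the standard form p(x | M) = sum over i = 1..K of W_i * p(x | Ci, M), where the mixing weights W_i are nonnegative and sum to 1, and the hidden cluster label of x can be recovered in probability by Bayes' rule. The Python sketch below evaluates both quantities for one-dimensional data. The Gaussian component densities and all parameter values are illustrative assumptions, not taken from FAYYAD.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution (assumed component form)."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_probability(x, weights, means, stds):
    """p(x | M) = sum_i W_i * p(x | C_i, M) over the K clusters."""
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("mixture weights must sum to 1")
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds))


def membership(x, weights, means, stds):
    """Posterior probability that the hidden label of x is cluster i (Bayes' rule)."""
    joint = [w * gaussian_pdf(x, m, s) for w, m, s in zip(weights, means, stds)]
    total = sum(joint)
    return [j / total for j in joint]
```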

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of SUAREZ with the teachings of FAYYAD. One of ordinary skill in the art would have been motivated to modify SUAREZ's technique of comparing the cluster vector of cluster Ci with the cluster vector of cluster Ci+1, deciding whether to merge the clusters or add Ci+1 as a subcluster of cluster Ci, and optionally running iteratively or recursively until a predetermined number of top-level clusters is reached or until the similarity between clusters falls below a certain threshold, to use a set of variable data points or variables, as taught by FAYYAD, so that the majority of the data points are summarized into a condensed representation of their sufficient statistics. By analyzing a mixture of sufficient statistics and actual data points, FAYYAD achieves significantly better clustering results than random sampling methods, with similarly low memory requirements (FAYYAD, col. 3, lines 20-30).

SUAREZ does not explicitly teach establishing, by merging a plurality of individual sub-clusters of the set of sub-clusters, a set of subgroups using a tightness factor as claimed:

establishing, by merging a plurality of individual sub-clusters of the set of sub-clusters, a set of subgroups using a tightness factor; 
     
[Image: excerpt from the cited reference]


However, SHYR teaches that the tightness of a cluster is interpreted such that the smaller the value of tightness, the less variation there is among the data cases within the cluster, and that, depending on the specified distance measure, either log-likelihood distance or Euclidean distance may be used.
   
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of SUAREZ in view of FAYYAD with the teachings of SHYR. One of ordinary skill in the art would have been motivated to modify SUAREZ's technique of comparing the cluster vector of cluster Ci with the cluster vector of cluster Ci+1, deciding whether to merge the clusters or add Ci+1 as a subcluster of cluster Ci, and optionally running iteratively or recursively until a predetermined number of top-level clusters is reached or until the similarity between clusters falls below a certain threshold, to use a tightness measure and threshold, as taught by SHYR, so that a reducer program merges candidate outliers into CF-trees, performs CF-tree-based hierarchical clustering on the leaf entries, and generates a number of sub-clusters specified by the threshold.

SUAREZ does not explicitly teach compiling, for the subset of the set of subgroups, a set of insight data which indicates a profile of the subset of the set of subgroups with respect to the set of data as claimed:

and compiling, for a subset of the set of subgroups, a set of insight data which indicates a profile of the subset of the set of subgroups with respect to the set of data.  

However, BRUBAKER teaches compiling a unique profile of a first user based on analysis of data provided by the first user, and associating the profile with a first object:
(Brubaker [0152] Another aspect of the invention is a computer implemented method, comprising: compiling a unique profile of a first user based on analysis of data provided by the first user; associating the profile with a first object; transmitting the profile or a subset of the profile to a second object; and presenting content to the first user; wherein the first object is associated with the first user; wherein the content originates from the second object; and wherein at least a portion of the content is based on the profile or subset of the profile transmitted to the second object.)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of SUAREZ in view of FAYYAD and SHYR with the teachings of BRUBAKER. One of ordinary skill in the art would have been motivated to modify SUAREZ's technique of comparing the cluster vector of cluster Ci with the cluster vector of cluster Ci+1, deciding whether to merge the clusters or add Ci+1 as a subcluster of cluster Ci, and optionally running iteratively or recursively until a predetermined number of top-level clusters is reached or until the similarity between clusters falls below a certain threshold, to compile a profile, as taught by BRUBAKER, so as to transmit the profile or a subset of the profile to a second object and present content to the first user, wherein the first object is associated with the first user.

As per claim 2, The method of claim 1, further comprising: constructing, to compress the set of data, a set of Cluster Feature (CF) trees using a sequential clustering technique to scan the set of data on a record-by-record basis.  
SUAREZ does not explicitly teach a set of Cluster Feature (CF) trees using a sequential clustering technique to scan the set of data on a record-by-record basis as claimed.
However, SHYR teaches that summarizing local data records through cluster feature (CF) trees can reduce computational expense:
(Shyr [0012] Some embodiments of the present invention recognize one or more of the following: (i) a method that includes maximum likelihood or scatter separability criteria addresses the problem of determining clustering solutions for large datasets while minimizing computational expense; (ii) that an efficient wrapper approach of variable selection for clustering will reduce computational expense; (iii) that performing distributed data analysis in a parallel framework can distribute computational expense; (iv) that summarizing local data records through cluster feature (CF)-tree(s) can reduce computational expense; (v) that an assessment of variables by building approximate clustering solutions iteratively on the CF-trees can reduce computational expense; (vi) that computational expense can be minimized by utilizing only one data pass over the given data, which allows for an efficient building of clustering solutions on large and distributed data sources;)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of SUAREZ in view of FAYYAD with the teachings of SHYR. One of ordinary skill in the art would have been motivated to modify SUAREZ's technique of comparing the cluster vector of cluster Ci with the cluster vector of cluster Ci+1, deciding whether to merge the clusters or add Ci+1 as a subcluster of cluster Ci, and optionally running iteratively or recursively until a predetermined number of top-level clusters is reached or until the similarity between clusters falls below a certain threshold, to use cluster feature trees, as taught by SHYR, so that a reducer program merges candidate outliers into CF-trees, performs CF-tree-based hierarchical clustering on the leaf entries, and generates a number of sub-clusters specified by the threshold.
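The cluster feature summaries discussed for claim 2 (and the "sufficient statistics" referred to in FAYYAD) can be illustrated with the well-known BIRCH-style CF triple (N, LS, SS): the number of absorbed records, their per-dimension linear sum, and the sum of their squared norms. The Python sketch below scans records one at a time and either absorbs each record into the nearest existing sub-cluster or starts a new one. It is a simplification under stated assumptions: it keeps a flat list of leaf entries rather than a height-balanced tree, and the absorb rule (the radius of the would-be sub-cluster must stay within a hypothetical `threshold`) is an illustrative choice, not SHYR's exact criterion.

```python
import math
from dataclasses import dataclass, field


@dataclass
class CF:
    """Cluster feature: record count, per-dimension linear sum, sum of squared norms."""
    n: int = 0
    ls: list = field(default_factory=list)
    ss: float = 0.0

    def add(self, x):
        if not self.ls:
            self.ls = [0.0] * len(x)
        self.n += 1
        self.ls = [a + b for a, b in zip(self.ls, x)]
        self.ss += sum(v * v for v in x)

    def centroid(self):
        return [v / self.n for v in self.ls]

    def radius_with(self, x):
        """RMS distance to the centroid if x were absorbed, computed from the CF alone."""
        n = self.n + 1
        ls = [a + b for a, b in zip(self.ls, x)]
        ss = self.ss + sum(v * v for v in x)
        # Mean squared distance to the centroid = SS/N - ||LS/N||^2.
        msd = ss / n - sum((v / n) ** 2 for v in ls)
        return math.sqrt(max(msd, 0.0))


def sequential_cf_scan(records, threshold):
    """Single record-by-record pass: absorb into the nearest CF entry or start a new one."""
    entries = []
    for x in records:
        best = min(entries, key=lambda e: math.dist(e.centroid(), x), default=None)
        if best is not None and best.radius_with(x) <= threshold:
            best.add(x)
        else:
            new_entry = CF()
            new_entry.add(x)
            entries.append(new_entry)
    return entries
```

Because each entry keeps only (N, LS, SS), the raw records do not need to be retained after the scan, which is the memory saving the cited passages attribute to CF summaries.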

Claims 3 and 4 are rejected under 35 U.S.C. 103 as being unpatentable over SUAREZ in view of FAYYAD, further in view of SHYR, BRUBAKER, and Bastos dos Santos et al. (20110271177, hereinafter SANTOS).

As per claim 3, The method of claim 1, further comprising: establishing, by merging the plurality of individual sub-clusters of the set of sub-clusters, the set of subgroups using a homogeneity factor with respect to the set of predictors.  

As per claim 4, The method of claim 3, further comprising: evaluating, to establish the set of subgroups using the homogeneity factor with respect to the set of predictors, the plurality of individual sub-clusters of the set of sub-clusters based on an attribute comparison with one or more other individual sub-clusters of the set of sub-clusters.  

With respect to claims 3-4, the combination of SUAREZ, FAYYAD, SHYR, and BRUBAKER discloses substantially the invention as claimed, but does not explicitly teach using a homogeneity factor with respect to the set of predictors as claimed.

However, SANTOS teaches using a thresholding module to determine a confidence factor for each final subset of rows, where the confidence factor is a measure of the homogeneity of the final subset of rows:
(SANTOS [0283] The division module 306 determines a confidence factor for each final subset of rows based on the text rows that are elements of the final subset of rows. The confidence factor is a measure of the homogeneity of the final subset of rows, i.e. how similar the physical structure of each text row in the final subset of rows is to the physical structure of each other text row in the final subset of rows. The confidence factor considers one or more factors representing how similar one text row is to other rows in the document. For example, the confidence factor may consider one or more of a rows frequency, variance, mean of elements, number of elements in the optimum set, and/or other variables for factors. 
SANTOS [0309] The thresholding module 402 determines a confidence factor for each final subset of rows. The confidence factor is a measure of homogeneity of the final subset of rows. In one embodiment, if a column for a selected final subset of rows occurs in only one text row, and therefore has only a single instance, the confidence factor for that text row is zero.)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of SUAREZ, FAYYAD, SHYR, and BRUBAKER with the teachings of SANTOS. One of ordinary skill in the art would have been motivated to modify SUAREZ's technique of comparing the cluster vector of cluster Ci with the cluster vector of cluster Ci+1, deciding whether to merge the clusters or add Ci+1 as a subcluster of cluster Ci, and optionally running iteratively or recursively until a predetermined number of top-level clusters is reached or until the similarity between clusters falls below a certain threshold, to use a measure of homogeneity, as taught by SANTOS, so as to determine a confidence factor based on a confidence factor ratio having, in the numerator, a normalized frequency and the average number of matches between the text rows in the final subset of rows and their master row, and, in the denominator, the mean of the distances between those text rows and their master row.

As per claim 5, The method of claim 1, further comprising: comparing a first tightness factor for a first individual sub-cluster with a tightness threshold; comparing a second tightness factor for a second individual sub-cluster with the tightness threshold; ascertaining achievement of the tightness threshold by both the first and second tightness factors for the first and second individual sub-clusters, wherein the plurality of individual sub-clusters includes the first and second individual sub-clusters; and merging the plurality of individual sub-clusters of the set of sub-clusters.

SUAREZ does not explicitly teach comparing a first tightness factor for a first individual sub-cluster with a tightness threshold and ascertaining achievement of the tightness threshold.
However, SHYR teaches building a CF-tree in which each leaf node is a sub-cluster (e.g., “merging the plurality of individual sub-clusters of the set of sub-clusters”) that absorbs the data cases that are close together, as measured by the tightness index and controlled by a specific threshold value (e.g., “ascertaining achievement of the tightness threshold”):

[Images: excerpts from SHYR]


Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of SUAREZ in view of FAYYAD with the teachings of SHYR. One of ordinary skill in the art would have been motivated to modify SUAREZ's technique of comparing the cluster vector of cluster Ci with the cluster vector of cluster Ci+1, deciding whether to merge the clusters or add Ci+1 as a subcluster of cluster Ci, and optionally running iteratively or recursively until a predetermined number of top-level clusters is reached or until the similarity between clusters falls below a certain threshold, to use a cluster feature tree and to compare tightness factors against a tightness threshold, as taught by SHYR, so that a reducer program merges candidate outliers into CF-trees, performs CF-tree-based hierarchical clustering on the leaf entries, and generates a number of sub-clusters specified by the threshold (SHYR, [0066]).
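The claim 5 steps mapped to SHYR above (compare each sub-cluster's tightness factor with a tightness threshold, confirm both achieve it, then merge) can be sketched in Python as follows. This is an illustrative sketch only. It assumes tightness is the average Euclidean distance from member cases to the sub-cluster centroid (the definition attributed to SHYR in the rejection of claims 15-16), and it assumes that "achieving" the threshold means a tightness at or below it. The threshold value and the helper names are hypothetical.

```python
import math


def centroid(points):
    """Per-dimension mean of a non-empty list of points."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def tightness(points):
    """Average Euclidean distance from member cases to the centroid (smaller = tighter)."""
    c = centroid(points)
    return sum(math.dist(p, c) for p in points) / len(points)


def merge_if_tight(first, second, tightness_threshold):
    """Compare both tightness factors with the threshold; merge only if both achieve it."""
    first_ok = tightness(first) <= tightness_threshold
    second_ok = tightness(second) <= tightness_threshold
    if first_ok and second_ok:
        return first + second  # merged sub-cluster
    return None  # threshold not achieved by both sub-clusters
```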

As per claim 6, The method of claim 1, further comprising: configuring a selection criterion for selecting the subset of the set of subgroups to include a subgroup size factor.  

As per claim 7, The method of claim 1, further comprising: configuring a selection criterion for selecting the subset of the set of subgroups to include a subgroup tightness factor.  

With respect to claims 6-7, SUAREZ does not explicitly disclose a method of selecting the subset of the set of subgroups as claimed.

However, SHYR discloses selecting, from a plurality of candidate sets of variables (e.g., “a selection criterion for selecting the subset”), a set of variables that produces an overall clustering solution (e.g., “the set of subgroups”):
(Shyr [0004] One or more processors select a plurality of candidate sets of variables with maximal goodness that are locally optimal for respective subsets based, at least in part, on the approximate clustering solutions. One or more processors select a set of variables from the plurality of candidate sets of variables that produces an overall clustering solution)

Furthermore, SUAREZ does not explicitly disclose configuring a selection criterion to include a subgroup size factor or a subgroup tightness factor.
However, SHYR teaches splitting a leaf node by choosing the farthest pair of entries as seeds and redistributing the remaining entries based on the closest criteria, and, when the CF-tree size exceeds the size of the main memory or the CF-tree height is larger than H, rebuilding the CF-tree into a smaller one by increasing the tightness threshold ([0119], [0127], and [0131]).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of SUAREZ in view of FAYYAD with the teachings of SHYR. One of ordinary skill in the art would have been motivated to modify SUAREZ's technique of comparing the cluster vector of cluster Ci with the cluster vector of cluster Ci+1, deciding whether to merge the clusters or add Ci+1 as a subcluster of cluster Ci, and optionally running iteratively or recursively until a predetermined number of top-level clusters is reached or until the similarity between clusters falls below a certain threshold, to use selection criteria based on cluster size and a tightness threshold, as taught by SHYR, so that a reducer program merges candidate outliers into CF-trees, performs CF-tree-based hierarchical clustering on the leaf entries, and generates a number of sub-clusters specified by the threshold (SHYR, [0066]).

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over SUAREZ in view of FAYYAD, further in view of SHYR, BRUBAKER, and DESAI et al. (20120185930, hereinafter DESAI).

As per claim 8, The method of claim 1, further comprising: configuring a selection criterion for selecting the subset of the set of subgroups to include a subgroup isolation index.  

The combination of SUAREZ, FAYYAD, SHYR, and BRUBAKER does not explicitly teach configuring the selection criterion to include a subgroup isolation index as claimed.
However, DESAI teaches that a set of domain isolation rules can be indexed by object identifiers, such that a kernel process locates a domain isolation rule using the object identifier as an index (see [0020], [0028], and [0032]).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of SUAREZ, FAYYAD, SHYR, and BRUBAKER with the teachings of DESAI. One of ordinary skill in the art would have been motivated to modify SUAREZ's technique of comparing the cluster vector of cluster Ci with the cluster vector of cluster Ci+1, deciding whether to merge the clusters or add Ci+1 as a subcluster of cluster Ci, and optionally running iteratively or recursively until a predetermined number of top-level clusters is reached or until the similarity between clusters falls below a certain threshold, to use isolation as an index, as taught by DESAI, so as to have a set of domain isolation rules that can be indexed by object type and then by object identifier (DESAI, [0032]).
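The indexing arrangement attributed to DESAI, in which rules are indexed first by object type and then by object identifier, amounts to a two-level lookup table. The Python sketch below shows that lookup. The object types, identifiers, and rule strings are hypothetical placeholders, not DESAI's data.

```python
from typing import Dict, Optional

# Hypothetical rule table: object type -> object identifier -> isolation rule.
RuleTable = Dict[str, Dict[str, str]]


def find_isolation_rule(rules: RuleTable, object_type: str, object_id: str) -> Optional[str]:
    """Locate a rule using the object type, then the object identifier, as indexes.

    Returns None when either index level has no matching entry.
    """
    return rules.get(object_type, {}).get(object_id)
```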

Claims 9-14 are rejected under 35 U.S.C. 103 as being unpatentable over SUAREZ in view of FAYYAD, further in view of SHYR, BRUBAKER, and WONG et al. (20160070854, hereinafter WONG).

As per claim 9, The method of claim 1, further comprising: finding, to select the subset of the set of subgroups, one or more data clusters without a reliance on decision rules that define subspaces; and determining, based on a weightiness factor which indicates a relative threshold difference of the subset of the set of subgroups with respect to the set of data, the set of insight data.  

With respect to claims 9-14, SUAREZ, FAYYAD, SHYR, and BRUBAKER do not explicitly teach: finding, to select the subset of the set of subgroups, one or more data clusters without a reliance on decision rules that define subspaces; determining, based on a weightiness factor which indicates a relative threshold difference of the subset of the set of subgroups with respect to the set of data, the set of insight data; detecting the set of sub-clusters and identifying a set of sub-cluster pairs of the set of sub-clusters; and analyzing the set of sub-cluster pairs of the set of sub-clusters and choosing, based on a proximity factor in an input space, a chosen pair of the set of sub-cluster pairs of the set of sub-clusters, as claimed.

However, WONG teaches generating a knowledge-rich representation of the sequence patterns as Aligned Pattern Digraphs, Class Profiles, Co-Occurrence AP Clusters, Relational Cluster Pairs, Stable Sub-Cluster Configuration within AP Clusters, AP Cluster Relational Graphs, and an AP Cluster Co-Occurrence Graph (AP Cluster CGraph), and selecting the dominating AP cluster or sub-cluster pairs, where the pairs C1 and C2 can be sorted by ranking them based on the ratio of |C1 ∩ C2| to |C1 ∪ C2| ([0019], [0092], [0095]-[0096], [0103], and [0113]). WONG further teaches enabling the use of amino acid variations to classify protein ancestries based on their orthologous family classes and their functions based on their paralogous gene classes, and the use of amino acid conservations to characterize the aligned pattern cluster subspace (or functional region). WONG also constructs a complete weighted graph using the distance between patterns as the edge weight and then generates a maximum spanning tree from the complete graph. By cutting the edges one by one (beginning with the shortest distance), an increasing series of cluster configurations can be obtained. For a set of edges with the same weight, WONG cuts each of them in turn, obtains a different configuration for each cut, and stores the separability measure for each configuration (see [0096]-[0098] and [0156]).
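The ranking criterion attributed to WONG, the ratio of |C1 ∩ C2| to |C1 ∪ C2|, is the Jaccard index (intersection over union) of two member sets. A minimal Python sketch, assuming each cluster is represented as a set of member identifiers; the cluster names and members used to exercise it are made up for illustration.

```python
from itertools import combinations


def overlap_ratio(c1, c2):
    """Intersection size over union size for two sets of member identifiers."""
    union = c1 | c2
    return len(c1 & c2) / len(union) if union else 0.0


def rank_cluster_pairs(clusters):
    """Return (name1, name2, ratio) for every cluster pair, highest overlap first.

    `clusters` maps a cluster name to its set of members. Python's sort is
    stable, so pairs with equal ratios keep their alphabetical order.
    """
    pairs = [
        (a, b, overlap_ratio(clusters[a], clusters[b]))
        for a, b in combinations(sorted(clusters), 2)
    ]
    return sorted(pairs, key=lambda t: t[2], reverse=True)
```

For example, clusters A = {1, 2, 3, 4} and B = {3, 4, 5} share 2 of 5 distinct members, so the pair (A, B) scores 0.4.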

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of SUAREZ, FAYYAD, SHYR, and BRUBAKER with the teachings of WONG. One of ordinary skill in the art would have been motivated to modify SUAREZ's technique of comparing the cluster vector of cluster Ci with the cluster vector of cluster Ci+1, deciding whether to merge the clusters or add Ci+1 as a subcluster of cluster Ci, and optionally running iteratively or recursively until a predetermined number of top-level clusters is reached or until the similarity between clusters falls below a certain threshold, to weight each edge by the distance between the patterns and/or the normalized average distance between the AP Clusters represented by its incident vertices, as taught by WONG (WONG, [0094]).

As per claim 10, The method of claim 1, further comprising: detecting the set of sub-clusters; and identifying a set of sub-cluster pairs of the set of sub-clusters.  

As per claim 11, The method of claim 10, further comprising: analyzing the set of sub-cluster pairs of the set of sub-clusters; and choosing, based on a proximity factor in an input space, a chosen pair of the set of sub-cluster pairs of the set of sub-clusters.

As per claim 12, The method of claim 11, further comprising: evaluating, using a target space tightness factor, a merge operation with respect to the chosen pair of the set of sub-cluster pairs of the set of sub-clusters.  

As per claim 13, The method of claim 11, further comprising: evaluating, using an input space homogeneity factor, a merge operation with respect to the chosen pair of the set of sub-cluster pairs of the set of sub-clusters.  

As per claim 14, The method of claim 11, further comprising: evaluating, using both a target space tightness factor and an input space homogeneity factor, a merge operation with respect to the chosen pair of the set of sub-cluster pairs of the set of sub-clusters; and merging, based on evaluating the merge operation, the chosen pair of the set of sub-cluster pairs of the set of sub-clusters.

As per claim 15, The method of claim 1, further comprising: configuring the clustering technique to include a hierarchical-oriented clustering technique that uses the tightness factor to indicate an appropriateness of a merge operation.  

With respect to claims 15-16, SUAREZ does not explicitly teach configuring the clustering technique to include a hierarchical-oriented clustering technique that uses the tightness factor to indicate an appropriateness of a merge operation, or determining the tightness factor based on a Euclidean distance, as claimed.
However, SHYR teaches a control program (a controller) that collects the candidate sets of variables produced by each of the reducers, combines the collected sets of variables, and selects a stable and robust set of variables that produces an overall quality clustering solution based on each variable's frequency of appearance among the candidate sets of variables produced by the reducers. SHYR further teaches using a sequential clustering approach that scans the records one by one and determines, based on a distance criterion, whether the current record should be merged with a previously formed sub-cluster or a new sub-cluster should be created, and defines the tightness η̃_j of a cluster C_j as the average Euclidean distance from member cases to the center (centroid) of the cluster ([0029], [0045], [0066], and [0106]). SHYR also teaches splitting a leaf node by choosing the farthest pair of entries as seeds and redistributing the remaining entries based on the closest criteria, and, when the CF-tree size exceeds the size of the main memory or the CF-tree height is larger than H, rebuilding the CF-tree into a smaller one by increasing the tightness threshold ([0119], [0127], and [0131]).

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of SUAREZ in view of FAYYAD with the teachings of SHYR. One of ordinary skill in the art would have been motivated to modify SUAREZ's technique of comparing the cluster vector of cluster Ci with the cluster vector of cluster Ci+1, deciding whether to merge the clusters or add Ci+1 as a subcluster of cluster Ci, and optionally running iteratively or recursively until a predetermined number of top-level clusters is reached or until the similarity between clusters falls below a certain threshold, to use a merge operation, Euclidean distance, selection criteria based on cluster size, and a tightness threshold, as taught by SHYR, so that a reducer program merges candidate outliers into CF-trees, performs CF-tree-based hierarchical clustering on the leaf entries, and generates a number of sub-clusters specified by the threshold (SHYR, [0066]).
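For claims 15-16, a hierarchical-oriented procedure in which a Euclidean tightness factor signals whether a merge is appropriate can be sketched in Python as a simple agglomerative loop. The assumptions here are not taken from SHYR or the application: sub-clusters start as given lists of points, the candidate pair at each step is the one with the closest centroids, tightness is the average Euclidean distance from members to the centroid, and a merge is treated as appropriate only while the merged sub-cluster's tightness stays at or below a hypothetical `max_tightness`.

```python
import math
from itertools import combinations


def centroid(points):
    """Per-dimension mean of a non-empty list of points."""
    dims = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dims)]


def tightness(points):
    """Average Euclidean distance from member cases to the cluster centroid."""
    c = centroid(points)
    return sum(math.dist(p, c) for p in points) / len(points)


def agglomerate(clusters, max_tightness):
    """Repeatedly merge the closest pair of clusters while the merge stays tight enough."""
    clusters = [list(c) for c in clusters]
    while len(clusters) > 1:
        # Candidate pair: the two clusters whose centroids are closest.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: math.dist(centroid(clusters[ij[0]]), centroid(clusters[ij[1]])),
        )
        merged = clusters[i] + clusters[j]
        if tightness(merged) > max_tightness:
            break  # the tightness factor indicates this merge is not appropriate
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters
```

The loop stops at the first closest pair whose merged tightness exceeds the limit; this early stop is one simple design choice, and other stopping rules would be equally plausible.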

As per claim 16, The method of claim 15, further comprising: determining the tightness factor based on a Euclidean distance.

As per claim 17, The method of claim 1, further comprising:  executing, in a dynamic fashion to streamline data insight discovery using the clustering technique, each of: the compressing, the establishing, the selecting, and the compiling.  

With respect to claims 17-18, SUAREZ teaches executing, in a dynamic fashion to streamline data insight discovery using the clustering technique, each of: the compressing, the establishing, the selecting, and the compiling ([0004]: "clustering" is the grouping of documents into related and similar buckets ("clusters") for ease of processing and reviewing. Unsupervised clustering (i.e., clustering without user control or intervention) is becoming a required feature when processing search results, for example in the legal compliance and discovery environment).

As per claim 18, The method of claim 1, further comprising: executing, in an automated fashion without user intervention, each of: the compressing, the establishing, the selecting, and the compiling.  


Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHONGSUH PARK whose telephone number is (408) 918-7574.  The examiner can normally be reached on Monday - Friday 8:00-5:30 PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hosain Alam can be reached on (571)272-3978 EST.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/CHONGSUH PARK/Examiner, Art Unit 2154                     

/HOSAIN T ALAM/Supervisory Patent Examiner, Art Unit 2154