Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
 
DETAILED ACTION

This action is in response to the application filed on May 13, 2020.

Claims 1-20 are pending.

Claim Rejections - 35 USC § 102

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
 
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
 A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.



Claims 1, 4-8, 11-14 and 17-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Gates et al. (U.S. Pub. No. US 2004/0019601 A1).

With respect to claims 1, 8 and 14, Gates et al. teaches:
accessing, by at least one computer processor, a plurality of data records from a database repository (claim 83. A storage medium for storing a program executable in a local system that includes first path information specifying a path for accessing a first object stored in a first storage means, and second path information specifying a path for accessing a second object stored in a second storage means, the program being adapted for controlling access to an object); 
storing, by the at least one computer processor, the plurality of data records from the database repository into at least one of primary or secondary memory associated with the at least one computer processor, along with a cluster number for each data record (claim 83. A storage medium for storing a program executable in a local system that includes first path information specifying a path for accessing a first object stored in a first storage means, and second path information specifying a path for accessing a second object stored in a second storage means, the program being adapted for controlling access to an object), 
wherein all data records having a same cluster number form a cluster, wherein each record has been categorized or designated a cluster number out of a total K number of clusters, by an unsupervised machine-learning algorithm categorizing a plurality of data records in the database ([0004] constructing a taxonomy in a way that makes sense to both humans and a machine categorizer, and then selecting training data to enable a categorizer to distinguish with high accuracy among very large numbers (e.g., 8,000 or even very much more) of categories in such a taxonomy); 
for each of a plurality of classification features, performing cluster-based analysis for a first cluster with respect to a single feature by the at least one computer processor ([0019] Training data are the data used in automated categorization systems to teach the systems how to distinguish one category of document, from another. For example, one might train a categorizer to distinguish between documents about men's health and women's health by collecting a set of documents about each subject, and then applying some sort of feature extraction process, which would try to determine the essence of what makes documents about men's health different from women's health. The features used by most feature extraction processes are words, or less commonly groups of characters or word phrases); 
generating, by the at least one computer processor, based on the cluster-based analysis, a single feature overlap score for each of the plurality of classification features based on an amount of proximity between clusters and an amount of overlap of the first cluster with other clusters in the K number of clusters, for each of the plurality of classification features ([0038] [A] key concept is "overlap." [The] invention describes ways of keeping categories from overlapping, or at least of minimizing overlap. Several examples of overlap may help. For example, two categories may overlap if one is a subcategory of the other (e.g., cookie is a subcategory of dessert). A more subtle form of overlap can occur when categories are not mutually exclusive (we often describe this by saying that they were not "sliced and diced" in the same way). Thus, for example, we might divide a category called databases into subcategories called parallel and single-system, or into subcategories called object-oriented and relational, but not into all 4 subcategories, because a database can be both parallel and relational; i.e., the 4 subcategories are not mutually exclusive and thus are said to overlap. In a more rigorous sense, categories are said to overlap to the extent to which they share common features); 
sorting, by the at least one computer processor, each of the plurality of classification features with respect to the first cluster, by overlap score; grouping, by the at least one computer processor, a predetermined number of features having lowest overlap scores out of all the features sorted, with respect to the first cluster ([0019] Training data are the data used in automated categorization systems to teach the systems how to distinguish one category of document, from another. For example, one might train a categorizer to distinguish between documents about men's health and women's health by collecting a set of documents about each subject, and then applying some sort of feature extraction process); and 
generating, by the at least one computer processor, a naming label for the first cluster based on the predetermined number of features having the lowest overlap scores ([0044] In step 107, pairs of categories with the highest similarity are examined to determine how to reduce the degree of overlap. The goal of this step and the preceding steps is to produce a set of features with a minimum degree of overlap between and among categories and supercategories).
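For illustration only, the sequence of steps recited in claims 1, 8 and 14 (scoring each feature for overlap, sorting, keeping the lowest-scoring features, and building a label) may be sketched as follows. The scoring formula is an assumption, since neither the claims as quoted above nor the cited passages define one: here a feature's overlap score is taken to be the fraction of records outside the first cluster whose value falls within the first cluster's [min, max] range for that feature, and no separate proximity term is modeled. The function names, the sample records and the hyphen-joined label format are likewise hypothetical.

```python
from typing import Dict, List, Sequence

Record = Dict[str, float]


def overlap_score(records: Sequence[Record], labels: Sequence[int],
                  cluster: int, feature: str) -> float:
    """Assumed score: share of other-cluster records whose value for
    `feature` lies inside the [min, max] range of `cluster`."""
    inside = [r[feature] for r, c in zip(records, labels) if c == cluster]
    outside = [r[feature] for r, c in zip(records, labels) if c != cluster]
    if not inside or not outside:
        return 0.0
    lo, hi = min(inside), max(inside)
    return sum(lo <= v <= hi for v in outside) / len(outside)


def name_cluster(records: Sequence[Record], labels: Sequence[int],
                 cluster: int, features: List[str], top_n: int = 2) -> str:
    """Sort features by ascending overlap score, keep the `top_n` lowest,
    and join their names into a label (hypothetical label format)."""
    ranked = sorted(features,
                    key=lambda f: overlap_score(records, labels, cluster, f))
    return "-".join(ranked[:top_n])


# Hypothetical data: four records already assigned to K = 2 clusters.
records = [
    {"age": 20.0, "income": 50.0, "visits": 1.0},
    {"age": 22.0, "income": 90.0, "visits": 5.0},
    {"age": 60.0, "income": 55.0, "visits": 2.0},
    {"age": 65.0, "income": 85.0, "visits": 9.0},
]
labels = [0, 0, 1, 1]
features = ["age", "income", "visits"]

print(name_cluster(records, labels, 0, features, top_n=2))
```

Under these assumptions, a feature whose values separate the first cluster cleanly from the others receives a score near zero and is therefore preferred for the label.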

With respect to claims 4, 11 and 17, Gates et al. teaches, for each of the predetermined number of features having the lowest overlap scores out of all the features sorted, determining if all records in the first cluster are grouped to be below or above a threshold value, or are near a decision boundary, for the respective feature; and if all the records in the first cluster are determined to be grouped above or below a threshold value for such a respective feature, or are determined to be near a decision boundary, including the threshold value or decision boundary in the naming label for the first cluster ([0055] FIG. 8 shows the details of step 105. The goal of this part of the method is to reduce overlap among categories).

With respect to claims 5, 12 and 18, Gates et al. teaches assigning a partial name to each of the predetermined number of features having the lowest overlap score with respect to the first cluster; and combining each of the partial names to make a compound naming label for the first cluster ([0055] FIG. 8 shows the details of step 105. The goal of this part of the method is to reduce overlap among categories).
   
With respect to claims 6 and 19, Gates et al. teaches using a look-up table to map feature name abbreviations and synonyms to canonical names in order to standardize terms used for feature names in assigning a partial name to each of the predetermined number of features having the lowest overlap score with respect to the first cluster ([0004] constructing a taxonomy in a way that makes sense to both humans and a machine categorizer, and then selecting training data to enable a categorizer to distinguish with high accuracy among very large numbers (e.g., 8,000 or even very much more) of categories in such a taxonomy).
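For illustration only, the look-up table limitation of claims 6 and 19, together with the combining of partial names recited in the preceding group of claims, may be sketched as follows. The table contents, the function names and the space-separated label format are hypothetical and are not drawn from the application or from Gates et al.

```python
from typing import Dict, List

# Hypothetical look-up table: abbreviations and synonyms -> canonical name.
CANONICAL: Dict[str, str] = {
    "amt": "amount",
    "txn": "transaction",
    "purchase": "transaction",
}


def canonical_name(feature: str) -> str:
    """Return the canonical name for a feature, or the normalized
    feature name itself when the table has no entry for it."""
    key = feature.strip().lower()
    return CANONICAL.get(key, key)


def compound_label(features: List[str]) -> str:
    """Assign each feature a partial name via the look-up table and join
    the partial names, skipping duplicates created by synonyms."""
    parts: List[str] = []
    for f in features:
        name = canonical_name(f)
        if name not in parts:
            parts.append(name)
    return " ".join(parts)


print(compound_label(["Amt", "txn", "purchase", "region"]))
```

In this sketch, "txn" and "purchase" collapse to the same canonical term, so the compound label contains that term only once.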

With respect to claims 7, 13 and 20, Gates et al. teaches generating a rating for all words eligible for use in the compound naming label for the first cluster; assessing a length of the compound naming label for the first cluster to determine if it surpasses a predetermined length; and if the compound naming label for the first cluster is determined to surpass the predetermined length, removing one or more words from the compound naming label in order of rating from lowest to highest until the compound naming label does not surpass the predetermined length ([0004] constructing a taxonomy in a way that makes sense to both humans and a machine categorizer).
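For illustration only, the length-limiting step recited in claims 7, 13 and 20 may be sketched as follows. The sketch assumes that the label length is measured in characters and that ratings are supplied as a word-to-number mapping; the claims as quoted above fix neither point, and the function name and sample values are hypothetical.

```python
from typing import Dict, List


def truncate_label(words: List[str], ratings: Dict[str, float],
                   max_len: int) -> str:
    """Drop words in order of rating, lowest first, until the joined
    label no longer exceeds `max_len` characters (assumed length unit)."""
    kept = list(words)
    # Removal candidates, lowest rating first; sorted() is stable, so
    # equally rated words are removed in their original order.
    for word in sorted(words, key=lambda w: ratings[w]):
        if len(" ".join(kept)) <= max_len:
            break
        kept.remove(word)
    return " ".join(kept)


# Hypothetical compound label and word ratings.
words = ["high", "income", "urban", "renters"]
ratings = {"high": 0.4, "income": 0.9, "urban": 0.2, "renters": 0.7}

print(truncate_label(words, ratings, max_len=20))
```

The surviving words keep their original order in the label; only the order of removal is governed by the ratings.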
 

Allowable Subject Matter

Claims 2-3, 9-10 and 15-16 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.



Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ISAAC M WOO whose telephone number is (571) 272-4043. The examiner can normally be reached from 9:00 to 5:00.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tony Mahmoudi, can be reached on 571-272-4078. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ISAAC M WOO/
Primary Examiner, Art Unit 2163