Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

With respect to claims 1-20, the claims recite a series of steps for corpus management by automatic categorization into functional domains to support faceted query.  Thus the claims are directed to a statutory category, because they recite a series of steps (a series of acts) for corpus management by automatic categorization into functional domains to support faceted query.  However, the claims are directed to a judicial exception.  The claims recite identifying one or more functional domain categories, ingesting incoming documents to form an open-domain corpus, identifying one or more representative documents to establish a seed sub-corpus, calculating a degree of fit score, and assigning one or more of the incoming documents to one or more of the functional domain categories based upon the degree of fit score to create an enhanced corpus.  Therefore, the claims fall within one of the groupings of abstract ideas: an idea standing alone, such as an uninstantiated concept, plan, or scheme, as well as a mental process (thinking) that “can be performed in the human mind, or by a human using pen and paper.”  Like the invention in Alice Corp., the instant claims merely limit the abstract idea to a computer environment by simply performing the idea via a computer to carry out corpus management by automatic categorization into functional domains to support faceted query.  This is an abstract idea.  Further, the claims do not recite any additional limitations that amount to significantly more than the abstract idea.  The claims recite no additional limitations beyond generic computer components.  These generic computer components (computer, storage medium, etc.) are claimed to perform their basic functions of corpus management by automatic categorization into functional domains to support faceted query.  This recitation of the computer limitations amounts to mere instructions to implement the abstract idea on a computer.
Taking the additional elements individually and in combination, the computer components at each step of corpus management by automatic categorization into functional domains to support faceted query perform purely generic computer functions.  As such, there is no inventive concept sufficient to transform the claimed subject matter into a patent-eligible application.  The claim does not amount to significantly more than the abstract idea itself.  Accordingly, the claim is not patent eligible.
With respect to claims 15-20, claims 15-20 recite an enhanced corpus management system; however, the components of the enhanced corpus management system are merely software per se.  A system claim must recite physical structure, thus enabling it to be properly categorized in one of the statutory categories of invention.  Since the components of the enhanced corpus management system of claims 15-20 are software per se and do not contain any physical components, the system cannot be categorized in one of the statutory categories of invention and is thus nonstatutory.
Claim Rejections - 35 USC § 102
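In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.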
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 7-8 and 14-15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Allen et al. (U.S. Pub. 2015/0347557 A1).
With respect to claims 1, 8 and 15, Allen et al. discloses a computer implemented method, in a data processing system comprising a processor and a memory comprising instructions which are executed by the processor to cause the processor to implement an enhanced corpus management system, the method comprising:  
identifying one or more functional domain categories (i.e., “provides an approach in which a domain corpus subset generator correlates documents from a document corpus to domain discernible attributes associated with domain corpus subsets. The domain corpus subset generator analyzes correlation results from the correlation and stores the documents into domain corpus subsets accordingly” (0004); the domain discernible attributes correspond to the functional domain categories of the claimed invention); 
ingesting one or more incoming documents to form an open-domain corpus (i.e., Fig. 1 shows domain corpus subset 155 resulting from ingesting one or more incoming documents 135); 
for each functional domain category, identifying one or more representative documents to establish a seed sub-corpus (i.e., “document correlation engine 130 correlates the document attributes to domain discernible attributes in attributes store 125. In one embodiment, document correlation engine 130 increments a correlation counter during correlation according to correlation values assigned to matching domain discernable attributes”(0032));
calculating a degree of fit score between each of the one or more incoming documents and the one or more established functional domain category seed sub-corpora (i.e., “embodiment, document correlation engine 130 includes a classifier that analyzes the documents and generates a degree of confidence for a domain” (0055)); and
assigning one or more of the incoming documents to one or more of the functional domain categories based upon the degree of fit score to create an enhanced corpus (i.e., “the classifier correlates the font attribute to a "Computer Science" domain, the footer attribute to a "Legal" domain, and the watermark attribute to a "Business" domain and a "Computer Science" domain. In this example, the correlation results in the document having a 2/3 computer science domain correlation, 1/3 Legal domain correlation, and 1/3 business domain correlation and, therefore, document correlation engine 130 assigns the document to the computer science domain” (0055)). 
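For illustration only, the correlation arithmetic in the passage quoted from paragraph 0055 of Allen et al. can be sketched as below. This is a minimal sketch under stated assumptions, not Allen et al.'s implementation: the mapping `ATTRIBUTE_DOMAINS` and the functions `domain_correlations` and `assign_domain` are hypothetical names, and the sketch assumes that a domain's correlation is the fraction of document attributes that match that domain.

```python
from fractions import Fraction

# Hypothetical attribute-to-domain mapping mirroring the example quoted from
# paragraph 0055 of Allen et al.; the structure is an assumption.
ATTRIBUTE_DOMAINS = {
    "font": ["Computer Science"],
    "footer": ["Legal"],
    "watermark": ["Business", "Computer Science"],
}


def domain_correlations(attribute_domains):
    """Return, per domain, the fraction of document attributes matching it."""
    total = len(attribute_domains)
    counts = {}
    for domains in attribute_domains.values():
        for domain in domains:
            counts[domain] = counts.get(domain, 0) + 1
    return {domain: Fraction(n, total) for domain, n in counts.items()}


def assign_domain(correlations):
    """Assign the document to the domain with the highest correlation."""
    return max(correlations, key=correlations.get)


correlations = domain_correlations(ATTRIBUTE_DOMAINS)
assigned = assign_domain(correlations)
```

Under these assumptions the three attributes yield correlations of 2/3 for Computer Science and 1/3 each for Legal and Business, so the document is assigned to the Computer Science domain, matching the quoted example.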
With respect to claims 7 and 14, Allen et al. discloses further comprising: through a cognitive system, providing one or more answers to one or more questions using the enhanced corpus (i.e., “a question-answer system utilizes documents included in a specific domain corpus subset to provide relevant and accurate answers to an input question” (abstract)). 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:


Claims 2, 9 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Allen et al. (U.S. Pub. 2015/0347557 A1) in view of Beckwith et al. (U.S. Pub. 2015/0205785 A1).
With respect to claims 2, 9 and 16, Allen et al. discloses all limitations recited in claims 1, 8 or 15 except that the one or more representative documents in the seed sub-corpora are represented in a vector form. However, Beckwith et al. discloses that representative documents in the seed sub-corpora are represented in a vector form (i.e., “The illustrative lexical space 128 is defined as a multidimensional space that has a number of dimensions (or positions), where the number of dimensions corresponds to the number of lexical items in the user corpora 164, and each of the dimensions represent one of the lexical items existing in the corpora 164. For example, if the corpora 164 includes a document 1, "Dandelions are flowers," authored by user 1 and a document 2 authored by a user 2, "Dandelions are weeds," the vector representation of the corpora 164 may be: [dandelions, are, flowers, weeds]; the vector representation of document 1 may be: [1, 1, 1, 0]; and the vector representation of document 2 may be [1, 1, 0, 1]” (0016)).  It would have been obvious for a person of ordinary skill in the art, as of the effective filing date of the claimed invention, to modify Allen et al. to include Beckwith et al.’s features in order to classify documents accurately, since doing so for the stated purpose was well known in the art, as evidenced by the teaching of Beckwith et al.  Both references are in the same field, namely document classification.
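The vector representation in the passage quoted from paragraph 0016 of Beckwith et al. can be reproduced with a short sketch. This is an illustrative assumption about how such vectors might be built (binary term presence over a vocabulary ordered by first appearance), not Beckwith et al.'s implementation; the helper names `tokenize`, `build_vocabulary`, and `to_vector` are hypothetical.

```python
def tokenize(text):
    """Lowercase the text and strip trailing punctuation from each word."""
    return [word.strip(".,").lower() for word in text.split()]


def build_vocabulary(documents):
    """Collect the distinct lexical items in order of first appearance."""
    vocabulary = []
    for text in documents:
        for token in tokenize(text):
            if token not in vocabulary:
                vocabulary.append(token)
    return vocabulary


def to_vector(text, vocabulary):
    """Binary vector: 1 if the lexical item occurs in the document, else 0."""
    tokens = set(tokenize(text))
    return [1 if term in tokens else 0 for term in vocabulary]


documents = ["Dandelions are flowers.", "Dandelions are weeds."]
vocabulary = build_vocabulary(documents)
vector_1 = to_vector(documents[0], vocabulary)
vector_2 = to_vector(documents[1], vocabulary)
```

With the two example documents, the vocabulary is [dandelions, are, flowers, weeds] and the document vectors are [1, 1, 1, 0] and [1, 1, 0, 1], consistent with the quoted passage.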


Claims 4-5, 11-12 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Allen et al. (U.S. Pub. 2015/0347557 A1) in view of Dash et al. (U.S. Pat. 7,392,250 B1).
With respect to claims 4-5, 11-12 and 18-19, Allen et al. discloses all limitations recited in claims 1, 8 or 15 except for, for each functional domain category, identifying one or more representative documents to establish a seed sub-corpus through metadata faceting using a hard decision boundary or soft decision boundary. However, Dash et al. discloses, for each functional domain category, identifying one or more representative documents to establish a seed sub-corpus through metadata faceting using a hard decision boundary or soft decision boundary (i.e., “The difference from relational group-by is that a document d may belong to multiple groups if d has more than one value for a f.sub.i. Specifically, S.sub.q,F=[(v.sub.1, . . . , v.sub.m, c)], where v.sub.i.chi.d.sub.fi, and c is a scalar aggregate computed over documents in a group satisfying all constraints f.sub.i=v.sub.i, i= . . . , m. Within exemplary embodiments of the present invention a particular focus is on a scalar aggregate c, wherein the scalar aggregate c counts the number of documents” (col. 3, lines 10-20) and “presenting a facet with a large number of values is hard for a user to visualize, let alone understand. Therefore, we want to select a subset of facets as candidates for further processing only if the number of facet values is smaller than a threshold .tau. (e.g., less than 100). To achieve this objective we first preprocess each facet hierarchy until the following property holds: The number of children of each node is less than .tau.. If a node d has more than .tau. children then a new facet level is created under d and the children of d are divided into smaller groups. There are many ways of grouping the facet values. For example, for a "price" facet it may be desired to group the values into some fixed number of price ranges” (col. 4, lines 60-67 or col. 5, lines 1-5)).  It would have been obvious for a person of ordinary skill in the art, as of the .
Claims 6, 13 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Allen et al. (U.S. Pub. 2015/0347557 A1) in view of Selvaraj et al. (U.S. Pub. 2011/0264651 A1).
With respect to claims 6, 13 and 20, Allen et al. discloses all limitations recited in claim 1 except for measuring cosine similarity between each incoming document and the representative documents of each sub-corpora; and measuring redundancy through maximum inverse cosine similarity between each incoming document and the representative documents of each sub-corpora. However, Selvaraj et al. discloses measuring cosine similarity between each incoming document and the representative documents of each sub-corpora; and measuring redundancy through maximum inverse cosine similarity between each incoming document and the representative documents of each sub-corpora (i.e., “the similarity score is computed using a term frequency-inverse document frequency (tf-idf) cosine similarity computed between an entity name and the foregoing candidate resource attributes. For the anchor text, a similarity score s.sub.i is computed between the title and every anchor text line a.sub.i (from in-link resource I” (0079) and “there are multiples due to URL canonicalization, redirection or page duplication, they may be redundant. A database of webpage information, such as Yahoo! WebMap, may be queried to resolve redundancy issues” (0087)).  It would have been obvious for a person of ordinary skill in the art, as of the effective filing date of the claimed invention, to modify Allen et al. to include Selvaraj et al.’s features in order to classify documents accurately, which for the stated purpose has been well known in the 
Allowable Subject Matter
Claims 3, 10 and 17 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims, since the prior art of record and considered pertinent to the applicant’s disclosure does not teach or suggest the claimed feature of further comprising: performing a human-in-loop sanity check through dynamic faceting and, if one of the functional domain categories is judged to be amiss, adding one or more additional representative documents to the seed sub-corpus corresponding to the one of the functional domain categories. 
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HUNG T VY whose telephone number is (571)272-1954.  The examiner can normally be reached on M-F 8-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tony Mahmoudi, can be reached on (571)272-4078.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

/HUNG T VY/
Primary Examiner, Art Unit 2163
February 27, 2021