Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
This Office Action corresponds to application 16/196,920 which was filed on 11/20/2018 and claims priority to foreign application India 201841036826 filed 9/28/2018. 

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 2/24/2021 has been entered.
 

Response to Amendment
In the reply filed 2/24/2021, claims 1, 10, and 19 have been amended. Claims 8, 9, 17, and 18 have been cancelled and no new claims have been added.  Accordingly, claims 1-4, 6, 7, 10-13, 15, 16, and 19 stand pending.

Response to Arguments
Applicant’s arguments filed 2/24/2021 have been fully considered but are moot in view of new grounds of rejection.
The applicant argues that Dorie does not teach a representation of co-occurrence between the query n-gram and a cluster label in a hierarchy.  The examiner respectfully disagrees.  Dorie teaches that cluster labels are based on key terms from documents in the cluster, and that documents are assigned to clusters by comparing key terms in the document with key terms in the documents in the cluster using statistical n-gram analysis (Dorie, [0014-0015, 0098-0101, 0286]).  This is interpreted as teaching that the relationship may be a co-occurring relationship between the clusters and n-grams.  Therefore, the examiner is not persuaded.
The applicant also argues that Dorie is silent regarding “wherein determining the co-occurring relationship comprises: determining a limited span of 1-grams before and after the top-ranked input n-gram in the one or more input documents using a text adjacency graph; identifying a number of times the 1-gram appears along with the top-ranked input n-gram to determine a co-occurrence score; and determining relationship between the 1-gram and the top-ranked input n-gram as the co-occurring relationship if the co-occurrence score is above a pre-determined threshold”.  The examiner respectfully disagrees.  It is noted that only one kind of relationship out of the list of relationships is required. Therefore, the language regarding determining the co-occurring relationship in this limitation is conditional language and, under the broadest reasonable interpretation as explained in section 2111.04(II) of the MPEP, could be interpreted as a contingent limitation that would not be required to be performed and

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 4, 6, 10, 11, 13, 15, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Dorie et al. (US2009/0043797), hereinafter Dorie, in view of Allen et al. (US2016/0026634), hereinafter Allen, and Shreve (US2003/0154071).
 
Regarding Claim 1:
Dorie teaches:

determining, by the information retrieval device, one or more top input n-grams associated with the input corpus, wherein the one or more top input n-grams are used to generate one or more input document clusters (Dorie, [0015, 0098-0101], note creating clusters, note assigning documents to clusters comprises n-gram analysis, note n-gram analysis and creating an n-gram profile, note scoring n-grams, note each cluster has a score based on its n-gram scores); 
determining, by the information retrieval device, one or more top-ranked input n-grams from the one or more top input n-grams, wherein the one or more top-ranked input n-grams are determined based on relatedness of a top input n-gram with each top input n-gram (Dorie, [0015, 0098-0101, 0286], note n-gram analysis and creating an n-gram profile, note scoring n-grams, note n-gram scores may be based on relatedness to other words, e.g. n-grams, and on term and document frequency among other n-grams of its size); 
clustering, by the information retrieval device, the enriched corpus based on one or more top enriched n-grams associated with the enriched corpus to generate one or more enriched document clusters (Dorie, abstract, figure 1, [0015, 0098-0101], note creating clusters, note assigning documents to clusters comprises n-gram analysis, note n-gram analysis and creating an n-gram profile, note scoring n-grams, note each cluster 
determining, by the information retrieval device, one or more n-gram clusters corresponding to each n-gram of the enriched corpus, wherein each of the one or more n-gram clusters comprises semantically related n-grams corresponding to each n-gram of the enriched corpus (Dorie, abstract, figure 1, [0015, 0098-0101], note creating clusters, note assigning documents to clusters comprises n-gram analysis, note n-gram analysis and creating an n-gram profile, note scoring n-grams, note each cluster has a score based on its n-gram scores.  The additionally cited reference of Allen teaches merging the original corpus with an external corpus and then performing analysis operations on the enriched corpus; when combined with the cited reference of Allen, the cluster and n-gram analysis of Dorie would be additionally performed on the modified, e.g. enriched, corpus as well); 
creating, by the information retrieval device, a taxonomy based on the one or more input document clusters, the one or more 

wherein the taxonomy defines at least one relationship between the one or more input document clusters, the one or more 
determining a limited span of 1-grams before and after the top-ranked input n-gram in the one or more input documents using a text adjacency graph (Dorie, [0281, 0286], note using maps of unigrams to the left and right of every stemmed n-gram is interpreted as determining a limited span of 1-grams before and after the top-ranked 
identifying a number of times the 1-gram appears along with the top-ranked input n-gram to determine a co-occurrence score (Dorie, [0281, 0286], note using maps of unigrams to the left and right of every stemmed n-gram; note that the independence variable used for scoring is a measure of the entropy of unigrams appearing on the sides of the input n-gram, which means the number of times the 1-gram appears next to the input n-gram is identified and used to determine the co-occurrence score. It is also noted that only one kind of relationship out of the list of relationships is required. Therefore, the language regarding determining the co-occurring relationship in this limitation is conditional language and, under the broadest reasonable interpretation as explained in section 2111.04(II) of the MPEP, could be interpreted as a contingent limitation that would not be required to be performed and therefore is not required for the claim); 
determining relationship between the 1-gram and the top-ranked input n-gram as the co-occurring relationship 
Dorie doesn’t specifically teach:
identifying, by the information retrieval device, an external corpus based on retrieving one or more external documents by querying one or more external databases with the one or more top-ranked input n-grams; 
merging, by the information retrieval device, the external corpus with the input corpus to generate an enriched corpus; 
determining if the co-occurrence score is above a pre-determined threshold;
Allen is in the same field of endeavor, information retrieval;
Allen teaches:
receiving, by an information retrieval device, an input corpus, wherein the input corpus comprises one or more input documents (Allen, figure 7, [0005], note existing corpus was received at some point in order to be used) 
identifying, by the information retrieval device, an external corpus based on the one or more top-ranked input n-grams (Allen, claim 11, [0005], note identifying a 
merging, by the information retrieval device, the external corpus with the input corpus to generate an enriched corpus (Allen, [0005], note identifying a candidate corpus to add to the existing corpora and merging the two. When combined with the previously cited reference this would be for the n-grams and document clusters taught by Dorie); 
It would have been obvious to one of ordinary skill in the art before the effective date of filing to modify the cited references to incorporate the teachings of Allen because this would improve the data processing and quality of the corpus (Allen, [0001, 0020-0021]).
Shreve is in the same field of endeavor, document management;
Shreve teaches:
identifying, by the information retrieval device, an external corpus based on retrieving one or more external documents by querying one or more external databases with the one or more top-ranked input n-grams (Shreve, [0021], note the source corpus is analyzed to identify selected objects within the corpus, e.g. n-grams as taught by Dorie, and using those objects for searching external document repositories to expand the source corpus, which is interpreted as identifying an external corpus based on the objects, e.g. n-grams); 
determining if the co-occurrence score is above a pre-determined threshold (Shreve, [0064], note retaining candidate terms whose content words meet a threshold 
It would have been obvious to one of ordinary skill in the art before the effective date of filing to modify the cited references to incorporate the teachings of Shreve because this would improve the speed, efficiency and accuracy of the system (Shreve, [0001]).
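For illustration only, the co-occurrence limitation discussed for claim 1 can be pictured with the following minimal Python sketch. It assumes a plain sliding window over a token list in place of the claimed text adjacency graph, and the function name `cooccurring_unigrams` and the parameters `span` and `threshold` are hypothetical names chosen for this sketch; none of them is taken from the claims, Dorie, or Shreve.

```python
from collections import Counter


def cooccurring_unigrams(tokens, ngram, span=2, threshold=2):
    """Return 1-grams that co-occur with `ngram` more than `threshold` times.

    Assumption: a fixed window of `span` tokens on each side of every
    occurrence of `ngram` stands in for a "limited span"; the raw count
    of appearances inside that window is used as the co-occurrence score.
    """
    ngram = tuple(ngram)
    n = len(ngram)
    counts = Counter()
    for i in range(len(tokens) - n + 1):
        if tuple(tokens[i:i + n]) == ngram:
            before = tokens[max(0, i - span):i]   # 1-grams preceding the n-gram
            after = tokens[i + n:i + n + span]    # 1-grams following the n-gram
            counts.update(before + after)
    # Keep only 1-grams whose score is above the pre-determined threshold.
    return {word: score for word, score in counts.items() if score > threshold}


text = ("fast search engine index fast search engine query "
        "slow search engine index")
related = cooccurring_unigrams(text.split(), ("search", "engine"),
                               span=1, threshold=1)
```

In this toy input, "fast" and "index" each appear twice next to "search engine" and pass a threshold of 1, while "query" and "slow" appear once and do not.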

Regarding Claim 2:
Dorie, Allen, and Shreve show the method as disclosed above;
Dorie, Allen, and Shreve further teach:
wherein the one or more top input n-grams, the one or more top enriched n-grams and the one or more top-ranked input n-grams comprises at least one of a word, a phrase, or a multiword (Dorie, [0098], note n-grams are a set of terms.  The additionally cited reference of Allen teaches merging the original corpus with an external corpus and then performing analysis operations on the enriched corpus, when combined with the cited reference of Allen, the cluster and n-gram analysis of Dorie 
It would have been obvious to one of ordinary skill in the art before the effective date of filing to modify the cited references to incorporate the teachings of Allen because this would improve the data processing and quality of the corpus (Allen, [0001, 0020-0021]).

Regarding Claim 4:
Dorie, Allen, and Shreve show the method as disclosed above;
Dorie, Allen, and Shreve further teach:
wherein the relatedness of a top input n-gram with each top input n-gram comprises a number of top input n-grams related to the top input n-gram (Dorie, [0015, 0094, 0098, 0286], note n-gram analysis, note scoring n-grams, note n-gram scores may be based on relatedness to other words, e.g. n-grams, and on term and document frequency among other n-grams of its size).

Regarding Claim 6:
Dorie, Allen, and Shreve show the method as disclosed above;
Dorie, Allen, and Shreve further teach:
wherein the one or more enriched document clusters are generated by: determining the one or more top enriched n-grams associated with the enriched corpus (Dorie, [0015, 0098-0101], note creating clusters, note assigning documents to clusters comprises n-gram analysis, note each cluster has a score based on its n-gram scores. 
clustering the enriched corpus based on the one or more top enriched n-grams (Dorie, [0015, 0098-0101], note creating clusters, note assigning documents to clusters comprises n-gram analysis, note each cluster has a score based on its n-gram scores. The additionally cited reference of Allen teaches merging the original corpus with an external corpus and then performing analysis operations on the enriched corpus, when combined with the cited reference of Allen, the cluster and n-gram analysis of Dorie would be additionally performed on the modified, e.g. enriched, corpus as well) (Allen, claim 11, [0005, 0065, 0070, 0085], note identifying a candidate corpus to add to the existing corpora based on quality metrics, attributes, and useful natural language processing features, e.g. n-grams, note determining the top features in the new corpus, note domain clusters. When combined with the previously cited reference this would be for the n-grams taught by Dorie).
It would have been obvious to one of ordinary skill in the art before the effective date of filing to modify the cited references to incorporate the teachings of Allen 

Claim 10 discloses substantially the same limitations as claim 1 respectively, except claim 10 is directed to a device comprising a processor and a memory communicatively coupled to the processor (Dorie, figure 1, [0074, 0076, 0219]), while claim 1 is directed to a method. Therefore claim 10 is rejected under the same rationale set forth for claim 1.

Claim 11 discloses substantially the same limitations as claim 2 respectively, except claim 11 is directed to a device comprising a processor and a memory communicatively coupled to the processor (Dorie, figure 1, [0074, 0076, 0219]), while claim 2 is directed to a method. Therefore claim 11 is rejected under the same rationale set forth for claim 2.

Claim 13 discloses substantially the same limitations as claim 4 respectively, except claim 13 is directed to a device comprising a processor and a memory communicatively coupled to the processor (Dorie, figure 1, [0074, 0076, 0219]), while claim 4 is directed to a method. Therefore claim 13 is rejected under the same rationale set forth for claim 4.

Claim 15 discloses substantially the same limitations as claim 6 respectively, except claim 15 is directed to a device comprising a processor and a memory 

Claim 19 discloses substantially the same limitations as claim 1 respectively, except claim 19 is directed to a non-transitory computer-readable medium and a processor (Dorie, figure 1, [0074, 0076, 0219]), while claim 1 is directed to a method. Therefore claim 19 is rejected under the same rationale set forth for claim 1.

Claim Rejections - 35 USC § 103

Claims 3 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Dorie in view of Allen, Shreve, and Ciulla et al. (US9645999), hereinafter Ciulla.

Regarding Claim 3:
Dorie, Allen, and Shreve show the method as disclosed above;
Dorie, Allen, and Shreve further teach:
wherein at least one of the one or more top input n-grams and the one or more top enriched n-grams 
It would have been obvious to one of ordinary skill in the art before the effective date of filing to modify the cited references to incorporate the teachings of Allen because this would improve the data processing and quality of the corpus (Allen, [0001, 0020-0021]).
Dorie doesn’t specifically teach:
n-grams is determined based on term frequency-inverse document frequency (tf-idf) scores;
Ciulla is in the same field of endeavor, information retrieval;
Ciulla teaches: 
n-grams is determined based on term frequency-inverse document frequency (tf-idf) scores (Ciulla, column 10 lines 34-51, note using tf-idf to score n-grams).
It would have been obvious to one of ordinary skill in the art before the effective date of filing to modify the cited references to incorporate the teachings of Ciulla because this would improve the data processing and quality of the n-grams used.
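For illustration only, tf-idf scoring of n-grams of the kind discussed for claim 3 can be sketched as below. The sketch uses the textbook formulation tf × log(N / df), which is an assumption and is not asserted to be the scoring described in Ciulla; the helper names `ngrams` and `top_ngrams_by_tfidf` and the parameters `n` and `k` are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous n-gram in a token list as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_ngrams_by_tfidf(docs, n=2, k=3):
    """Return the k n-grams with the highest tf-idf score in any document.

    Assumption: tf is the n-gram's share of the n-grams in one document,
    and idf is log(N / df) with no smoothing.
    """
    per_doc = [Counter(ngrams(doc.split(), n)) for doc in docs]
    doc_freq = Counter()
    for counts in per_doc:
        doc_freq.update(counts.keys())   # each document counts once per n-gram
    best = Counter()
    for counts in per_doc:
        total = sum(counts.values())
        for gram, count in counts.items():
            tf = count / total
            idf = math.log(len(docs) / doc_freq[gram])
            best[gram] = max(best[gram], tf * idf)
    return [gram for gram, _ in best.most_common(k)]


docs = ["patent search engine", "patent search tool", "patent claim chart"]
top = top_ngrams_by_tfidf(docs, n=2, k=4)
```

Here the bigram ("patent", "search") occurs in two of the three documents, so its idf is lower and it falls outside the top four, while bigrams unique to one document rank higher.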

Claim 12 discloses substantially the same limitations as claim 3 respectively, except claim 12 is directed to a device comprising a processor and a memory communicatively coupled to the processor (Dorie, figure 1, [0074, 0076, 0219]), while claim 3 is directed to a method. Therefore claim 12 is rejected under the same rationale set forth for claim 3.

Claim Rejections - 35 USC § 103

Claims 7 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Dorie in view of Allen, Shreve, and Alba et al. (US2018/0225374), hereinafter Alba.

Regarding Claim 7:
Dorie, Allen, and Shreve show the method as disclosed above;
Dorie, Allen, and Shreve further teach:
wherein the one or more top enriched n-grams associated with the enriched corpus is used to create a domain dictionary (Allen, [0005, 0065, 0070, 0085], note determining the top features in the new corpus, note domain clustering. When combined with the previously cited reference this would be for the n-grams taught by Dorie).
It would have been obvious to one of ordinary skill in the art before the effective date of filing to modify the cited references to incorporate the teachings of Allen because this would improve the data processing and quality of the corpus (Allen, [0001, 0020-0021]).
Dorie doesn’t specifically teach:
creating a domain dictionary;
Alba is in the same field of endeavor, information retrieval;
Alba teaches:
create a domain dictionary (Alba, [0069], note generating a dictionary for a corpus; when combined with the previously cited references this would be for the enriched corpus and n-grams taught by Dorie and Allen).


Claim 16 discloses substantially the same limitations as claim 7 respectively, except claim 16 is directed to a device comprising a processor and a memory communicatively coupled to the processor (Dorie, figure 1, [0074, 0076, 0219]), while claim 7 is directed to a method. Therefore claim 16 is rejected under the same rationale set forth for claim 7.
	
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Kiciman et al. (US20160171063) teaches a phrase unigram model utilizing a threshold to determine relationships.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHN J MORRIS whose telephone number is (571)272-3314.  The examiner can normally be reached on M-F 6:30-2:30 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/JOHN J MORRIS/Examiner, Art Unit 2152    5/7/2021

/NEVEEN ABEL JALIL/Supervisory Patent Examiner, Art Unit 2152