DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Remarks
The amendments were received on 9/22/22.  Claims 1, 2, 4-12, 14-17, 19, and 20 are pending in the application.  Claims 3, 13, and 18 have been cancelled.
Claims 1, 2, 4-12, 14-17, 19, and 20 are rejected under 35 U.S.C. 101.
Claim(s) 1, 2, 4-6, 11, 12, 14-17, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Murray (US 2011/0225159), and further in view of Medynskiy et al. (US 2008/0104002) and Pradhan et al. (US 2009/0327243).
Claims 7-9 are rejected under 35 U.S.C. 103 as being unpatentable over Murray in view of Medynskiy and Pradhan, and further in view of Tacchi (US 9715495).
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Murray in view of Medynskiy and Pradhan, and further in view of Gaussier et al. (US 2003/0101187).

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1, 2, 4-12, 14-17, 19, and 20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.  
Step 2A, Prong One asks: Is the claim directed to a law of nature, a natural phenomenon (product of nature) or an abstract idea? See MPEP 2106.04 Part I. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas.  See MPEP 2106.04(a).

The limitation of preprocessing document text, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, nothing in the claim element precludes the step from practically being performed in the mind. For example, “preprocessing” in the context of this claim encompasses the user mentally parsing, lemmatizing, stemming, and tokenizing the text and removing stop words. Similarly, the “constructing a list of words” limitation, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. For example, “constructing” in the context of this claim encompasses the user mentally, or with pen and paper, creating a list of words. The “calculating a similarity” limitations and the “determining one or more document clusters” limitations, as drafted, are processes that, under their broadest reasonable interpretation, cover performance of the limitations in the mind but for the recitation of generic computer components. For example, “calculating” and “determining” in the context of this claim encompass the user mentally calculating similarity among documents and grouping them based on the similarities. The “generating a … vector” limitations, as drafted, are processes that, under their broadest reasonable interpretation, cover performance of the limitations in the mind but for the recitation of generic computer components. For example, “generating” in the context of this claim encompasses the user mentally, or with pen and paper, creating a list of words.  If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas.
Accordingly, the claim recites an abstract idea.
	
At Step 2A, Prong Two, this judicial exception is not integrated into a practical application.  Claims 12 and 17 recite a processor/computing system to execute the operations; however, this is recited at a high level of generality (i.e., as a generic processor performing a generic computer function of receiving and extracting) such that it amounts to no more than mere instructions to apply the exception using a generic computer component.  Additionally, the claim recites “receiving a plurality of documents” and “extracting one or more topics.”  These elements do not integrate the abstract idea into a practical application because they do not impose a meaningful limit on the judicial exception and provide only insignificant extra-solution activity that is mere data gathering in conjunction with the abstract idea.

The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception.  As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than mere instructions to apply an exception using generic computer components.  Mere instructions to apply an exception using generic computer components cannot provide an inventive concept.

With respect to “receiving a plurality of documents”, the courts have found limitations directed towards data gathering to be well-understood, routine, and conventional.  See MPEP 2106.05(d)(II) (“receiving or transmitting data over a network”).
With respect to “extracting one or more topics”, the courts have found limitations directed towards extracting to be well-understood, routine, and conventional.  See MPEP 2106.05(d)(II) (“electronically scanning or extracting data”).

Considering the additional elements individually and in combination and the claim as a whole, the additional elements do not provide significantly more than the abstract idea.  The claim is not patent eligible.

With respect to claim 2, the limitations are directed towards reporting the extracted one or more topics.  This provides only insignificant extra-solution activity at Step 2A, Prong Two (see MPEP 2106.05(g)(3), “necessary data gathering and outputting”).  Additionally, the courts have found limitations directed towards data gathering to be well-understood, routine, and conventional.  See MPEP 2106.05(d)(II) (“presenting offers and gathering statistics”).

With respect to claims 4, 5, and 14 the limitations are directed towards calculating a similarity, which has been discussed above and does not provide significantly more than the abstract idea.

With respect to claims 6, 15, and 19, the limitations are directed towards determining one or more document clusters, which has been discussed above.  Each of the limitations in claim 6 could be performed in the human mind and does not provide significantly more than the abstract idea.

With respect to claim 7, the limitations are directed towards performing Latent Dirichlet Allocation (LDA), which has been discussed above.  The Latent Dirichlet Allocation (LDA) provides only a mathematical concept and does not provide significantly more than the abstract idea.

With respect to claims 8, 16, and 20, the limitations are directed towards extracting one or more topics, which has been discussed above.  Each of the limitations in claim 8 could be performed in the human mind and does not provide significantly more than the abstract idea.

With respect to claims 9 and 10, the limitations are directed towards the number of topics and clusters and do not provide significantly more than the abstract idea.

With respect to claim 11, the limitations are directed towards identifying one or more outlier documents, which, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. For example, “identifying” in the context of this claim encompasses the user mentally analyzing the documents. The limitation does not provide significantly more than the abstract idea.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 2, 4-6, 11, 12, 14-17, 19, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Murray (US 2011/0225159), and further in view of Medynskiy et al. (US 2008/0104002) and Pradhan et al. (US 2009/0327243).

With respect to claim 1, Murray teaches a computer-implemented method for identifying latent themes in textual data, the method comprising: 
receiving a plurality of documents (Murray, pa 0060, extract text documents from the domain corpus); 
preprocessing document text for each document among the plurality of documents, wherein the preprocessing comprises stemming the document text for each document, tokenizing the document text for each document, and removing stop words from the document text for each document (Murray, pa 0060, kPOOL parses the documents 15 into terms 17, by performing part-of-speech (POS) tagging, adaptive stop word removal, porter stemming, and term reduction, to the terms in the documents); and
constructing a list of words from the preprocessed document text for each document (Murray, pa 0068, These parsing 67 processes transform a document 15 composed of a series of words into a group of key terms 17 that express the concept underlying each document);
calculating a similarity of each pair of documents among the plurality of preprocessed documents based on the list of words from the preprocessed document text (Murray, pa 0098, kPOOL continues step 119 shown in FIG. 9, including the process of measuring cosine similarity 125, matching pairs 157, and removing pairs 159 from the document pool 123 until all documents within the limit of the "law of the vital few" are paired.); 
determining one or more document clusters among the plurality of preprocessed documents based on the calculated similarity of each pair of documents among the plurality of preprocessed documents (Murray, pa 0100, The pairing process will result in clusters of documents comprising no less than two documents); and 
extracting one or more topics in each document cluster among the determined one or more document clusters (Murray, pa 0057, kPOOL may correlate the relevant documents 31 to the hierarchy of clustered documents 11, to retrieve information regarding the topics the relevant documents fall into.);
calculating a similarity between each of the document clusters and each document among the plurality of documents (Murray, pa 0101, FIG. 17 represents the process of redistributing nodes to clusters using a steepest descent similarity algorithm, represented as step 122 in FIG. 9. The redistribution of nodes to clusters is referred to as a process of optimal agglomerative clustering. The process of redistributing nodes includes a step of initially allocating documents to clusters 150, as discussed above, and in steps 113, 115, 117, 119, and 121 of FIG. 9. Once all documents are allocated to clusters, a gradient for each cluster and document is computed 152.).
	Murray doesn't expressly discuss lemmatizing the document text for each document, generating a topic vector comprising each of the one or more topics, and generating a numerical vector for each document based on the topic vector.
Medynskiy teaches receiving a plurality of documents (Medynskiy, pa 0052, In step S200, a corpus of texts is determined.); 
preprocessing document text for each document among the plurality of documents, wherein the preprocessing comprises lemmatizing the document text for each document, stemming the document text for each document (Medynskiy, pa 0053, Word stemming and lemmatization of the texts is performed in step S300.), tokenizing the document text for each document (Medynskiy, pa 0054, dominant topics are the words and/or lemmas with the highest frequency count across the corpus of texts.);
determining one or more document clusters among the plurality of preprocessed documents based on the calculated similarity of each pair of documents among the plurality of preprocessed documents (Medynskiy, pa 0055, Clusters of texts are determined based on the dominant topics in step S500);
generating a topic vector comprising each of the one or more topics (Medynskiy, pa 0065, N-dimensional topic vectors characterizing the clusters of content portions are mapped to the 2-dimensional display space using a transformation that preserves pair-wise relationships.), and generating a numerical vector for each document based on the topic vector (Medynskiy, pa 0055, representative vectors are determined … for each text.).
It would have been obvious at the effective filing date of the invention to a person having ordinary skill in the art to which said subject matter pertains to have modified Murray with the teachings of Medynskiy because it determines a set of feature values of the text to determine similar clusters (Medynskiy, pa 0005).
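For illustration only, the preprocessing and word-list limitations mapped above can be sketched as follows. The sketch is not taken from Murray, Medynskiy, or the application. The stop-word list and the suffix-stripping function `crude_stem` are hypothetical stand-ins for a full stop-word list and a real stemmer such as the Porter stemmer, and lemmatization is omitted because it requires a dictionary resource.

```python
import re

# Hypothetical stop-word list; a practical system would use a much larger one.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "on", "the", "to"}


def crude_stem(token):
    """Strip a few common suffixes (an assumed stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least three characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(text):
    """Tokenize, remove stop words, and stem; returns one document's word list."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


documents = ["The cats are chasing the mice.", "Dogs chased a cat."]
word_lists = [preprocess(d) for d in documents]
```

Each entry of `word_lists` is the list of words constructed from one preprocessed document.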

Murray in view of Medynskiy doesn't expressly discuss generating a numerical vector for each document based on the topic vector, each element of the numeric vector being a count of instances of each topic in the document; and calculating a similarity between each of the document clusters and each document among the plurality of documents based on the numerical vector for each document.
Pradhan teaches generating a numerical vector for each document based on the topic vector, each element of the numeric vector being a count of instances of each topic in the document (Pradhan, pa 0099, Determining the term vectors may include building a list and a count of the sets of taxonomic nouns and applying a dictionary of taxonomic nouns. Similarly, the list of taxonomic nouns may include a document and occurrence count. That is, a count as to the number of times the taxonomic noun appears in a particular document and the number of documents in which the taxonomic noun appears.); and 
calculating a similarity between each of the document clusters and each document among the plurality of documents based on the numerical vector for each document (Pradhan, pa 0099, the system and method of the present invention aggregates the sets of taxonomic nouns to determine term vectors that represent the document and categorizes the document based upon at least one of the term vectors, the taxonomic nouns, and the author-generated classification scheme).
It would have been obvious at the effective filing date of the invention to a person having ordinary skill in the art to which said subject matter pertains to have modified Murray in view of Medynskiy with the teachings of Pradhan because the count provides a characterization of the document (Pradhan, pa 0069).
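For illustration only, a numerical vector whose elements count instances of each topic, and a similarity between a cluster and a document based on that vector, might be sketched as follows. The sketch assumes that each topic is a single word, that a cluster is represented by the sum of its members' vectors, and that cosine similarity is the measure; none of these choices is drawn from Pradhan or the application, and the function names are hypothetical.

```python
import math


def topic_count_vector(words, topic_vector):
    """One element per topic: how many times that topic word occurs in `words`."""
    return [words.count(topic) for topic in topic_vector]


def cluster_count_vector(member_word_lists, topic_vector):
    """Assumed cluster representation: element-wise sum of the members' vectors."""
    totals = [0] * len(topic_vector)
    for words in member_word_lists:
        for i, count in enumerate(topic_count_vector(words, topic_vector)):
            totals[i] += count
    return totals


def cosine(u, v):
    """Cosine similarity of two equal-length numeric vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


topics = ["cat", "dog", "mice"]
doc_vec = topic_count_vector(["cat", "chas", "mice", "cat"], topics)
cluster_vec = cluster_count_vector([["cat", "mice"], ["cat", "cat"]], topics)
similarity = cosine(doc_vec, cluster_vec)
```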

With respect to claim 2, Murray in view of Medynskiy and Pradhan teaches the computer-implemented method of claim 1, further comprising: reporting the extracted one or more topics (Murray, pa 0057, A fisheye view 33 displays the topics that correlate to the relevant documents 31.).

With respect to claim 4, Murray in view of Medynskiy and Pradhan teaches the computer-implemented method of claim 1, wherein calculating a similarity of each pair of preprocessed documents comprises: counting instances of each word of the list of words appearing in the preprocessed document text for each document; and calculating the similarity of each pair of documents based on the counted instances (Murray, pa 0087, kPOOL measures the vector, or "node" similarities of documents in the reformed term-to-document matrix 23, shown in FIG. 8 and removes non-outliers.).

With respect to claim 5, Murray in view of Medynskiy and Pradhan teaches the computer-implemented method of claim 4, wherein the similarity of each pair of documents is calculated as a cosine similarity (Murray, pa 0089, kPOOL measures the cosine similarity between each document 15 in the document pool 123.).
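For illustration only, the counting and cosine-similarity limitations of claims 4 and 5 can be sketched as follows. The sketch assumes each document has already been reduced to a list of words; the function name is hypothetical and the code is not taken from Murray.

```python
import math
from collections import Counter


def cosine_similarity(words_a, words_b):
    """Cosine similarity between the word-count vectors of two documents."""
    counts_a, counts_b = Counter(words_a), Counter(words_b)
    # Only words present in both documents contribute to the dot product.
    shared = counts_a.keys() & counts_b.keys()
    dot = sum(counts_a[w] * counts_b[w] for w in shared)
    norm_a = math.sqrt(sum(c * c for c in counts_a.values()))
    norm_b = math.sqrt(sum(c * c for c in counts_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as dissimilar to everything
    return dot / (norm_a * norm_b)
```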

With respect to claim 6, Murray in view of Medynskiy and Pradhan teaches the computer-implemented method of claim 1, wherein determining one or more document clusters comprises iteratively performing until a desired number of document clusters is determined (Murray, pa 0010, kPOOL continues to cluster the nodes iteratively until a structure is formed resembling a tree): 
searching the calculated similarities of each pair of documents for a maximum similarity between pairs of documents (Murray, pa 0087, similar nodes are paired); 
averaging the similarities of the pair of documents having the maximum similarity into a new document cluster (Murray, pa 0100, The pairing process will result in clusters of documents comprising no less than two documents); 
remove the individual documents of the pair of documents having the maximum similarity from the plurality of documents (Murray, pa 0097, the matching pair 157 is removed 159 from the document pool 123. Removing 159 the matching pair 157 from the document pool 123 assures that neither document that comprises the pair, will be matched with another pair at this point in the clustering process.); 
adding the new document cluster to the plurality of documents; and calculating a similarity of the new cluster and each document among the plurality of preprocessed documents (Murray, pa 0102, Thus, the gradient equation (6) listed above, allows kPOOL to determine the effect of each document within the cluster on the similarity measure of the entire cluster. The gradient equation (6) is applied to each cluster, and each document within the cluster.).
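For illustration only, the iterative steps recited in claim 6 can be sketched as follows. The sketch reads "averaging the similarities" as setting the new cluster's similarity to every remaining item to the mean of the merged pair's similarities to that item; this reading, the function name, and the input format are assumptions, and the code does not reproduce Murray's kPOOL process.

```python
from itertools import combinations


def cluster_by_max_similarity(pair_similarity, desired_clusters):
    """pair_similarity maps (name_a, name_b) -> similarity for every pair of documents.
    Returns the clusters as a sorted list of tuples of member names."""
    items, sim = set(), {}
    for (a, b), s in pair_similarity.items():
        ia, ib = (a,), (b,)  # each document starts as a one-member item
        items.update([ia, ib])
        sim[frozenset((ia, ib))] = s

    while len(items) > max(desired_clusters, 1):
        # Search the calculated similarities for the maximum-similarity pair.
        a, b = max(combinations(sorted(items), 2), key=lambda p: sim[frozenset(p)])
        merged = tuple(sorted(a + b))
        # Remove the individual members of the pair from the pool.
        items -= {a, b}
        # Similarity of the new cluster to each remaining item: average of the pair's.
        for other in items:
            sim[frozenset((merged, other))] = (
                sim[frozenset((a, other))] + sim[frozenset((b, other))]
            ) / 2
        # Add the new cluster back to the pool.
        items.add(merged)
    return sorted(items)


example = {
    ("d1", "d2"): 0.9, ("d1", "d3"): 0.1, ("d1", "d4"): 0.2,
    ("d2", "d3"): 0.3, ("d2", "d4"): 0.1, ("d3", "d4"): 0.8,
}
clusters = cluster_by_max_similarity(example, 2)
```

With the hypothetical similarities above and a desired count of two, the loop merges d1 with d2 and then d3 with d4.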

With respect to claim 11, Murray in view of Medynskiy and Pradhan teaches the computer-implemented method of claim 1, further comprising: identifying one or more outlier documents among the plurality of documents and one or more topics in the one or more outlier documents (Murray, pa 0091, kPOOL filters the results to only use the outliers of the similarity measure).

With respect to claims 12 and 14-16, the limitations are essentially the same as those in claims 1, 4, and 6, and thus are rejected for the same reasons.

With respect to claims 17, 19, and 20, the limitations are essentially the same as those in claims 1, 4, and 6, and thus are rejected for the same reasons.

Claim Rejections - 35 USC § 103
Claims 7-9 are rejected under 35 U.S.C. 103 as being unpatentable over Murray in view of Medynskiy and Pradhan, and further in view of Tacchi (US 9715495).

With respect to claim 7, Murray in view of Medynskiy and Pradhan in view of Tacchi teaches the computer-implemented method of claim 1, as discussed above.  Murray in view of Medynskiy and Pradhan doesn't expressly discuss wherein extracting one or more topics in each document cluster comprises performing Latent Dirichlet Allocation (LDA) within each cluster.
Tacchi teaches wherein extracting one or more topics in each document cluster comprises performing Latent Dirichlet Allocation (LDA) within each cluster (Tacchi, Col. 7 Li. 11-13, some embodiments may execute a form of Latent Dirichlet Allocation. In some cases, a number of topics to be ascertained may be supplied, e.g., by a user indicating that 2, 3, 5, or 50 topics are to be ascertained).
It would have been obvious at the effective filing date of the invention to a person having ordinary skill in the art to which said subject matter pertains to have modified Murray in view of Medynskiy and Pradhan with the teachings of Tacchi because it provides a user-friendly mechanism to adjust the relationship graph such that certain topics are not over or under emphasized (Tacchi, Col. 2 Li. 46-62).

With respect to claim 8, Murray in view of Medynskiy and Pradhan in view of Tacchi teaches the computer-implemented method of claim 7, wherein extracting one or more topics in each document cluster further comprises: counting instances of each topic word identified by the LDA appearing in the preprocessed document text for each document and each cluster; and calculating the similarity of each document and each cluster based on the counted instances of each topic word (Tacchi, Col. 7 Li. 11-13, some embodiments may execute a form of Latent Dirichlet Allocation. In some cases, a number of topics to be ascertained may be supplied, e.g., by a user indicating that 2, 3, 5, or 50 topics are to be ascertained).

With respect to claim 9, Murray in view of Medynskiy and Pradhan in view of Tacchi teaches the computer-implemented method of claim 1, as discussed above.  Murray in view of Medynskiy and Pradhan doesn't expressly discuss wherein a number of extracted topics in each document cluster is a user-specified maximum number of topics.
Tacchi teaches wherein a number of extracted topics in each document cluster is a user-specified maximum number of topics (Tacchi, Col. 7 Li. 11-13, some embodiments may execute a form of Latent Dirichlet Allocation. In some cases, a number of topics to be ascertained may be supplied, e.g., by a user indicating that 2, 3, 5, or 50 topics are to be ascertained).
	It would have been obvious at the effective filing date of the invention to a person having ordinary skill in the art to which said subject matter pertains to have modified Murray in view of Medynskiy and Pradhan with the teachings of Tacchi because it provides a user-friendly mechanism to adjust the relationship graph such that certain topics are not over or under emphasized (Tacchi, Col. 2 Li. 46-62).

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Murray in view of Medynskiy and Pradhan, and further in view of Gaussier et al. (US 2003/0101187).

With respect to claim 10, Murray in view of Medynskiy and Pradhan teaches the computer-implemented method of claim 1, as discussed above.  Murray in view of Medynskiy and Pradhan doesn't expressly discuss wherein a number of determined clusters is a user-specified maximum number of clusters.
Gaussier teaches wherein a number of determined clusters is a user-specified maximum number of clusters (Gaussier, pa 0029, the parameter β may be a value that controls the complexity of an objective function to optimize through the number of clusters and the computation of the parameter value itself. The value of β … may also be provided by a user through an input/output device such as keyboard 110.).
It would have been obvious at the effective filing date of the invention to a person having ordinary skill in the art to which said subject matter pertains to have modified Murray in view of Medynskiy and Pradhan with the teachings of Gaussier because it controls the complexity of an objective function to optimize for clustering (Gaussier, pa 0029).

Response to Amendment
35 U.S.C. 112 rejections
With regard to claims 1, 2, 4-12, 14-17, 19, and 20, the amendments to the claims have overcome the 35 U.S.C. 112 rejection.  The Examiner withdraws the 35 U.S.C. 112 rejection to claims 1, 2, 4-12, 14-17, 19, and 20.  

35 U.S.C. 103 rejections
Applicant seems to argue a newly amended limitation.  Applicant’s amendment has rendered the previous rejection moot.  Upon further consideration of the amendment, a new ground of rejection is made in view of Pradhan et al. (US 2009/0327243).

Response to Arguments
35 U.S.C. 101 rejections
Applicant argues that the amendments provide elements that are “unconventional or otherwise more than what is well-understood, routine, conventional activity in the field,” and therefore provide an “inventive concept” rendering the claims eligible.  The Examiner respectfully disagrees.  Applicant has not discussed how the amendments provide more than what can be performed in the mind.  The limitations defining the elements of the numeric vector and the limitations directed towards further defining the calculating do not provide anything more than what can be done in the mind.  If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRITTANY N ALLEN whose telephone number is (571)270-3566. The examiner can normally be reached M-F 9 am - 5:00 pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Usmaan Saeed, can be reached on 571-272-4046. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/BRITTANY N ALLEN/           Primary Examiner, Art Unit 2169