Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Application 16/790,984 filed 2/14/2020 has been examined.
Claims 1-20 are currently pending. 

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Claim 1 recites:
applying a clustering algorithm to the co-occurrence matrix to determine keyword groups.
The limitation of applying a clustering algorithm to the co-occurrence matrix to determine keyword groups, as drafted, is a process that, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting a computer/processor, nothing in the claim element precludes the step from practically being performed in the mind. For example, but for the computer/processor language, applying a clustering algorithm in the context of this claim encompasses the user manually determining keyword groups using a generic clustering algorithm. Similarly, the limitations of determining, constructing, and presenting, as drafted, are processes that, under their broadest reasonable interpretation, cover performance of the limitations in the mind but for the recitation of generic computer components. For example, but for the computer/processor
Further, these concepts also recite “Certain Methods of Organizing Human Activity” (such as commercial or legal interactions, including agreements in the form of contracts; legal obligations; advertising, marketing or sales activities or behaviors; and business relations), where generating keyword groups based on “clustering” is a method of human activity in advertising/marketing activities.
Accordingly, the claim recites an abstract idea.
This judicial exception is not integrated into a practical application. In particular, the claim recites only one additional element: using a computer/processor to perform the determining, constructing, presenting, and applying (clustering) steps. The computer/processor in these steps is recited at a high level of generality (i.e., as a generic processor performing a generic computer function of generating keyword groups using “clustering”) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using a computer/processor to perform the determining, constructing, presenting, and applying (clustering) steps amounts to no more than mere instructions to apply the exception using a generic computer component. Mere

Dependent claims 2-7 merely add further details of the abstract steps/elements recited in claim 1 without integrating the idea into a practical application; or including an improvement to another technology or technical field, an improvement to the functioning of the computer itself, or meaningful limitations beyond generally linking the use of an abstract idea to a particular technological environment. Therefore, dependent claims 2-7 are also directed towards non-statutory subject matter.

Independent claims 8 and 15 are also rejected as ineligible subject matter under 35 U.S.C. 101 for substantially the same reasons as method claim 1. The components (i.e., system/medium) described in independent claims 8 and 15 do not provide for integrating the abstract idea into a practical application. At best, the claims merely provide alternate environments in which to implement the abstract idea.

Dependent claims 9-14, 16-20 merely add further details of the abstract steps/elements recited in claim 1 without integrating the idea into a practical application; or including an improvement to another technology or technical field, an improvement to the functioning of the computer itself, or meaningful limitations beyond generally linking the use of an abstract idea to a particular technological environment. Therefore, dependent claims 9-14, 16-20 are also directed towards non-statutory subject matter.




Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.



Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Misra et al., US Pub. No. 2013/0268916 A1, in view of Liebald et al., US Pub. No. 2012/0143911 A1.

As to claim 1 (and substantially similar claim 8 and claim 15), Misra discloses a computer-implemented method comprising:

determining, by a processor, based on a first corpus, a first list of keywords;
(Misra [0039] The class analysis module 104 may further extract a list of words from code comment strings and variable identifiers by splitting the code comment strings into separate words and by applying tokenization on each word. For example, the code comment string "This ControllerClass will schedule processes" may be separated into words {"This", "ControllerClass", "will", "schedule", "processes"} and the tokens {"This", "Controller", "Class", "will", "schedule", "processes"} may be extracted)
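For illustration only, the word splitting and tokenization described in the cited passage can be sketched as follows. The camel-case splitting rule and the function names `split_words` and `tokenize` are assumptions made for this sketch; they are not taken from Misra.

```python
import re

def split_words(comment: str) -> list[str]:
    """Split a code comment string into whitespace-separated words."""
    return comment.split()

def tokenize(words: list[str]) -> list[str]:
    """Split each word at camel-case boundaries (assumed tokenization rule)."""
    tokens = []
    for word in words:
        # Matches acronyms, capitalized or lowercase runs, and digit runs.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", word))
    return tokens

words = split_words("This ControllerClass will schedule processes")
tokens = tokenize(words)
```

Under this assumed rule, the word "ControllerClass" yields the two tokens "Controller" and "Class", consistent with the example quoted from Misra [0039].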

constructing a co-occurrence matrix based on the first list of keywords;
(Misra [0046] With regard to vector space model based estimation of textual similarity, the similarity determination module 105 may populate a co-occurrence matrix; see also [0104] At block 406, textual similarity for business class pairs may be estimated based on the extracted features. Estimating textual similarity may include populating a co-occurrence matrix that accounts for a frequency of occurrence of IR tokens in a business class)
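For illustration only, a matrix that accounts for the frequency of occurrence of tokens in each class, as the cited passage describes, might be populated as in the sketch below. The class names, the token lists, and the use of raw counts are hypothetical and are not taken from Misra.

```python
from collections import Counter

def build_matrix(class_tokens: dict[str, list[str]]):
    """Return (class names, token vocabulary, class-by-token count matrix)."""
    classes = sorted(class_tokens)
    vocab = sorted({t for toks in class_tokens.values() for t in toks})
    matrix = []
    for name in classes:
        counts = Counter(class_tokens[name])
        # One row per class; each cell is how often the token occurs in it.
        matrix.append([counts[t] for t in vocab])
    return classes, vocab, matrix

# Hypothetical input: tokens already extracted from two classes.
class_tokens = {
    "OrderController": ["order", "schedule", "order"],
    "BillingService": ["invoice", "order"],
}
classes, vocab, matrix = build_matrix(class_tokens)
```

Each row corresponds to one class and each column to one unique token, so a token shared by two classes shows up as nonzero entries in the same column.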

and
presenting the first plurality of keyword groups to a user via a user interface
(Misra [0120] At block 422, clusters may be automatically labeled by extracting dominant terms using class-names, textual vectors, and public method identifiers. For example, referring to
FIG. 1, the output module 110 may provide for automatic labeling of clusters.)

Misra does not explicitly disclose:
applying a clustering algorithm to the co-occurrence matrix to determine a first
plurality of keyword groups; 

However, Liebald discloses:
applying a clustering algorithm to the co-occurrence matrix to determine a first
plurality of keyword groups;
(Liebald [0064] The topic cluster module 310 creates topics cluster TC_i including topics occurring in user profiles. The topic cluster module 310 may create topics clusters based on clustering algorithms like hierarchical agglomerative clustering (HAC), probabilistic models like Latent Dirichlet Allocation (LDA), or vector models, such as k-means (using rows in the co-occurrence matrix as topic vectors).
See also [0064] To create topics clusters from the co-occurrence matrix 500, the topic cluster module 310 identifies the cell in the co-occurrence matrix 500 with the highest co-occurrence strength CS_{i,j}. After identifying the cell with the highest co-occurrence strength CS_{i,j}, the topic cluster module 310 clusters the topics (t_i, t_j) associated with the identified strength CS_{i,j}.)

It would have been obvious to one having ordinary skill in the art at the time of the effective filing date to apply clustering as taught by Liebald since it was known in the art that clustering systems provide a topic cluster module that repeats the steps of identifying the cell in the new co-occurrence matrix with the highest co-occurrence strength CS_{i,j} and clustering the topics or clusters associated with the identified cell (Liebald [0067]).
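For illustration only, the repeated step of identifying the highest co-occurrence strength and clustering the associated topics or clusters can be sketched as a greedy merge loop. The cited passages do not state how strengths are recomputed after a merge or when merging stops; the average-strength rule, the `min_strength` cutoff, and the sample data below are assumptions made for this sketch.

```python
from itertools import combinations

def cluster_topics(strength, topics, min_strength):
    """Greedily merge the pair of clusters with the highest co-occurrence strength."""
    clusters = [frozenset([t]) for t in topics]

    def link(a, b):
        # Assumed rule: average pairwise strength between members of a and b.
        pairs = [(x, y) for x in a for y in b]
        return sum(strength.get(frozenset((x, y)), 0.0) for x, y in pairs) / len(pairs)

    while len(clusters) > 1:
        a, b = max(combinations(clusters, 2), key=lambda p: link(*p))
        if link(a, b) < min_strength:
            break  # assumed stopping rule: no remaining pair is strong enough
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return clusters

# Hypothetical symmetric co-occurrence strengths between topics.
strength = {
    frozenset(("java", "jvm")): 0.9,
    frozenset(("cooking", "recipes")): 0.8,
    frozenset(("java", "cooking")): 0.1,
}
result = cluster_topics(strength, ["java", "jvm", "cooking", "recipes"], 0.5)
```

With these sample strengths, the loop merges "java" with "jvm", then "cooking" with "recipes", and stops because the remaining pair of clusters falls below the assumed cutoff.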
As to claim 2, Misra discloses the method of claim 1, wherein the first corpus comprises application source code (Misra [0023] The system and method may use a multidimensional view of the input source code for component discovery. Thus, the source code elements may be characterized in terms of a comprehensive set of features related to the source code elements and their interdependencies; see also [0025] The clusters generated by the clustering may represent components of the source code.).

As to claim 3, Misra discloses the method of claim 1, further comprising:
filtering the first list of keywords to determine a first reduced keyword list; 
(Misra teaches removing stop words/reserved words see [0039] For the lists generated by extracting the list of words from code comment strings and variable identifiers, reserved words may be removed. For example, JAVA language specific reserved words such as, for example, abstract, Boolean, break etc., may be removed. For the lists generated by extracting the list

and
constructing the co-occurrence matrix based on the first reduced keyword list
(Misra [0046] With regard to vector space model based estimation of textual similarity, the similarity determination module 105 may populate a co-occurrence matrix. For the co-occurrence matrix, let D = <Class_1, Class_2, ..., Class_d> be the sequence of classes in the source code, where d is the total number of classes in the source code. Further, let T be the sequence of all unique IR tokens occurring across the classes, where T is the union of all the IR tokens extracted)


As to claim 4, Misra discloses the method of claim 1, further comprising:
determining, based on a second corpus, a second list of keywords;
(Misra teaches multiple documents; see [0046] For Equation (1), d may denote the total number of documents (i.e., classes) under consideration, and n may denote the number of documents (i.e., classes) where the jth IR token appears; see also Misra [0039])
and
presenting the second plurality of keyword groups to the user via the user interface
(Misra [0120] At block 422, clusters may be automatically labeled by extracting dominant terms using class-names, textual vectors, and public method identifiers. For example, referring to
FIG. 1, the output module 110 may provide for automatic labeling of clusters.).

and, under the same rationale above, Liebald teaches:

performing LDA analysis based on the second corpus and the second list of keywords to determine a second plurality of keyword groups;
(Liebald [0064] The topic cluster module 310 creates topics cluster TC_i including topics occurring in user profiles. The topic cluster module 310 may create topics clusters based on clustering algorithms like hierarchical agglomerative clustering (HAC), probabilistic models like Latent Dirichlet Allocation (LDA),)

As to claim 5, Liebald discloses under the rationale above, the method of claim 4, wherein performing LDA analysis based on the second corpus and the second list of keywords to determine a second plurality of keyword groups comprises:
defining a number of topics for the LDA analysis, wherein the number of topics is defined based on a number of business rule packages that are defined for the second corpus;
(Liebald [0064] The topic cluster module 310 creates topics cluster TC_i including topics occurring in user profiles. The topic cluster module 310 may create topics clusters based on clustering algorithms like hierarchical agglomerative clustering (HAC), probabilistic models like Latent Dirichlet Allocation (LDA), or vector models, such as k-means (using rows in the co-occurrence matrix as topic vectors). In one embodiment, the topic cluster module 310 clusters topics from the co-occurrence matrix 500 using HAC.)
and
performing the LDA analysis based on the defined number of topics and the
second reduced keyword list
(Liebald [0064]).

and Misra discloses:
determining a respective importance score for each keyword of the second list of keywords;
(Misra [0045] The similarity determination module 105 may calculate class to class similarity scores based on the features extracted by the class analysis module 104.)

ranking the second list of keywords based on the determined importance scores; 
(Misra [0084] The clusters may be ranked in decreasing order of their distances from each of the functional entities. For each functional entity, the clusters having a similarity
more than a predetermined or a user-defined minimum threshold may be selected. Lastly, the functional entity may be visualized and/or reported to component mapping by the output module 110.)

determining a second reduced keyword list based on the ranked second list of
keywords; 
(Misra [0084] The clusters may be ranked in decreasing order of their distances from each of the functional entities. For each functional entity, the clusters having a similarity
more than a predetermined or a user-defined minimum threshold may be selected. Lastly, the functional entity may be visualized and/or reported to component mapping by the output module 110).

As to claim 6, Misra discloses the method of claim 4, wherein the second corpus comprises unstructured enterprise artifacts
(Misra [0037] Generally, for each class within an application, tokens may be extracted from source code comments and identifiers.; see also [0039] With regard to textual feature extraction, the class analysis module 104 may extract intermediate representation (IR) tokens from code comments and identifiers; and [0035] With packages and classes identified in the presentation layer, DA layer, as models, or utilities excluded, in order to identify classes in the business layer, 


As to claim 7, Misra discloses the method of claim 1, further comprising:
receiving input from the user via the user interface; 
(Misra [0084] With regard to mapping of functional entities to components, the output module 110 may obtain user-input related to a general functional model including functional entity descriptions; see also [0032] Referring to FIGS. 1 and 4, the input module 101 may include the user interface 102 to receive OO source code to be analyzed and corresponding bytecode. The input module 101 may also receive user inputs for identifying packages and classes for performing data access, presentation layer packages and classes, models, and utilities that may be both technical and application specific.)
and
modifying the first plurality of keyword groups based on the user input
(Misra [0084] With regard to mapping of functional entities to components, the output module 110 may obtain user-input related to a general functional model including functional entity descriptions. The descriptions may be single word names or more elaborate textual descriptions. In order to map functional entities to components, the output module 110 may convert each functional entity (i.e., name) into a word vector; see also [0033] The user interface 102 may include, for example, options for selecting configuration set-up at 120, component identification at 121, component visualization at 122, component refinement at 123 and report generation at 124. For the configuration set-up at 120, a user may enter source code directory information at 125 and byte code directory information
at 126. The user may further select scoping and identification of classes by the class identification module 103 using built-in heuristics at 127 or user-defined heuristics at

technical and application specific.).

Referring to claim 9, this dependent claim recites similar limitations as claim 2;
therefore, the arguments above regarding claim 2 are also applicable to claim 9.

Referring to claim 10, this dependent claim recites similar limitations as claim 3;
therefore, the arguments above regarding claim 3 are also applicable to claim 10.

Referring to claim 11, this dependent claim recites similar limitations as claim 4;
therefore, the arguments above regarding claim 4 are also applicable to claim 11.

Referring to claim 12, this dependent claim recites similar limitations as claim 5;
therefore, the arguments above regarding claim 5 are also applicable to claim 12.

Referring to claim 13, this dependent claim recites similar limitations as claim 6;
therefore, the arguments above regarding claim 6 are also applicable to claim 13.

Referring to claim 14, this dependent claim recites similar limitations as claim 7;
therefore, the arguments above regarding claim 7 are also applicable to claim 14.

Referring to claim 16, this dependent claim recites similar limitations as claim 2;
therefore, the arguments above regarding claim 2 are also applicable to claim 16.

Referring to claim 17, this dependent claim recites similar limitations as claim 3;
therefore, the arguments above regarding claim 3 are also applicable to claim 17.

Referring to claim 18, this dependent claim recites similar limitations as claim 4;
therefore, the arguments above regarding claim 4 are also applicable to claim 18.

Referring to claim 19, this dependent claim recites similar limitations as claim 5;
therefore, the arguments above regarding claim 5 are also applicable to claim 19.

Referring to claim 20, this dependent claim recites similar limitations as claim 6;
therefore, the arguments above regarding claim 6 are also applicable to claim 20.


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:

Roy et al., US Pub. No. 2021/0064703 A1, teaches there is a need for solutions for more effective and efficient natural language processing systems. This need can be addressed, for example, by a system configured to obtain a term correlation data object for a plurality of digital documents; determine, based at least in part on the term correlation data object, a term-topic correlation data object for the plurality of digital documents; determine, based at least in part on the term-topic correlation data object, a document topic correlation data object for the plurality of digital documents; determine, based at least in part on the term topic

correlation object; 
Leal et al., US Pub. No. 2018/0300315, teaches systems, devices, and methods for automated document analysis and processing using machine learning techniques. In one embodiment, systems and methods are disclosed for automatically classifying documents. In another embodiment, systems and methods are disclosed for identifying new tags for untagged documents. In another embodiment, systems and methods are disclosed for identifying documents related to a target document;
Sinha et al., US Pub. No. 2018/0293978, teaches performing semantic analysis on a user-generated text string includes training a neural network model with a plurality of known text strings to obtain a first distributed vector representation of the known text strings and a second distributed vector representation of a plurality of words in the known text strings, computing a relevance matrix of the first and second distributed representations based on a cosine distance between each of the plurality of words and the plurality of known text strings, and performing a latent Dirichlet allocation (LDA) operation using the relevance matrix as an input to obtain a distribution of topics associated with the plurality of known text strings;
Shalaby et al., US Pub. No.: US 2017/0004129, teaches mined semantic analysis techniques (MSA) include generating a first subset of concepts, from a NL corpus, that are latently associated with an NL candidate term based on (i) a second subset of concepts from the corpus that are explicitly or implicitly associated with the candidate term and (ii) a set of concept association rules. The concept association rules are mined from a transaction dictionary constructed from the corpus and defining discovered latent associations between corpus concepts. A concept space of the candidate term includes at least portions of both the first and second subset of concepts, and includes indications of relationships between latently-
Surendran et al., US Pub. No. 2008/0005137, teaches that the claimed subject matter relates to an unsupervised incremental learning framework, and in particular, to the creation and utilization of an unsupervised incremental learning framework that facilitates object discovery, clustering, characterization and/or grouping. Such an unsupervised incremental learning framework, once created, can thereafter be employed to incrementally estimate a latent variable model through the utilization of spectral and/or probabilistic models in order to incrementally cluster, discover, group and/or characterize tightly knit themes/topics within document sets and/or streams, thus leading to the generation of a set of themes/topics that better correlate with human perceptual labeling schemes;
Dhingra et al., US Pub. No. 2021/0232766, teaches a system for short text identification that can determine a plurality of topics and a representative noun that identifies each of the topics in a data repository. The system can determine a co-occurrence matrix for the training words stored in the corpus and determine a word vector embedding for each of the training words in the corpus to relate each of the training words in the corpus to other ones of the training words in the corpus in an n-dimensional vector space. The system can determine word tokens for words in short text in documents in the data repository that is separate and distinct from the corpus and determine sentence vectors for short text based on the word vectors in each short text and determine a plurality of topics in the documents based on clustering of sentence vectors, wherein the plurality of topics indicates topics that are predominant in the documents in the data repository.


CONTACT INFORMATION
Any inquiry concerning this communication or earlier communications from the examiner should be directed to EVAN S ASPINWALL whose telephone number is (571)270-7723. The examiner can normally be reached Monday-Friday 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Neveen Abel-Jalil, can be reached at 571-270-0474. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/Evan Aspinwall/
Primary Examiner, Art Unit 2152
1/5/2022