Application/Control Number: 16/900,917	Page 2
Art Unit: 3624

DETAILED ACTION
This communication is a Final Office Action rejection on the merits. Claims 1-3, 6, 8-9, 11, 13-14, 16, 18, 34, 37, and 39-45 are currently pending and have been addressed below.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments

Applicant's arguments filed 04/29/2022 (related to the 103 Rejection) have been fully considered but are moot in view of the new grounds of rejection. Applicant's amendments necessitated the new ground(s) of rejection presented in this Office action. A rejection based on newly cited reference(s) follows.

Patent Subject Matter Eligibility
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-3, 6, 8-9, 11, 13-14, 16, 18, 34, 37, and 39-45 are not rejected under 35 U.S.C. 101 because the claimed invention includes an additional element that integrates a judicial exception into a practical application. 

Claims 1, 18, and 34 are eligible. The additional element of a neural network is used to train an assessment model based on the subgroup of relevant reference projects. As mentioned in paragraph 0037, the assessment model is trained using one or more types of metadata associated with the relevant reference projects and at least one metric (e.g. a target). Further, testing data is used to evaluate accuracy of the assessment model. Therefore, the additional element of a neural network integrates the abstract idea into a practical application because the neural network's use of testing data to iteratively evaluate accuracy of the model applies the judicial exception in a meaningful way beyond generally linking the use of the judicial exception to a particular technological environment, such that the claim as a whole is more than a drafting effort designed to monopolize the exception, as discussed in MPEP 2106.05(e) and the Vanda Memo issued in June 2018.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-3, 6, 8-9, 11, 13-14, 16, 18, 34, 37, and 39-45 are rejected under 35 U.S.C. 103 as being unpatentable over Champlin-Scharf et al. (US 2017/0083428A1), in view of Choetkiertikul (Choetkiertikul, M., Dam, H.K., Tran, T., Ghose, A. and Grundy, J., 2017. Predicting delivery capability in iterative software development. IEEE Transactions on Software Engineering, 44(6), pp.551-573.), in further view of Liu et al. (US 2020/0134058).
Regarding claim 1 (Currently Amended), Champlin-Scharf et al. discloses a system for analyzing a user project (Paragraph 0006, Embodiments according to the present invention provide a tool for automatic pre-detection of potential software product impact according to a statement placed in a software development system, and for automatically recommending for resolutions which accesses a repository of information containing a history of changes and effects of the changes for a software project; using a received a statement in natural language to perform a natural language search of the repository; according to the findings of the search of the repository, using a machine learning model to compose an impact prediction regarding the received statement relative to the findings; and automatically placing an advisory notice regarding to the impact prediction into the software development system, wherein the advisory notice is associated with the received statement), comprising: a memory storing computer-readable instructions (Paragraph 0064, The preceding paragraphs have set forth example logical processes according to the present invention, which, when coupled with processing hardware, embody systems according to the present invention, and which, when coupled with tangible, computer readable memory devices, embody computer program products according to the related invention); and at least one processor communicatively coupled to the memory, wherein the computer-readable instructions, when executed by the at least one processor, cause the at least one processor to perform operations (Paragraph 0065, Regarding computers for executing the logical processes set forth herein, it will be readily recognized by those skilled in the art that a variety of computers are suitable and will become suitable as memory, processing, and communication capacities of computers and portable devices increase. 
In such embodiments, the operative invention includes the combination of programmable computing platform and programs. In other embodiments, some or all of the logical processes may be committed to dedicated or specialized electronic circuitry, such as Application Specific Integrated Circuits or programmable logic devices) to identify a … bottleneck amongst a plurality of teams of a user project (Paragraph 0030, Turning to FIG. 1a, the process of FIG. 1b is annotated to show the different points, at which embodiments of the present invention may be advantageous during traditional software development and project management. Embodiments may implement any one or more of the features in allowing pre-detection and advisory intervention notices responsive to changes in requirements, changes in check-in code, changes in released code, and changes in bug reports or feature requests, as will be discussed in greater detail in the following paragraphs. Paragraph 0057, As the work item is delivered and built into a software deliverable, the next phase of some embodiments of the present invention may be engaged. As the code has been previously run through the bug pre-detection tool, the Machine Learning models can be employed to predict a number of things including, but not limited to, defects, performance bottlenecks, and most importantly), comprising: 
accessing a first set of metadata associated with the user project, the first set of metadata comprising first work-tracking data … for tracking the user project (Paragraph 0017, the present inventors have set out, in a first advantage of the present invention, to develop an improved software problem discussion search methodology and a tool that “pre-detects” potential coding issues using Natural Language Processing (NLP); Paragraph 0032, Prior to processing any search for similar software problem descriptions, a repository of textual descriptions of a history of changes to the software product would be created by gathering information (or a collection of links to information) from different sources such as from bug tracking databases, extracted comments in source code, and “one liner” comments captured in software version control systems; Paragraph 0033, After the repository has been created, a user would input a description of the changes he or she is going to make to the software source code into the comments of an enhanced bug tracking tool, then he or she would click a button, which would scan the repository while searching for similar changes made in the past, using NLP methods to search for matches rather than just prevalence of keywords and synonyms; Examiner notes that the metadata included in an XML format includes a problem posted by the team member (see table in Paragraph 0042, this requirement may cause some incompatibilities with XYZ module));
accessing a second set of metadata associated with a plurality of reference projects, the plurality of reference projects being distinct from the user project, the second set of metadata comprising second work-tracking data comprising a plurality of … being contributed by a plurality of teams in each of the plurality of reference projects for tracking the reference projects (Paragraph 0032, Prior to processing any search for similar software problem descriptions, a repository of textual descriptions of a history of changes to the software product would be created by gathering information (or a collection of links to information) from different sources such as from bug tracking databases, extracted comments in source code, and “one liner” comments captured in software version control systems; Paragraph 0033, After the repository has been created, a user would input a description of the changes he or she is going to make to the software source code into the comments of an enhanced bug tracking tool, then he or she would click a button, which would scan the repository while searching for similar changes made in the past, using NLP methods to search for matches rather than just prevalence of keywords and synonyms);
after accessing the first and second sets of metadata, determining an ontology based on the first and second sets of metadata respectively associated with the user project and the plurality of reference projects (Paragraph 0035, For example, a user could input into the enhanced bug tracking systems user interface that they are planning to add a hashmap to a software design or product. Such user input might include a natural language description of what functions would use the hashmap, and what data structures the hashmap would relate to each other, and to which particular method or code module the hashmap would be added. Such as com.my.package; Paragraph 0036, The resulting NLP-based search might find instances in the past, in which hashmaps were used in similar circumstances and caused performance issues; Applicant states that the ontology unit may include at least one natural language processing model (Paragraph 0034). Based on broadest reasonable interpretation in light of the specification, Champlin-Scharf et al. discloses an ontology as it’s using NLP representing interrelations between the user project and the plurality of reference projects under a framework of categories), the ontology representing interrelations between the user project and the plurality of reference projects under a framework of categories (Paragraph 0026, NLP searching is usually much broader than keyword searching for several reasons. First, it allows the user to express his or her needs in a manner more suitable for the user, and less constrained by system requirements. This increases the likelihood that the search query itself is accurately directed towards the desired information. Second, by extracting the symbols from the natural language input, the search and proceed not only on the symbols, but also on their aliases and synonyms. 
In the foregoing example, the term “stack” may be searched (a symbol found in the original user's input query), and the synonym “heap” may also be searched according to a synonym list. And, the term “overflow” may be searched as well as an antonym “underflow”. For a user to achieve the same search breadth, he or she would have to have expertise to craft a much more complicated structured query and would have to be diligent enough to look up many synonyms and antonyms, as well as to formulate similarly-meaningful alternative phrases; Paragraph 0027, Therefore, for the purposes of this disclosure, natural language searching shall mean receiving a natural language expression as an search query input, applying one or more NLP techniques including deductive logic, inductive logic, validity and soundness checks, rules of though (e.g., principle of contradiction, law of excluded middle, etc.), truth functionalities (e.g. Modus Ponens, Modus Tollens, hypothetical syllogism, denying an antecedent, affirming a consequent, etc.), predicate logic, sorites arguments, ethymemes, syntactic analysis, semantic analysis, and pragmatics. Syntactic analysis may contain one or more parsers, such as noise-disposal parsers, state-machine parsers, and definite clause parsers, and it may include pre-determined and updateable grammars (e.g. recognizable grammar structures), such as context-free notations (e.g. Backus-Naur forms, etc.). A semantic analyzer may, based on the results of the syntactic analysis or interactively operating with syntactic analysis, determines the meaning of a phrase, statement or sentence. Most semantic analyzers attempt to re-write the phrase, statement or sentence into a context-free form so that it can be more readily found in a lexicon and mapped to an intended or implicit meaning. 
Pragmatics then operates to further reduce remaining ambiguities by applying reasonable domain scope, resolving anaphoras, and using inferencing to generate alternative expressions for the same meaning (upon which the search can be performed, as well));
after determining the ontology, determining, based on the ontology, a subgroup of reference projects from the plurality of reference projects, wherein each reference project in the subgroup has a level of relevance to the user project … (Paragraph 0031, search for similar software problem descriptions; Paragraph 0035, For example, a user could input into the enhanced bug tracking systems user interface that they are planning to add a hashmap to a software design or product. Such user input might include a natural language description of what functions would use the hashmap, and what data structures the hashmap would relate to each other, and to which particular method or code module the hashmap would be added. Such as com.my.package; Paragraph 0036, The resulting NLP-based search might find instances in the past, in which hashmaps were used in similar circumstances and caused performance issues);
after determining the subgroup of reference projects, generating an assessment model for assessing at least one metric of the user project based on the subgroup of reference projects (Paragraph 0036, The resulting NLP-based search might find instances in the past, in which hashmaps were used in similar circumstances and caused performance issues; Paragraph 0037, Once a repository of defects, comments, and past actions has be created, the system can use Machine Learning to generate models to make suggestions smarter and more accurate);
after determining the assessment model, determining a [machine learning] for training the assessment model based on an output of the assessment model, the output corresponding to the at least one metric (Paragraph 0041, Before the bug pre-detection method and tool of the present invention is used by a user, training data will be ingested by the system originating from a variety of sources. This would likely include, but not limited to, defect tracking databases, requirements documents, work items, and test cases generated during prior releases, as well as comments from known defect-free sections of code; Paragraph 0046, Based on previous knowledge in the form of our trained models (102) and, if available, previous bug resolution notes (104), the bug pre-detection tool will offer suggestions in the form of automatically-generated discussion comments (301) or other suitable notices (e-mails, text messages, etc.) as to whether or not the proposals in the latest revision of the document will improve software quality in terms of the likelihood of introducing new defects, or triggering previously solved defects, which is illustrated in FIG. 3; Paragraph 0057, As the work item is delivered and built into a software deliverable, the next phase of some embodiments of the present invention may be engaged. As the code has been previously run through the bug pre-detection tool, the Machine Learning models can be employed to predict a number of things including, but not limited to, defects, performance bottlenecks, and most importantly; Examiner interprets the likelihood of introducing new defects as the output corresponding to the at least one metric);
after determining the [machine learning], determining a third set of metadata associated with the reference projects in the subgroup (Paragraph 0041, Before the bug pre-detection method and tool of the present invention is used by a user, training data will be ingested by the system originating from a variety of sources. This would likely include, but not limited to, defect tracking databases, requirements documents, work items, and test cases generated during prior releases, as well as comments from known defect-free sections of code; Examiner notes that the process of training a machine learning model includes splitting the data into a training set and a testing set. In this case, the training set is the third set of metadata);
after determining the third set of metadata, training the assessment model using the [machine learning] and the third set of metadata (Paragraph 0041, Before the bug pre-detection method and tool of the present invention is used by a user, training data will be ingested by the system originating from a variety of sources. This would likely include, but not limited to, defect tracking databases, requirements documents, work items, and test cases generated during prior releases, as well as comments from known defect-free sections of code);
after training the assessment model, testing the assessment model using a fourth set of metadata associated with the reference projects in the subgroup, the third set of metadata and the fourth set of metadata being non-overlapping to each other (Paragraph 0049, Once the requirement or change to the requirement has been approved, the final content of the requirement will be sent through the pre-detection tool (104, 401, 402) and, once annotated (403), will be used to update the training data (102) for the Machine Learning Models, as illustrated in FIG. 4. This allows the approved intervention to be included in the suggestions for similar requirements change proposed in the future; One of ordinary skill in the art would know that the process of training a machine learning model includes a training set and a testing set to evaluate accuracy of the machine learning model. In this case, the training set is the third set of metadata and the testing set is the fourth set of metadata. Further, new data can be used to retrain the machine learning model);
and after the training and testing of the assessment model, applying the assessment model to the first set of metadata to generate an assessment outcome reflecting the at least one predicted metric of the user project, the at least one predicted metric reflecting the … bottleneck amongst the plurality of teams of the user project (Paragraph 0057, Machine Learning models can be employed to predict a number of things including, but not limited to, defects, performance bottlenecks, and most importantly. These identified defects and bottlenecks may then be mapped to specific tests that would be most effective in testing these aspects of the new code);
and generating a … based on the assessment outcome, wherein the … reflects the bottleneck of the user project and comprises a predicted issue of the user project (Paragraph 0057, Machine Learning models can be employed to predict a number of things including, but not limited to, defects, performance bottlenecks, and most importantly. These identified defects and bottlenecks may then be mapped to specific tests that would be most effective in testing these aspects of the new code; Paragraph 0068, If the computing platform is intended to interact with human users, it is provided with one or more user interface device(s) (607), such as displays, keyboards, pointing devices, speakers, etc.).
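For illustration only, the non-overlapping third (training) and fourth (testing) sets of metadata referenced in the mapping above can be sketched as a simple disjoint partition. This is a minimal sketch and is not taken from Champlin-Scharf et al. or from the application; the function name split_metadata, the test_fraction and seed parameters, and the record identifiers are hypothetical.

```python
import random


def split_metadata(records, test_fraction=0.2, seed=0):
    """Partition records into disjoint training and testing lists.

    Hypothetical helper: 'records' stands in for reference-project
    metadata; the names and default values are assumptions.
    """
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must be strictly between 0 and 1")
    shuffled = list(records)
    # A seeded generator keeps the partition reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    testing = shuffled[:n_test]    # plays the role of the fourth set
    training = shuffled[n_test:]   # plays the role of the third set
    return training, testing


# Hypothetical identifiers standing in for reference-project metadata.
records = [f"issue-{i}" for i in range(10)]
training_set, testing_set = split_metadata(records)
```

Because every record is assigned to exactly one of the two lists, the two sets cannot overlap, which is the property recited in the testing limitation.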
Champlin-Scharf et al. discloses accessing a first set of metadata associated with the user project (e.g. problem posted by a user); accessing a second set of metadata associated with a plurality of reference projects (e.g. software repository); determining a subgroup of reference projects that are similar to the first set of metadata (e.g. data with similar software problem descriptions); and using the subgroup of data for generating an assessment for assessing at least one metric of the user project (Machine Learning models can be employed to predict a number of things including, but not limited to, defects, performance bottlenecks). Although Champlin-Scharf et al. discloses predicting performance bottlenecks based on historical data with similar software problem descriptions, Champlin-Scharf et al. does not specifically disclose wherein the system tracks other features that are important for predicting a collaboration bottleneck (e.g. an epic and a plurality of replies to the epic). Also, although Champlin-Scharf et al. discloses machine learning to generate an assessment outcome reflecting the at least one predicted metric of the user project, Champlin-Scharf et al. does not specifically disclose wherein the assessment outcome is generated using a neural network. Further, although Champlin-Scharf et al. discloses an assessment outcome reflecting the at least one predicted metric of the user project (Paragraph 0057) and a display (Paragraph 0068), Champlin-Scharf et al. does not specifically disclose generating a report.
However, Choetkiertikul discloses accessing a first set of metadata associated with the user project, the first set of metadata comprising first work-tracking data contributed by a plurality of teams of the user project (see Figure 4 and related text in Page 555, Fig. 4 shows an example of an on-going iteration report (recorded in JIRA-Agile) of Mesosphere Sprint 34 in the Apache project. This iteration started from April 27, 2016 to May 11, 2016. This iteration has two issues in the Todo state (MESOS-5272 and MESOS-5222), three issues in the In-progress state—those are all in the reviewing process (MESOS3739, MESOS-4781, and MESOS-4938), and one issue has been resolved (MESOS-5312). These issues have story points assigned to them. For each of those sets of issues, we compute the set cardinality and velocity, and use each of them as a feature. From our investigation, among the under-achieved iterations across all case studies, i.e., velocity(Difference) < 0, 30 percent of them have new issues added after passing 80 percent of their planned duration (e.g., after 8th day of a ten-day iterations), while those iterations deliver zero-issue. Specifically, teams added more velocity(Committed) while velocity(Delivered) was still zero. This reflects that adding and removing issues affects the deliverable capability of an on-going iteration. This can be a good indicator to determine the outcome of an iteration), the first work-tracking data comprising an epic and a plurality of replies to the epic, the plurality of replies being contributed by the plurality of teams for tracking the user project (Fig. 4. An example of an on-going iteration report; Pages 555-556, 4.2 Features of an issue, Fig. 7 shows an example of an issue report of issue AURORA-716 in the Apache project which the details of an issue are provided such as type, priority, description, and comments including a story points and an assigned iteration; Page 557, 4.2.1. 
Primitive Analysis of an Issue, Previous studies (e.g., [16]) have found that the number of comments on an issue indicates the degree of team collaboration, and thus may affect its resolving time; Examiner interprets the set of issues included in a sprint as epics. Further, Examiner notes that the reviewing process includes a plurality of replies contributed by the plurality of teams);
accessing a second set of metadata associated with a plurality of reference projects, the plurality of reference projects being distinct from the user project, the second set of metadata comprising second work-tracking data comprising a plurality of epics and a plurality of replies to the epics, the plurality of replies being contributed by a plurality of teams in each of the plurality of reference projects for tracking the reference projects (Fig. 3. An overview of our approach; Page 555, 4.2. Features of an Issues, Fig. 7 shows an example of an issue report of issue AURORA-716 in the Apache project which the details of an issue are provided such as type, priority, description, and comments including a story points and an assigned iteration. Hence, we also extract a broad range of features representing an issue (see Table 2). The features cover different aspects of an issue including primitive attributes of an issue, issue dependency, changing of issue attributes, and textual features of an issue’s description. Some of the features of an issue (e.g., number of issue links) were also adopted from our previous work [14]; Page 560, 6.1. Data Collecting and Processing, we collected the data of past iterations (also referred to as sprints in those projects) and issues from five large open source projects which follow the agile Scrum methodology: Apache, JBoss, JIRA, MongoDB, and Spring. The project descriptions and their agile adoptions have been reported in Table 5; Examiner interprets the set of issues included in a sprint as epics. Further, Examiner notes that the historical data includes a plurality of features, wherein one of the features is the number of comments);
…
after determining the subgroup of reference projects (Fig. 3. An overview of our approach, see feature aggregation; Page 554, 3.0 Overview of our approach, Formally, issue-level features are vectors located in the same euclidean space (i.e., the issue space). The aggregation is then a map of a set points in the issue space onto a point in the iteration space. The main challenge here is to handle sets which are unordered and variable in size (e.g., the number of issues is different from iteration to iteration). We propose two methods: statistical aggregation and bag-of-words (BoW). Statistical aggregation looks for simple set statistics for each dimension of the points in the set, such as maximum, mean or standard deviation. For example, the minimum, maximum, mean, and standard deviation of the number of comments of all issues in an iteration are part of the new features derived for the iteration. This statistical aggregation technique relies on manual feature engineering. On the other hand, the bag-of-words method automatically clusters all the points in the issue-space and finds the closest prototype (known as “word”) for each new point to form a new set of features (known as bag-of-words, similarly to a typical representation of a document) representing an iteration. This technique provides a powerful, automatic way of learning features for an iteration from the set of issues in the layer below it (similar to the notions of deep learning)), generating an assessment model for assessing at least one metric of the user project based on the subgroup of reference projects (Page 554, Fig. 3. An overview of our approach, see trained classifiers and predicted velocity; Page 554, 3.0 Overview of our approach, Our approach consists of two phases: the learning phase and the execution phase (see Fig. 3). 
The learning phase involves using historical iterations to build a predictive model (using machine learning techniques), which is then used to predict outcomes, i.e., velocity(Difference), of new and ongoing iterations in the execution phase. To apply machine learning techniques, we need to engineer features for the iteration. An iteration has a number of attributes (e.g., its duration, the participants, etc.) and a set of issues whose dependencies are described as a dependency graph. Each issue has its own attributes and derived features (e.g., from its textual description). Our approach separates the iteration-level features into three components: (i) iteration attributes, (ii) complexity descriptors of the dependency graph (e.g., the number of nodes, edges, fan-in, fan-out, etc.), and (iii) aggregated features from the set of issues that belong to the iteration. A more sophisticated approach would involve embedding all available information into an euclidean space, but we leave this for future work); 
after determining the assessment model, determining a neural network for training the assessment model based on an output of the assessment model, the output corresponding to the at least one metric (Page 554, Fig. 3. An overview of our approach, see trained classifiers and predicted velocity; Page 554, 3.0 Overview of our approach, For prediction models, we employ three state-of-the-art randomized ensemble methods: Random Forests, Stochastic Gradient Boosting Machines, and Deep Neural Networks (DNNs) with Dropouts to build the predictive models. Our approach is able to make a prediction regarding the delivery capability in an iteration (i.e., the difference between the actual delivered velocity against the committed (target) velocity). Next we describe our approach in more detail); 
after determining the neural network, determining a third set of metadata associated with the reference projects in the subgroup (Page 554, Fig. 3. An overview of our approach, see trained classifiers and predicted velocity; Page 559, 5 Predictive Models, Our predictive models can predict the difference between the actual delivered velocity against the committed (target) velocity for an iteration, i.e., velocity(Difference). To do so, we employ regression methods (supervised learning) where the outputs reflect the deliverable capability in an iteration e.g., the predicting of velocity(Difference) will be equal to 12. The extracted features of the historical iterations (i.e., training set) are used to build the predictive models. Specifically, a feature vector of an iteration and an aggregated feature vector of issues assigned to the iteration are concatenated and fed into a regressor); 
after determining the third set of metadata, training the assessment model using the neural network and the third set of metadata (Page 554, Fig. 3. An overview of our approach, see trained classifiers and predicted velocity; Page 559, 5 Predictive Models, Our predictive models can predict the difference between the actual delivered velocity against the committed (target) velocity for an iteration, i.e., velocity(Difference). To do so, we employ regression methods (supervised learning) where the outputs reflect the deliverable capability in an iteration e.g., the predicting of velocity(Difference) will be equal to 12. The extracted features of the historical iterations (i.e., training set) are used to build the predictive models. Specifically, a feature vector of an iteration and an aggregated feature vector of issues assigned to the iteration are concatenated and fed into a regressor.  We apply the currently most successful class of machine learning methods, namely randomized ensemble methods [26], [27], [28]. Ensemble methods refer to the use of many regressors to make their prediction [28]. Randomized methods create regressors by randomizing data, features, or internal model components [29]. Randomizations are powerful regularization techniques which reduce prediction variance, prevent overfitting, are robust against noisy data, and improve the overall predictive accuracy [30], [31]. We use the following high performing regressors that have frequently won recent data science competitions (e.g., Kaggle6 ): Random Forests (RFs) [32], Stochastic Gradient Boosting Machines (GBMs) [33], [34] and Deep Neural Networks with Dropouts (DNNs) [35]. All of them are ensemble methods that use a divide-and-conquer approach to improve performance. The key principle behind ensemble methods is that a group of “weak learners” (e.g., classification and regression decision trees) can together form a “strong learner”. 
Details for these predictive models used are provided in Appendix A);
after training the assessment model, testing the assessment model using a fourth set of metadata associated with the reference projects in the subgroup, the third set of metadata and the fourth set of metadata being non-overlapping to each other (Page 554, Fig. 3. An overview of our approach, see trained classifiers and predicted velocity; Page 559, 5 Predictive Models, Our predictive models can predict the difference between the actual delivered velocity against the committed (target) velocity for an iteration, i.e., velocity(Difference). To do so, we employ regression methods (supervised learning) where the outputs reflect the deliverable capability in an iteration e.g., the predicting of velocity(Difference) will be equal to 12. The extracted features of the historical iterations (i.e., training set) are used to build the predictive models. Specifically, a feature vector of an iteration and an aggregated feature vector of issues assigned to the iteration are concatenated and fed into a regressor.  We apply the currently most successful class of machine learning methods, namely randomized ensemble methods [26], [27], [28]. Ensemble methods refer to the use of many regressors to make their prediction [28]. Randomized methods create regressors by randomizing data, features, or internal model components [29]. Randomizations are powerful regularization techniques which reduce prediction variance, prevent overfitting, are robust against noisy data, and improve the overall predictive accuracy [30], [31]. We use the following high performing regressors that have frequently won recent data science competitions (e.g., Kaggle6 ): Random Forests (RFs) [32], Stochastic Gradient Boosting Machines (GBMs) [33], [34] and Deep Neural Networks with Dropouts (DNNs) [35]. All of them are ensemble methods that use a divide-and-conquer approach to improve performance. 
The key principle behind ensemble methods is that a group of “weak learners” (e.g., classification and regression decision trees) can together form a “strong learner”. Details for these predictive models used are provided in Appendix A; Examiner notes that regularization techniques include splitting the data into a training set and a testing set, wherein the testing set is used to test the model and therefore prevent overfitting, see at least [30-35]); 
and after the training and testing of the assessment model, applying the assessment model to the first set of metadata to generate an assessment outcome reflecting the at least one predicted metric of the user project, the at least one predicted metric reflecting the collaboration bottleneck amongst the plurality of teams of the user project (Page 561, 6.2 Experimental Setting, As discussed in Section 2, we would like to predict the difference between the actual delivered velocity against the target velocity. For example, if the output of our model is -5, it predicts that the team will deliver 5 story points below the target. Table 7 shows the statistical descriptions of the difference between the actual delivered against the target velocity of the five projects in terms of the minimum, maximum, mean, median, mode, standard deviations (SD), and interquartile range (IQR));
and generating a report based on the assessment outcome, wherein the report reflects the bottleneck of the user project and comprises a predicted issue of the user project (Page 561, 6.2 Experimental Setting, As discussed in Section 2, we would like to predict the difference between the actual delivered velocity against the target velocity. For example, if the output of our model is -5, it predicts that the team will deliver 5 story points below the target. Table 7 shows the statistical descriptions of the difference between the actual delivered against the target velocity of the five projects in terms of the minimum, maximum, mean, median, mode, standard deviations (SD), and interquartile range (IQR)).
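For illustration only, and not drawn from Choetkiertikul or from the claims: a minimal Python sketch of the non-overlapping training and testing partition discussed above, with velocity(Difference) computed as delivered minus committed story points. The record layout, the helper names, and the 80/20 split ratio are hypothetical assumptions.

```python
import random

def velocity_difference(committed, delivered):
    """Delivered minus committed story points; negative means under-delivery."""
    return delivered - committed

def split_disjoint(records, test_fraction=0.2, seed=0):
    """Shuffle a copy of the records and partition it into disjoint train/test lists."""
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    # The first n_test records form the testing set; the remainder form the training set.
    return shuffled[n_test:], shuffled[:n_test]

# Hypothetical historical iterations: (iteration id, committed, delivered)
iterations = [(i, 20 + i, 15 + 2 * i) for i in range(10)]
train, test = split_disjoint(iterations)
train_ids = {record[0] for record in train}
test_ids = {record[0] for record in test}
```

Because each record lands in exactly one of the two lists, the training set (the "third set of metadata") and the testing set (the "fourth set of metadata") share no records in this sketch.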
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the assessment outcome reflecting the at least one predicted metric of the user project and the display of the invention of Champlin-Scharf et al. to further incorporate collaboration features in the analysis (e.g. epics and a plurality of replies to the epics) of the invention of Choetkiertikul because doing so would allow the system to predict outcomes of new and ongoing iterations in the execution phase (see Choetkiertikul, Page 554, 3 Overview of our approach), wherein team collaboration is an important factor in the prediction model (see Choetkiertikul, Page 567, 6.4.6 Important Features). Also, it would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the use of machine learning to generate an assessment outcome reflecting the at least one predicted metric of the user project of the invention of Champlin-Scharf et al. to further incorporate the use of a neural network of the invention of Choetkiertikul because the neural network approach is a subclass of machine learning and is a well-known technique for predicting project outcomes.
Although Champlin-Scharf et al. discloses searching for similar software problem descriptions by using natural language processing (Paragraphs 0031 and 0035) and Choetkiertikul discloses aggregating similar features by using bag-of-words (Fig. 3. An overview of our approach), the combination of Champlin-Scharf et al. and Choetkiertikul does not specifically disclose wherein each reference project in the subgroup has a level of relevance to the user project exceeding a threshold. 
However, Liu et al. discloses after determining the ontology, determining, based on the ontology, a subgroup of reference projects from the plurality of reference projects, wherein each reference project in the subgroup has a level of relevance to the user project exceeding a threshold (Paragraph 0014, In certain embodiments, the step of clustering the data entries further includes: calculating a semantic similarity score for the two data entries using the sentiment similarity values, the text similarity values, and the syntactic similarity values; Figure 3A discloses a current ontology; Paragraph 0126, At procedure 514, the cluster classifier 132, upon receiving the semantic similarity scores between each pair (any two) of the data entries, classifies the data entries based on the semantic similarity scores. Specifically, the cluster classifier 132 groups the data entries into clusters, the data entries in the same cluster have high semantic similarity scores. In certain embodiments, a threshold is defined for a the clusters, which means that any two data entries in the same cluster has the semantic similarity score greater than the threshold score).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the way that the subgroup of reference projects is determined from the plurality of reference projects in the invention of Champlin-Scharf et al. and Choetkiertikul to further incorporate the feature of the invention of Liu et al. wherein each reference data entry in the subgroup has a level of relevance to the user project exceeding a threshold, because doing so would allow the system to group the data entries into clusters in which the data entries have high semantic similarity scores (see Liu et al., Paragraph 0126). Further, the claimed invention is merely a combination of old elements, and in combination each element would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.
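For illustration only, and not taken from Liu et al.: a minimal Python sketch of selecting a subgroup of reference projects whose relevance to the user project exceeds a threshold. Jaccard overlap of word tokens is used here as a simple stand-in for a semantic similarity score; the function names, the sample descriptions, and the 0.3 threshold are hypothetical assumptions.

```python
def jaccard(a, b):
    """Token-set overlap between two descriptions (stand-in similarity score)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)

def select_relevant(user_text, references, threshold):
    """Keep only the reference entries whose score strictly exceeds the threshold."""
    return [
        name
        for name, text in references.items()
        if jaccard(user_text, text) > threshold
    ]

# Hypothetical user-project description and reference-project descriptions
user = "add hashmap to cache module"
refs = {
    "ref_a": "add hashmap to cache layer",
    "ref_b": "update build scripts",
    "ref_c": "hashmap cache module performance issue",
}
subgroup = select_relevant(user, refs, threshold=0.3)
```

In this sketch `ref_a` and `ref_c` score above the threshold and `ref_b` scores zero, so only the first two would form the subgroup.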
Regarding claim 18 (Currently Amended), Champlin-Scharf et al. discloses a method for analyzing a user project (Paragraph 0001, The present invention relates to software development tools and software quality improvement processes; Paragraph 0006, Embodiments according to the present invention provide a tool for automatic pre-detection of potential software product impact according to a statement placed in a software development system, and for automatically recommending for resolutions which accesses a repository of information containing a history of changes and effects of the changes for a software project; using a received a statement in natural language to perform a natural language search of the repository; according to the findings of the search of the repository, using a machine learning model to compose an impact prediction regarding the received statement relative to the findings; and automatically placing an advisory notice regarding to the impact prediction into the software development system, wherein the advisory notice is associated with the received statement), comprising: 
accessing a first set of metadata associated with the user project, the first set of metadata comprising first work-tracking data … for tracking the user project (Paragraph 0017, the present inventors have set out, in a first advantage of the present invention, to develop an improved software problem discussion search methodology and a tool that “pre-detects” potential coding issues using Natural Language Processing (NLP); Paragraph 0032, Prior to processing any search for similar software problem descriptions, a repository of textual descriptions of a history of changes to the software product would be created by gathering information (or a collection of links to information) from different sources such as from bug tracking databases, extracted comments in source code, and “one liner” comments captured in software version control systems; Paragraph 0033, After the repository has been created, a user would input a description of the changes he or she is going to make to the software source code into the comments of an enhanced bug tracking tool, then he or she would click a button, which would scan the repository while searching for similar changes made in the past, using NLP methods to search for matches rather than just prevalence of keywords and synonyms; Examiner notes that the metadata included in an XML format includes a problem posted by the team member (see table in Paragraph 0042, this requirement may cause some incompatibilities with XYZ module));
accessing a second set of metadata associated with a plurality of reference projects, the plurality of reference projects being distinct from the user project, the second set of metadata comprising second work-tracking data comprising a plurality of … being contributed by a plurality of teams in each of the plurality of reference projects for tracking the reference projects (Paragraph 0032, Prior to processing any search for similar software problem descriptions, a repository of textual descriptions of a history of changes to the software product would be created by gathering information (or a collection of links to information) from different sources such as from bug tracking databases, extracted comments in source code, and “one liner” comments captured in software version control systems; Paragraph 0033, After the repository has been created, a user would input a description of the changes he or she is going to make to the software source code into the comments of an enhanced bug tracking tool, then he or she would click a button, which would scan the repository while searching for similar changes made in the past, using NLP methods to search for matches rather than just prevalence of keywords and synonyms);
after accessing the first and second sets of metadata, determining an ontology based on the first and second sets of metadata respectively associated with the user project and the plurality of reference projects (Paragraph 0035, For example, a user could input into the enhanced bug tracking systems user interface that they are planning to add a hashmap to a software design or product. Such user input might include a natural language description of what functions would use the hashmap, and what data structures the hashmap would relate to each other, and to which particular method or code module the hashmap would be added. Such as com.my.package; Paragraph 0036, The resulting NLP-based search might find instances in the past, in which hashmaps were used in similar circumstances and caused performance issues; Applicant states that the ontology unit may include at least one natural language processing model (Paragraph 0034). Under the broadest reasonable interpretation in light of the specification, Champlin-Scharf et al. discloses an ontology because it uses NLP to represent interrelations between the user project and the plurality of reference projects under a framework of categories), the ontology representing interrelations between the user project and the plurality of reference projects under a framework of categories (Paragraph 0026, NLP searching is usually much broader than keyword searching for several reasons. First, it allows the user to express his or her needs in a manner more suitable for the user, and less constrained by system requirements. This increases the likelihood that the search query itself is accurately directed towards the desired information. Second, by extracting the symbols from the natural language input, the search and proceed not only on the symbols, but also on their aliases and synonyms. 
In the foregoing example, the term “stack” may be searched (a symbol found in the original user's input query), and the synonym “heap” may also be searched according to a synonym list. And, the term “overflow” may be searched as well as an antonym “underflow”. For a user to achieve the same search breadth, he or she would have to have expertise to craft a much more complicated structured query and would have to be diligent enough to look up many synonyms and antonyms, as well as to formulate similarly-meaningful alternative phrases; Paragraph 0027, Therefore, for the purposes of this disclosure, natural language searching shall mean receiving a natural language expression as an search query input, applying one or more NLP techniques including deductive logic, inductive logic, validity and soundness checks, rules of though (e.g., principle of contradiction, law of excluded middle, etc.), truth functionalities (e.g. Modus Ponens, Modus Tollens, hypothetical syllogism, denying an antecedent, affirming a consequent, etc.), predicate logic, sorites arguments, ethymemes, syntactic analysis, semantic analysis, and pragmatics. Syntactic analysis may contain one or more parsers, such as noise-disposal parsers, state-machine parsers, and definite clause parsers, and it may include pre-determined and updateable grammars (e.g. recognizable grammar structures), such as context-free notations (e.g. Backus-Naur forms, etc.). A semantic analyzer may, based on the results of the syntactic analysis or interactively operating with syntactic analysis, determines the meaning of a phrase, statement or sentence. Most semantic analyzers attempt to re-write the phrase, statement or sentence into a context-free form so that it can be more readily found in a lexicon and mapped to an intended or implicit meaning. 
Pragmatics then operates to further reduce remaining ambiguities by applying reasonable domain scope, resolving anaphoras, and using inferencing to generate alternative expressions for the same meaning (upon which the search can be performed, as well));
after determining the ontology, determining, based on the ontology, a subgroup of reference projects from the plurality of reference projects, wherein each reference project in the subgroup has a level of relevance to the user project … (Paragraph 0031, search for similar software problem descriptions; Paragraph 0035, For example, a user could input into the enhanced bug tracking systems user interface that they are planning to add a hashmap to a software design or product. Such user input might include a natural language description of what functions would use the hashmap, and what data structures the hashmap would relate to each other, and to which particular method or code module the hashmap would be added. Such as com.my.package; Paragraph 0036, The resulting NLP-based search might find instances in the past, in which hashmaps were used in similar circumstances and caused performance issues);
after determining the subgroup of reference projects, generating an assessment model for assessing at least one metric of the user project based on the subgroup of reference projects (Paragraph 0036, The resulting NLP-based search might find instances in the past, in which hashmaps were used in similar circumstances and caused performance issues; Paragraph 0037, Once a repository of defects, comments, and past actions has be created, the system can use Machine Learning to generate models to make suggestions smarter and more accurate);
after determining the assessment model, determining a [machine learning] for training the assessment model based on an output of the assessment model, the output corresponding to the at least one metric (Paragraph 0041, Before the bug pre-detection method and tool of the present invention is used by a user, training data will be ingested by the system originating from a variety of sources. This would likely include, but not limited to, defect tracking databases, requirements documents, work items, and test cases generated during prior releases, as well as comments from known defect-free sections of code; Paragraph 0046, Based on previous knowledge in the form of our trained models (102) and, if available, previous bug resolution notes (104), the bug pre-detection tool will offer suggestions in the form of automatically-generated discussion comments (301) or other suitable notices (e-mails, text messages, etc.) as to whether or not the proposals in the latest revision of the document will improve software quality in terms of the likelihood of introducing new defects, or triggering previously solved defects, which is illustrated in FIG. 3; Paragraph 0057, As the work item is delivered and built into a software deliverable, the next phase of some embodiments of the present invention may be engaged. As the code has been previously run through the bug pre-detection tool, the Machine Learning models can be employed to predict a number of things including, but not limited to, defects, performance bottlenecks, and most importantly; Examiner interprets the likelihood of introducing new defects as the output corresponding to the at least one metric);
after determining the [machine learning], determining a third set of metadata associated with the reference projects in the subgroup (Paragraph 0041, Before the bug pre-detection method and tool of the present invention is used by a user, training data will be ingested by the system originating from a variety of sources. This would likely include, but not limited to, defect tracking databases, requirements documents, work items, and test cases generated during prior releases, as well as comments from known defect-free sections of code; Examiner notes that the process of training a machine learning model includes splitting the data into a training set and a testing set. In this case, the training set is the third set of metadata);
after determining the third set of metadata, training the assessment model using the [machine learning] and the third set of metadata (Paragraph 0041, Before the bug pre-detection method and tool of the present invention is used by a user, training data will be ingested by the system originating from a variety of sources. This would likely include, but not limited to, defect tracking databases, requirements documents, work items, and test cases generated during prior releases, as well as comments from known defect-free sections of code);
after training the assessment model, testing the assessment model using a fourth set of metadata associated with the reference projects in the subgroup, the third set of metadata and the fourth set of metadata being non-overlapping to each other (Paragraph 0049, Once the requirement or change to the requirement has been approved, the final content of the requirement will be sent through the pre-detection tool (104, 401, 402) and, once annotated (403), will be used to update the training data (102) for the Machine Learning Models, as illustrated in FIG. 4. This allows the approved intervention to be included in the suggestions for similar requirements change proposed in the future; One of ordinary skill in the art would know that the process of training a machine learning model includes a training set and a testing set to evaluate accuracy of the machine learning model. In this case, the training set is the third set of metadata and the testing set is the fourth set of metadata. Further, new data can be used to retrain the machine learning model);
and after the training and testing of the assessment model, applying the assessment model to the first set of metadata to generate an assessment outcome reflecting the at least one predicted metric of the user project, the assessment outcome comprising a bottleneck of the team … of the user project (Paragraph 0057, Machine Learning models can be employed to predict a number of things including, but not limited to, defects, performance bottlenecks, and most importantly. These identified defects and bottlenecks may then be mapped to specific tests that would be most effective in testing these aspects of the new code);
and generating a … based on the assessment outcome, wherein the … reflects the bottleneck of the user project and comprises a predicted issue of the user project (Paragraph 0057, Machine Learning models can be employed to predict a number of things including, but not limited to, defects, performance bottlenecks, and most importantly. These identified defects and bottlenecks may then be mapped to specific tests that would be most effective in testing these aspects of the new code; Paragraph 0068, If the computing platform is intended to interact with human users, it is provided with one or more user interface device(s) (607), such as displays, keyboards, pointing devices, speakers, etc.).
Champlin-Scharf et al. discloses accessing a first set of metadata associated with the user project (e.g. problem posted by a user); accessing a second set of metadata associated with a plurality of reference projects (e.g. software repository); determining a subgroup of reference projects that are similar to the first set of metadata (e.g. data with similar software problem descriptions); and using the subgroup of data for generating an assessment for assessing at least one metric of the user project (Machine Learning models can be employed to predict a number of things including, but not limited to, defects, performance bottlenecks). Although Champlin-Scharf et al. discloses predicting performance bottlenecks based on historical data having similar software problem descriptions, Champlin-Scharf et al. does not specifically disclose wherein the system tracks other features that are important for predicting a collaboration bottleneck (e.g. an epic and a plurality of replies to the epic). Also, although Champlin-Scharf et al. discloses machine learning to generate an assessment outcome reflecting the at least one predicted metric of the user project, Champlin-Scharf et al. does not specifically disclose wherein the assessment outcome is generated using a neural network. Further, although Champlin-Scharf et al. discloses an assessment outcome reflecting the at least one predicted metric of the user project (Paragraph 0057) and a display (Paragraph 0068), Champlin-Scharf et al. does not specifically disclose generating a report.
However, Choetkiertikul discloses accessing a first set of metadata associated with the user project, the first set of metadata comprising first work-tracking data contributed by a plurality of teams of the user project (see Figure 4 and related text in Page 555, Fig. 4 shows an example of an on-going iteration report (recorded in JIRA-Agile) of Mesosphere Sprint 34 in the Apache project. This iteration started from April 27, 2016 to May 11, 2016. This iteration has two issues in the Todo state (MESOS-5272 and MESOS-5222), three issues in the In-progress state—those are all in the reviewing process (MESOS3739, MESOS-4781, and MESOS-4938), and one issue has been resolved (MESOS-5312). These issues have story points assigned to them. For each of those sets of issues, we compute the set cardinality and velocity, and use each of them as a feature. From our investigation, among the under-achieved iterations across all case studies, i.e., velocity(Difference) < 0, 30 percent of them have new issues added after passing 80 percent of their planned duration (e.g., after 8th day of a ten-day iterations), while those iterations deliver zero-issue. Specifically, teams added more velocity(Committed) while velocity(Delivered) was still zero. This reflects that adding and removing issues affects the deliverable capability of an on-going iteration. This can be a good indicator to determine the outcome of an iteration), the first work-tracking data comprising an epic and a plurality of replies to the epic, the plurality of replies being contributed by the plurality of teams for tracking the user project (Fig. 4. An example of an on-going iteration report; Pages 555-556, 4.2 Features of an issue, Fig. 7 shows an example of an issue report of issue AURORA-716 in the Apache project which the details of an issue are provided such as type, priority, description, and comments including a story points and an assigned iteration; Page 557, 4.2.1. 
Primitive Analysis of an Issue, Previous studies (e.g., [16]) have found that the number of comments on an issue indicates the degree of team collaboration, and thus may affect its resolving time; Examiner interprets the set of issues included in a sprint as epics. Further, Examiner notes that the reviewing process includes a plurality of replies contributed by the plurality of teams);
accessing a second set of metadata associated with a plurality of reference projects, the plurality of reference projects being distinct from the user project, the second set of metadata comprising second work-tracking data comprising a plurality of epics and a plurality of replies to the epics, the plurality of replies being contributed by a plurality of teams in each of the plurality of reference projects for tracking the reference projects (Fig. 3. An overview of our approach; Page 555, 4.2. Features of an Issues, Fig. 7 shows an example of an issue report of issue AURORA-716 in the Apache project which the details of an issue are provided such as type, priority, description, and comments including a story points and an assigned iteration. Hence, we also extract a broad range of features representing an issue (see Table 2). The features cover different aspects of an issue including primitive attributes of an issue, issue dependency, changing of issue attributes, and textual features of an issue’s description. Some of the features of an issue (e.g., number of issue links) were also adopted from our previous work [14]; Page 560, 6.1. Data Collecting and Processing, we collected the data of past iterations (also referred to as sprints in those projects) and issues from five large open source projects which follow the agile Scrum methodology: Apache, JBoss, JIRA, MongoDB, and Spring. The project descriptions and their agile adoptions have been reported in Table 5; Examiner interprets the set of issues included in a sprint as epics. Further, Examiner notes that the historical data includes a plurality of features, wherein one of the features is the number of comments);
…
after determining the subgroup of reference projects (Fig. 3. An overview of our approach, see feature aggregation; Page 554, 3.0 Overview of our approach, Formally, issue-level features are vectors located in the same euclidean space (i.e., the issue space). The aggregation is then a map of a set points in the issue space onto a point in the iteration space. The main challenge here is to handle sets which are unordered and variable in size (e.g., the number of issues is different from iteration to iteration). We propose two methods: statistical aggregation and bag-of-words (BoW). Statistical aggregation looks for simple set statistics for each dimension of the points in the set, such as maximum, mean or standard deviation. For example, the minimum, maximum, mean, and standard deviation of the number of comments of all issues in an iteration are part of the new features derived for the iteration. This statistical aggregation technique relies on manual feature engineering. On the other hand, the bag-of-words method automatically clusters all the points in the issue-space and finds the closest prototype (known as “word”) for each new point to form a new set of features (known as bag-of-words, similarly to a typical representation of a document) representing an iteration. This technique provides a powerful, automatic way of learning features for an iteration from the set of issues in the layer below it (similar to the notions of deep learning)), generating an assessment model for assessing at least one metric of the user project based on the subgroup of reference projects (Page 554, Fig. 3. An overview of our approach, see trained classifiers and predicted velocity; Page 554, 3.0 Overview of our approach, Our approach consists of two phases: the learning phase and the execution phase (see Fig. 3). 
The learning phase involves using historical iterations to build a predictive model (using machine learning techniques), which is then used to predict outcomes, i.e., velocity(Difference), of new and ongoing iterations in the execution phase. To apply machine learning techniques, we need to engineer features for the iteration. An iteration has a number of attributes (e.g., its duration, the participants, etc.) and a set of issues whose dependencies are described as a dependency graph. Each issue has its own attributes and derived features (e.g., from its textual description). Our approach separates the iteration-level features into three components: (i) iteration attributes, (ii) complexity descriptors of the dependency graph (e.g., the number of nodes, edges, fan-in, fan-out, etc.), and (iii) aggregated features from the set of issues that belong to the iteration. A more sophisticated approach would involve embedding all available information into an euclidean space, but we leave this for future work); 
after determining the assessment model, determining a neural network for training the assessment model based on an output of the assessment model, the output corresponding to the at least one metric (Page 554, Fig. 3. An overview of our approach, see trained classifiers and predicted velocity; Page 554, 3.0 Overview of our approach, For prediction models, we employ three state-of-the-art randomized ensemble methods: Random Forests, Stochastic Gradient Boosting Machines, and Deep Neural Networks (DNNs) with Dropouts to build the predictive models. Our approach is able to make a prediction regarding the delivery capability in an iteration (i.e., the difference between the actual delivered velocity against the committed (target) velocity). Next we describe our approach in more detail); 
after determining the neural network, determining a third set of metadata associated with the reference projects in the subgroup (Page 554, Fig. 3. An overview of our approach, see trained classifiers and predicted velocity; Page 559, 5 Predictive Models, Our predictive models can predict the difference between the actual delivered velocity against the committed (target) velocity for an iteration, i.e., velocity(Difference). To do so, we employ regression methods (supervised learning) where the outputs reflect the deliverable capability in an iteration e.g., the predicting of velocity(Difference) will be equal to 12. The extracted features of the historical iterations (i.e., training set) are used to build the predictive models. Specifically, a feature vector of an iteration and an aggregated feature vector of issues assigned to the iteration are concatenated and fed into a regressor); 
after determining the third set of metadata, training the assessment model using the neural network and the third set of metadata (Page 554, Fig. 3. An overview of our approach, see trained classifiers and predicted velocity; Page 559, 5 Predictive Models, Our predictive models can predict the difference between the actual delivered velocity against the committed (target) velocity for an iteration, i.e., velocity(Difference). To do so, we employ regression methods (supervised learning) where the outputs reflect the deliverable capability in an iteration e.g., the predicting of velocity(Difference) will be equal to 12. The extracted features of the historical iterations (i.e., training set) are used to build the predictive models. Specifically, a feature vector of an iteration and an aggregated feature vector of issues assigned to the iteration are concatenated and fed into a regressor.  We apply the currently most successful class of machine learning methods, namely randomized ensemble methods [26], [27], [28]. Ensemble methods refer to the use of many regressors to make their prediction [28]. Randomized methods create regressors by randomizing data, features, or internal model components [29]. Randomizations are powerful regularization techniques which reduce prediction variance, prevent overfitting, are robust against noisy data, and improve the overall predictive accuracy [30], [31]. We use the following high performing regressors that have frequently won recent data science competitions (e.g., Kaggle6 ): Random Forests (RFs) [32], Stochastic Gradient Boosting Machines (GBMs) [33], [34] and Deep Neural Networks with Dropouts (DNNs) [35]. All of them are ensemble methods that use a divide-and-conquer approach to improve performance. The key principle behind ensemble methods is that a group of “weak learners” (e.g., classification and regression decision trees) can together form a “strong learner”. 
Details for these predictive models used are provided in Appendix A).
after training the assessment model, testing the assessment model using a fourth set of metadata associated with the reference projects in the subgroup, the third set of metadata and the fourth set of metadata being non-overlapping to each other (Page 554, Fig. 3. An overview of our approach, see trained classifiers and predicted velocity; Page 559, 5 Predictive Models, Our predictive models can predict the difference between the actual delivered velocity against the committed (target) velocity for an iteration, i.e., velocity(Difference). To do so, we employ regression methods (supervised learning) where the outputs reflect the deliverable capability in an iteration e.g., the predicting of velocity(Difference) will be equal to 12. The extracted features of the historical iterations (i.e., training set) are used to build the predictive models. Specifically, a feature vector of an iteration and an aggregated feature vector of issues assigned to the iteration are concatenated and fed into a regressor.  We apply the currently most successful class of machine learning methods, namely randomized ensemble methods [26], [27], [28]. Ensemble methods refer to the use of many regressors to make their prediction [28]. Randomized methods create regressors by randomizing data, features, or internal model components [29]. Randomizations are powerful regularization techniques which reduce prediction variance, prevent overfitting, are robust against noisy data, and improve the overall predictive accuracy [30], [31]. We use the following high performing regressors that have frequently won recent data science competitions (e.g., Kaggle6 ): Random Forests (RFs) [32], Stochastic Gradient Boosting Machines (GBMs) [33], [34] and Deep Neural Networks with Dropouts (DNNs) [35]. All of them are ensemble methods that use a divide-and-conquer approach to improve performance. 
The key principle behind ensemble methods is that a group of “weak learners” (e.g., classification and regression decision trees) can together form a “strong learner”. Details for these predictive models used are provided in Appendix A; Examiner notes that regularization techniques include splitting the data into a training set and a testing set, wherein the testing set is used to test the model and therefore prevent overfitting, see at least [30-35]); 
and after the training and testing of the assessment model, applying the assessment model to the first set of metadata to generate an assessment outcome reflecting the at least one predicted metric of the user project, the assessment outcome comprising a bottleneck of the team collaboration of the user project (Page 561, 6.2 Experimental Setting, As discussed in Section 2, we would like to predict the difference between the actual delivered velocity against the target velocity. For example, if the output of our model is -5, it predicts that the team will deliver 5 story points below the target. Table 7 shows the statistical descriptions of the difference between the actual delivered against the target velocity of the five projects in terms of the minimum, maximum, mean, median, mode, standard deviations (SD), and interquartile range (IQR));
and generating a report based on the assessment outcome, wherein the report reflects the bottleneck of the user project and comprises a predicted issue of the user project (Page 561, 6.2 Experimental Setting, As discussed in Section 2, we would like to predict the difference between the actual delivered velocity against the target velocity. For example, if the output of our model is -5, it predicts that the team will deliver 5 story points below the target. Table 7 shows the statistical descriptions of the difference between the actual delivered against the target velocity of the five projects in terms of the minimum, maximum, mean, median, mode, standard deviations (SD), and interquartile range (IQR)).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the assessment outcome reflecting the at least one predicted metric of the user project and the display of the invention of Champlin-Scharf et al. to further incorporate collaboration features in the analysis (e.g. epics and a plurality of replies to the epics) of the invention of Choetkiertikul because doing so would allow the system to predict outcomes of new and ongoing iterations in the execution phase (see Choetkiertikul, Page 554, 3 Overview of our approach), wherein team collaboration is an important factor in the prediction model (see Choetkiertikul, Page 567, 6.4.6 Important Features). Also, it would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the use of machine learning to generate an assessment outcome reflecting the at least one predicted metric of the user project of the invention of Champlin-Scharf et al. to further incorporate the use of a neural network of the invention of Choetkiertikul because the neural network approach is a subclass of machine learning and is a well-known technique for predicting project outcomes.
Although Champlin-Scharf et al. discloses searching for similar software problem descriptions by using natural language processing (Paragraphs 0031 and 0035) and Choetkiertikul discloses aggregating similar features by using bag-of-words (Fig. 3. An overview of our approach), the combination of Champlin-Scharf et al. and Choetkiertikul does not specifically disclose wherein each reference project in the subgroup has a level of relevance to the user project exceeding a threshold. 
However, Liu et al. discloses after determining the ontology, determining, based on the ontology, a subgroup of reference projects from the plurality of reference projects, wherein each reference project in the subgroup has a level of relevance to the user project exceeding a threshold (Paragraph 0014, In certain embodiments, the step of clustering the data entries further includes: calculating a semantic similarity score for the two data entries using the sentiment similarity values, the text similarity values, and the syntactic similarity values; Figure 3A discloses a current ontology; Paragraph 0126, At procedure 514, the cluster classifier 132, upon receiving the semantic similarity scores between each pair (any two) of the data entries, classifies the data entries based on the semantic similarity scores. Specifically, the cluster classifier 132 groups the data entries into clusters, the data entries in the same cluster have high semantic similarity scores. In certain embodiments, a threshold is defined for a the clusters, which means that any two data entries in the same cluster has the semantic similarity score greater than the threshold score).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the way that the subgroup of reference projects is determined from the plurality of reference projects in the invention of Champlin-Scharf et al. and Choetkiertikul to further incorporate wherein each reference project in the subgroup has a level of relevance to the user project exceeding a threshold of the invention of Liu et al. because doing so would allow the system to group the data entries into clusters such that the data entries in the same cluster have high semantic similarity scores (see Liu et al., Paragraph 0126). Further, the claimed invention is merely a combination of old elements, and in combination each element would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.
Regarding claim 34 (Currently Amended), Champlin-Scharf et al. discloses a non-transitory computer-readable medium storing instructions that (Paragraph 0064, The preceding paragraphs have set forth example logical processes according to the present invention, which, when coupled with processing hardware, embody systems according to the present invention, and which, when coupled with tangible, computer readable memory devices, embody computer program products according to the related invention), when executed by at least one processor, cause the at least one processor to (Paragraph 0065, Regarding computers for executing the logical processes set forth herein, it will be readily recognized by those skilled in the art that a variety of computers are suitable and will become suitable as memory, processing, and communication capacities of computers and portable devices increase. In such embodiments, the operative invention includes the combination of programmable computing platform and programs. 
In other embodiments, some or all of the logical processes may be committed to dedicated or specialized electronic circuitry, such as Application Specific Integrated Circuits or programmable logic devices) perform a method for analyzing user project data (Paragraph 0001, The present invention relates to software development tools and software quality improvement processes; Paragraph 0006, Embodiments according to the present invention provide a tool for automatic pre-detection of potential software product impact according to a statement placed in a software development system, and for automatically recommending for resolutions which accesses a repository of information containing a history of changes and effects of the changes for a software project; using a received a statement in natural language to perform a natural language search of the repository; according to the findings of the search of the repository, using a machine learning model to compose an impact prediction regarding the received statement relative to the findings; and automatically placing an advisory notice regarding to the impact prediction into the software development system, wherein the advisory notice is associated with the received statement), the method comprising: 
accessing a first set of metadata associated with the user project, the first set of metadata comprising first work-tracking data … for tracking the user project (Paragraph 0017, the present inventors have set out, in a first advantage of the present invention, to develop an improved software problem discussion search methodology and a tool that “pre-detects” potential coding issues using Natural Language Processing (NLP); Paragraph 0032, Prior to processing any search for similar software problem descriptions, a repository of textual descriptions of a history of changes to the software product would be created by gathering information (or a collection of links to information) from different sources such as from bug tracking databases, extracted comments in source code, and “one liner” comments captured in software version control systems; Paragraph 0033, After the repository has been created, a user would input a description of the changes he or she is going to make to the software source code into the comments of an enhanced bug tracking tool, then he or she would click a button, which would scan the repository while searching for similar changes made in the past, using NLP methods to search for matches rather than just prevalence of keywords and synonyms; Examiner notes that the metadata included in an XML format includes a problem posted by the team member (see table in Paragraph 0042, this requirement may cause some incompatibilities with XYZ module));
accessing a second set of metadata associated with a plurality of reference projects, the plurality of reference projects being distinct from the user project, the second set of metadata comprising second work-tracking data comprising a plurality of … being contributed by a plurality of teams in each of the plurality of reference projects for tracking the reference projects (Paragraph 0032, Prior to processing any search for similar software problem descriptions, a repository of textual descriptions of a history of changes to the software product would be created by gathering information (or a collection of links to information) from different sources such as from bug tracking databases, extracted comments in source code, and “one liner” comments captured in software version control systems; Paragraph 0033, After the repository has been created, a user would input a description of the changes he or she is going to make to the software source code into the comments of an enhanced bug tracking tool, then he or she would click a button, which would scan the repository while searching for similar changes made in the past, using NLP methods to search for matches rather than just prevalence of keywords and synonyms);
after accessing the first and second sets of metadata, determining an ontology based on the first and second sets of metadata respectively associated with the user project and the plurality of reference projects (Paragraph 0035, For example, a user could input into the enhanced bug tracking systems user interface that they are planning to add a hashmap to a software design or product. Such user input might include a natural language description of what functions would use the hashmap, and what data structures the hashmap would relate to each other, and to which particular method or code module the hashmap would be added. Such as com.my.package; Paragraph 0036, The resulting NLP-based search might find instances in the past, in which hashmaps were used in similar circumstances and caused performance issues; Applicant states that the ontology unit may include at least one natural language processing model (Paragraph 0034). Based on the broadest reasonable interpretation in light of the specification, Champlin-Scharf et al. discloses an ontology because it uses NLP to represent interrelations between the user project and the plurality of reference projects under a framework of categories), the ontology representing interrelations between the user project and the plurality of reference projects under a framework of categories (Paragraph 0026, NLP searching is usually much broader than keyword searching for several reasons. First, it allows the user to express his or her needs in a manner more suitable for the user, and less constrained by system requirements. This increases the likelihood that the search query itself is accurately directed towards the desired information. Second, by extracting the symbols from the natural language input, the search and proceed not only on the symbols, but also on their aliases and synonyms. 
In the foregoing example, the term “stack” may be searched (a symbol found in the original user's input query), and the synonym “heap” may also be searched according to a synonym list. And, the term “overflow” may be searched as well as an antonym “underflow”. For a user to achieve the same search breadth, he or she would have to have expertise to craft a much more complicated structured query and would have to be diligent enough to look up many synonyms and antonyms, as well as to formulate similarly-meaningful alternative phrases; Paragraph 0027, Therefore, for the purposes of this disclosure, natural language searching shall mean receiving a natural language expression as an search query input, applying one or more NLP techniques including deductive logic, inductive logic, validity and soundness checks, rules of though (e.g., principle of contradiction, law of excluded middle, etc.), truth functionalities (e.g. Modus Ponens, Modus Tollens, hypothetical syllogism, denying an antecedent, affirming a consequent, etc.), predicate logic, sorites arguments, ethymemes, syntactic analysis, semantic analysis, and pragmatics. Syntactic analysis may contain one or more parsers, such as noise-disposal parsers, state-machine parsers, and definite clause parsers, and it may include pre-determined and updateable grammars (e.g. recognizable grammar structures), such as context-free notations (e.g. Backus-Naur forms, etc.). A semantic analyzer may, based on the results of the syntactic analysis or interactively operating with syntactic analysis, determines the meaning of a phrase, statement or sentence. Most semantic analyzers attempt to re-write the phrase, statement or sentence into a context-free form so that it can be more readily found in a lexicon and mapped to an intended or implicit meaning. 
Pragmatics then operates to further reduce remaining ambiguities by applying reasonable domain scope, resolving anaphoras, and using inferencing to generate alternative expressions for the same meaning (upon which the search can be performed, as well));
after determining the ontology, determining, based on the ontology, a subgroup of reference projects from the plurality of reference projects, wherein each reference project in the subgroup has a level of relevance to the user project … (Paragraph 0031, search for similar software problem descriptions; Paragraph 0035, For example, a user could input into the enhanced bug tracking systems user interface that they are planning to add a hashmap to a software design or product. Such user input might include a natural language description of what functions would use the hashmap, and what data structures the hashmap would relate to each other, and to which particular method or code module the hashmap would be added. Such as com.my.package; Paragraph 0036, The resulting NLP-based search might find instances in the past, in which hashmaps were used in similar circumstances and caused performance issues);
after determining the subgroup of reference projects, generating an assessment model for assessing at least one metric of the user project based on the subgroup of reference projects (Paragraph 0036, The resulting NLP-based search might find instances in the past, in which hashmaps were used in similar circumstances and caused performance issues; Paragraph 0037, Once a repository of defects, comments, and past actions has be created, the system can use Machine Learning to generate models to make suggestions smarter and more accurate);
after determining the assessment model, determining a [machine learning] for training the assessment model based on an output of the assessment model, the output corresponding to the at least one metric (Paragraph 0041, Before the bug pre-detection method and tool of the present invention is used by a user, training data will be ingested by the system originating from a variety of sources. This would likely include, but not limited to, defect tracking databases, requirements documents, work items, and test cases generated during prior releases, as well as comments from known defect-free sections of code; Paragraph 0046, Based on previous knowledge in the form of our trained models (102) and, if available, previous bug resolution notes (104), the bug pre-detection tool will offer suggestions in the form of automatically-generated discussion comments (301) or other suitable notices (e-mails, text messages, etc.) as to whether or not the proposals in the latest revision of the document will improve software quality in terms of the likelihood of introducing new defects, or triggering previously solved defects, which is illustrated in FIG. 3; Paragraph 0057, As the work item is delivered and built into a software deliverable, the next phase of some embodiments of the present invention may be engaged. As the code has been previously run through the bug pre-detection tool, the Machine Learning models can be employed to predict a number of things including, but not limited to, defects, performance bottlenecks, and most importantly; Examiner interprets the likelihood of introducing new defects as the output corresponding to the at least one metric);
after determining the [machine learning], determining a third set of metadata associated with the reference projects in the subgroup (Paragraph 0041, Before the bug pre-detection method and tool of the present invention is used by a user, training data will be ingested by the system originating from a variety of sources. This would likely include, but not limited to, defect tracking databases, requirements documents, work items, and test cases generated during prior releases, as well as comments from known defect-free sections of code; Examiner notes that the process of training a machine learning model includes splitting the data into a training set and a testing set. In this case, the training set is the third set of metadata);
after determining the third set of metadata, training the assessment model using the [machine learning] and the third set of metadata (Paragraph 0041, Before the bug pre-detection method and tool of the present invention is used by a user, training data will be ingested by the system originating from a variety of sources. This would likely include, but not limited to, defect tracking databases, requirements documents, work items, and test cases generated during prior releases, as well as comments from known defect-free sections of code);
after training the assessment model, testing the assessment model using a fourth set of metadata associated with the reference projects in the subgroup, the third set of metadata and the fourth set of metadata being non-overlapping to each other (Paragraph 0049, Once the requirement or change to the requirement has been approved, the final content of the requirement will be sent through the pre-detection tool (104, 401, 402) and, once annotated (403), will be used to update the training data (102) for the Machine Learning Models, as illustrated in FIG. 4. This allows the approved intervention to be included in the suggestions for similar requirements change proposed in the future; One of ordinary skill in the art would know that the process of training a machine learning model includes a training set and a testing set to evaluate accuracy of the machine learning model. In this case, the training set is the third set of metadata and the testing set is the fourth set of metadata. Further, new data can be used to retrain the machine learning model);
and after the training and testing of the assessment model, applying the assessment model to the first set of metadata to determine a … bottleneck of the user project (Paragraph 0057, Machine Learning models can be employed to predict a number of things including, but not limited to, defects, performance bottlenecks, and most importantly. These identified defects and bottlenecks may then be mapped to specific tests that would be most effective in testing these aspects of the new code);
generating an assessment outcome comprising the … bottleneck reflected by the at least one predicted metric of the user project (Paragraph 0057, Machine Learning models can be employed to predict a number of things including, but not limited to, defects, performance bottlenecks, and most importantly. These identified defects and bottlenecks may then be mapped to specific tests that would be most effective in testing these aspects of the new code);
and generating a … based on the assessment outcome, wherein the … reflects the bottleneck of the user project and comprises a predicted issue of the user project (Paragraph 0057, Machine Learning models can be employed to predict a number of things including, but not limited to, defects, performance bottlenecks, and most importantly. These identified defects and bottlenecks may then be mapped to specific tests that would be most effective in testing these aspects of the new code; Paragraph 0068, If the computing platform is intended to interact with human users, it is provided with one or more user interface device(s) (607), such as displays, keyboards, pointing devices, speakers, etc.).
Champlin-Scharf et al. discloses accessing a first set of metadata associated with the user project (e.g. problem posted by a user); accessing a second set of metadata associated with a plurality of reference projects (e.g. software repository); determining a subgroup of reference projects that are similar to the first set of metadata (e.g. data with similar software problem descriptions); and using the subgroup of data for generating an assessment for assessing at least one metric of the user project (Machine Learning models can be employed to predict a number of things including, but not limited to, defects, performance bottlenecks). Although Champlin-Scharf et al. discloses predicting performance bottlenecks based on historical data with similar software problem descriptions, Champlin-Scharf et al. does not specifically disclose wherein the system tracks other features that are important for predicting a collaboration bottleneck (e.g. an epic and a plurality of replies to the epic). Also, although Champlin-Scharf et al. discloses machine learning to generate an assessment outcome reflecting the at least one predicted metric of the user project, Champlin-Scharf et al. does not specifically disclose wherein the assessment outcome is generated using a neural network. Further, although Champlin-Scharf et al. discloses an assessment outcome reflecting the at least one predicted metric of the user project (Paragraph 0057) and a display (Paragraph 0068), Champlin-Scharf et al. does not specifically disclose generating a report.
However, Choetkiertikul discloses accessing a first set of metadata associated with the user project, the first set of metadata comprising first work-tracking data contributed by a plurality of teams of the user project (see Figure 4 and related text in Page 555, Fig. 4 shows an example of an on-going iteration report (recorded in JIRA-Agile) of Mesosphere Sprint 34 in the Apache project. This iteration started from April 27, 2016 to May 11, 2016. This iteration has two issues in the Todo state (MESOS-5272 and MESOS-5222), three issues in the In-progress state—those are all in the reviewing process (MESOS3739, MESOS-4781, and MESOS-4938), and one issue has been resolved (MESOS-5312). These issues have story points assigned to them. For each of those sets of issues, we compute the set cardinality and velocity, and use each of them as a feature. From our investigation, among the under-achieved iterations across all case studies, i.e., velocity(Difference) < 0, 30 percent of them have new issues added after passing 80 percent of their planned duration (e.g., after 8th day of a ten-day iterations), while those iterations deliver zero-issue. Specifically, teams added more velocity(Committed) while velocity(Delivered) was still zero. This reflects that adding and removing issues affects the deliverable capability of an on-going iteration. This can be a good indicator to determine the outcome of an iteration), the first work-tracking data comprising an epic and a plurality of replies to the epic, the plurality of replies being contributed by the plurality of teams for tracking the user project (Fig. 4. An example of an on-going iteration report; Pages 555-556, 4.2 Features of an issue, Fig. 7 shows an example of an issue report of issue AURORA-716 in the Apache project which the details of an issue are provided such as type, priority, description, and comments including a story points and an assigned iteration; Page 557, 4.2.1. 
Primitive Analysis of an Issue, Previous studies (e.g., [16]) have found that the number of comments on an issue indicates the degree of team collaboration, and thus may affect its resolving time; Examiner interprets the set of issues included in a sprint as epics. Further, Examiner notes that the reviewing process includes a plurality of replies contributed by the plurality of teams);
accessing a second set of metadata associated with a plurality of reference projects, the plurality of reference projects being distinct from the user project, the second set of metadata comprising second work-tracking data comprising a plurality of epics and a plurality of replies to the epics, the plurality of replies being contributed by a plurality of teams in each of the plurality of reference projects for tracking the reference projects (Fig. 3. An overview of our approach; Page 555, 4.2. Features of an Issues, Fig. 7 shows an example of an issue report of issue AURORA-716 in the Apache project which the details of an issue are provided such as type, priority, description, and comments including a story points and an assigned iteration. Hence, we also extract a broad range of features representing an issue (see Table 2). The features cover different aspects of an issue including primitive attributes of an issue, issue dependency, changing of issue attributes, and textual features of an issue’s description. Some of the features of an issue (e.g., number of issue links) were also adopted from our previous work [14]; Page 560, 6.1. Data Collecting and Processing, we collected the data of past iterations (also referred to as sprints in those projects) and issues from five large open source projects which follow the agile Scrum methodology: Apache, JBoss, JIRA, MongoDB, and Spring. The project descriptions and their agile adoptions have been reported in Table 5; Examiner interprets the set of issues included in a sprint as epics. Further, Examiner notes that the historical data includes a plurality of features, wherein one of the features is the number of comments);
…
after determining the subgroup of reference projects (Fig. 3. An overview of our approach, see feature aggregation; Page 554, 3.0 Overview of our approach, Formally, issue-level features are vectors located in the same euclidean space (i.e., the issue space). The aggregation is then a map of a set points in the issue space onto a point in the iteration space. The main challenge here is to handle sets which are unordered and variable in size (e.g., the number of issues is different from iteration to iteration). We propose two methods: statistical aggregation and bag-of-words (BoW). Statistical aggregation looks for simple set statistics for each dimension of the points in the set, such as maximum, mean or standard deviation. For example, the minimum, maximum, mean, and standard deviation of the number of comments of all issues in an iteration are part of the new features derived for the iteration. This statistical aggregation technique relies on manual feature engineering. On the other hand, the bag-of-words method automatically clusters all the points in the issue-space and finds the closest prototype (known as “word”) for each new point to form a new set of features (known as bag-of-words, similarly to a typical representation of a document) representing an iteration. This technique provides a powerful, automatic way of learning features for an iteration from the set of issues in the layer below it (similar to the notions of deep learning)), generating an assessment model for assessing at least one metric of the user project based on the subgroup of reference projects (Page 554, Fig. 3. An overview of our approach, see trained classifiers and predicted velocity; Page 554, 3.0 Overview of our approach, Our approach consists of two phases: the learning phase and the execution phase (see Fig. 3). 
The learning phase involves using historical iterations to build a predictive model (using machine learning techniques), which is then used to predict outcomes, i.e., velocity(Difference), of new and ongoing iterations in the execution phase. To apply machine learning techniques, we need to engineer features for the iteration. An iteration has a number of attributes (e.g., its duration, the participants, etc.) and a set of issues whose dependencies are described as a dependency graph. Each issue has its own attributes and derived features (e.g., from its textual description). Our approach separates the iteration-level features into three components: (i) iteration attributes, (ii) complexity descriptors of the dependency graph (e.g., the number of nodes, edges, fan-in, fan-out, etc.), and (iii) aggregated features from the set of issues that belong to the iteration. A more sophisticated approach would involve embedding all available information into an euclidean space, but we leave this for future work); 
after determining the assessment model, determining a neural network for training the assessment model based on an output of the assessment model, the output corresponding to the at least one metric (Page 554, Fig. 3. An overview of our approach, see trained classifiers and predicted velocity; Page 554, 3.0 Overview of our approach, For prediction models, we employ three state-of-the-art randomized ensemble methods: Random Forests, Stochastic Gradient Boosting Machines, and Deep Neural Networks (DNNs) with Dropouts to build the predictive models. Our approach is able to make a prediction regarding the delivery capability in an iteration (i.e., the difference between the actual delivered velocity against the committed (target) velocity). Next we describe our approach in more detail); 
after determining the neural network, determining a third set of metadata associated with the reference projects in the subgroup (Page 554, Fig. 3. An overview of our approach, see trained classifiers and predicted velocity; Page 559, 5 Predictive Models, Our predictive models can predict the difference between the actual delivered velocity against the committed (target) velocity for an iteration, i.e., velocity(Difference). To do so, we employ regression methods (supervised learning) where the outputs reflect the deliverable capability in an iteration e.g., the predicting of velocity(Difference) will be equal to 12. The extracted features of the historical iterations (i.e., training set) are used to build the predictive models. Specifically, a feature vector of an iteration and an aggregated feature vector of issues assigned to the iteration are concatenated and fed into a regressor); 
after determining the third set of metadata, training the assessment model using the neural network and the third set of metadata (Page 554, Fig. 3. An overview of our approach, see trained classifiers and predicted velocity; Page 559, 5 Predictive Models, Our predictive models can predict the difference between the actual delivered velocity against the committed (target) velocity for an iteration, i.e., velocity(Difference). To do so, we employ regression methods (supervised learning) where the outputs reflect the deliverable capability in an iteration e.g., the predicting of velocity(Difference) will be equal to 12. The extracted features of the historical iterations (i.e., training set) are used to build the predictive models. Specifically, a feature vector of an iteration and an aggregated feature vector of issues assigned to the iteration are concatenated and fed into a regressor.  We apply the currently most successful class of machine learning methods, namely randomized ensemble methods [26], [27], [28]. Ensemble methods refer to the use of many regressors to make their prediction [28]. Randomized methods create regressors by randomizing data, features, or internal model components [29]. Randomizations are powerful regularization techniques which reduce prediction variance, prevent overfitting, are robust against noisy data, and improve the overall predictive accuracy [30], [31]. We use the following high performing regressors that have frequently won recent data science competitions (e.g., Kaggle6 ): Random Forests (RFs) [32], Stochastic Gradient Boosting Machines (GBMs) [33], [34] and Deep Neural Networks with Dropouts (DNNs) [35]. All of them are ensemble methods that use a divide-and-conquer approach to improve performance. The key principle behind ensemble methods is that a group of “weak learners” (e.g., classification and regression decision trees) can together form a “strong learner”. 
Details for these predictive models used are provided in Appendix A);
after training the assessment model, testing the assessment model using a fourth set of metadata associated with the reference projects in the subgroup, the third set of metadata and the fourth set of metadata being non-overlapping to each other (Page 554, Fig. 3. An overview of our approach, see trained classifiers and predicted velocity; Page 559, 5 Predictive Models, Our predictive models can predict the difference between the actual delivered velocity against the committed (target) velocity for an iteration, i.e., velocity(Difference). To do so, we employ regression methods (supervised learning) where the outputs reflect the deliverable capability in an iteration e.g., the predicting of velocity(Difference) will be equal to 12. The extracted features of the historical iterations (i.e., training set) are used to build the predictive models. Specifically, a feature vector of an iteration and an aggregated feature vector of issues assigned to the iteration are concatenated and fed into a regressor.  We apply the currently most successful class of machine learning methods, namely randomized ensemble methods [26], [27], [28]. Ensemble methods refer to the use of many regressors to make their prediction [28]. Randomized methods create regressors by randomizing data, features, or internal model components [29]. Randomizations are powerful regularization techniques which reduce prediction variance, prevent overfitting, are robust against noisy data, and improve the overall predictive accuracy [30], [31]. We use the following high performing regressors that have frequently won recent data science competitions (e.g., Kaggle6 ): Random Forests (RFs) [32], Stochastic Gradient Boosting Machines (GBMs) [33], [34] and Deep Neural Networks with Dropouts (DNNs) [35]. All of them are ensemble methods that use a divide-and-conquer approach to improve performance. 
The key principle behind ensemble methods is that a group of “weak learners” (e.g., classification and regression decision trees) can together form a “strong learner”. Details for these predictive models used are provided in Appendix A; Examiner notes that regularization techniques include splitting the data into a training set and a testing set, wherein the testing set is used to test the model and therefore prevent overfitting, see at least [30-35]); 
and after the training and testing of the assessment model, applying the assessment model to the first set of metadata to determine a collaboration bottleneck of the user project (Page 561, 6.2 Experimental Setting, As discussed in Section 2, we would like to predict the difference between the actual delivered velocity against the target velocity. For example, if the output of our model is -5, it predicts that the team will deliver 5 story points below the target. Table 7 shows the statistical descriptions of the difference between the actual delivered against the target velocity of the five projects in terms of the minimum, maximum, mean, median, mode, standard deviations (SD), and interquartile range (IQR));
generating an assessment outcome comprising the collaboration bottleneck reflected by the at least one predicted metric of the user project (Page 561, 6.2 Experimental Setting, As discussed in Section 2, we would like to predict the difference between the actual delivered velocity against the target velocity. For example, if the output of our model is -5, it predicts that the team will deliver 5 story points below the target. Table 7 shows the statistical descriptions of the difference between the actual delivered against the target velocity of the five projects in terms of the minimum, maximum, mean, median, mode, standard deviations (SD), and interquartile range (IQR));
and generating a report based on the assessment outcome, wherein the report reflects the bottleneck of the user project and comprises a predicted issue of the user project (Page 561, 6.2 Experimental Setting, As discussed in Section 2, we would like to predict the difference between the actual delivered velocity against the target velocity. For example, if the output of our model is -5, it predicts that the team will deliver 5 story points below the target. Table 7 shows the statistical descriptions of the difference between the actual delivered against the target velocity of the five projects in terms of the minimum, maximum, mean, median, mode, standard deviations (SD), and interquartile range (IQR)).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the assessment outcome reflecting the at least one predicted metric of the user project and the display of the invention of Champlin-Scharf et al. to further incorporate collaboration features in the analysis (e.g. epics and a plurality of replies to the epics) of the invention of Choetkiertikul because doing so would allow the system to predict outcomes of new and ongoing iterations in the execution phase (see Choetkiertikul, Page 554, 3 Overview of our approach), wherein team collaboration is an important factor in the prediction model (see Choetkiertikul, Page 567, 6.4.6 Important Features). Also, it would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the use of machine learning to generate an assessment outcome reflecting the at least one predicted metric of the user project of the invention of Champlin-Scharf et al. to further incorporate the use of a neural network of the invention of Choetkiertikul because the neural network approach is a subclass of machine learning and is a well-known technique for predicting project outcomes.
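For illustration only, the statistical aggregation described in the Choetkiertikul passage quoted above (the minimum, maximum, mean, and standard deviation of an issue-level feature across all issues in an iteration) can be sketched as follows. The function name, the field name `num_comments`, and the sample values are hypothetical and are not taken from the reference. The use of the population standard deviation is an assumption, since the quoted passage does not state which variant is used.

```python
from statistics import mean, pstdev


def aggregate_issue_feature(issues, feature):
    """Map a variable-size set of issues onto fixed-size iteration-level features.

    `issues` is a list of dicts, one per issue; `feature` is the key of a
    numeric issue-level attribute (hypothetical naming, for illustration).
    """
    values = [issue[feature] for issue in issues]
    if not values:
        raise ValueError("iteration has no issues to aggregate")
    return {
        f"{feature}_min": min(values),
        f"{feature}_max": max(values),
        f"{feature}_mean": mean(values),
        # Population standard deviation is an assumption of this sketch.
        f"{feature}_std": pstdev(values),
    }


# Hypothetical iteration containing three issues.
iteration_issues = [
    {"num_comments": 2},
    {"num_comments": 4},
    {"num_comments": 6},
]
iteration_features = aggregate_issue_feature(iteration_issues, "num_comments")
```

The result is a fixed-length feature set regardless of how many issues the iteration contains, which is the property the quoted passage relies on when handling sets that are unordered and variable in size.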
Although Champlin-Scharf et al. discloses searching for similar software problem descriptions by using natural language processing (Paragraphs 0031 and 0035) and Choetkiertikul discloses aggregating similar features by using bag-of-words (Fig. 3. An overview of our approach), the combination of Champlin-Scharf et al. and Choetkiertikul does not specifically disclose wherein each reference project in the subgroup has a level of relevance to the user project exceeding a threshold. 
However, Liu et al. discloses after determining the ontology, determining, based on the ontology, a subgroup of reference projects from the plurality of reference projects, wherein each reference project in the subgroup has a level of relevance to the user project exceeding a threshold (Paragraph 0014, In certain embodiments, the step of clustering the data entries further includes: calculating a semantic similarity score for the two data entries using the sentiment similarity values, the text similarity values, and the syntactic similarity values; Figure 3A discloses a current ontology; Paragraph 0126, At procedure 514, the cluster classifier 132, upon receiving the semantic similarity scores between each pair (any two) of the data entries, classifies the data entries based on the semantic similarity scores. Specifically, the cluster classifier 132 groups the data entries into clusters, the data entries in the same cluster have high semantic similarity scores. In certain embodiments, a threshold is defined for a the clusters, which means that any two data entries in the same cluster has the semantic similarity score greater than the threshold score).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the way that the subgroup of reference projects from the plurality of reference projects are determined of the invention of Champlin-Scharf et al. and Choetkiertikul to further incorporate wherein each reference data in the subgroup has a level of relevance to the user project exceeding a threshold of the invention of Liu et al. because doing so would allow the system to group the data entries into clusters in which the data entries in the same cluster have high semantic similarity scores (see Liu et al., Paragraph 0126). Further, the claimed invention is merely a combination of old elements, and in combination each element would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.
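As a minimal sketch of the threshold limitation discussed above, the selection of a subgroup in which every member has a relevance score exceeding a threshold might look like the following. The project names, scores, and threshold value are hypothetical, and the strict "greater than" comparison is an assumption that follows the wording "exceeding a threshold"; this is not the implementation of any cited reference.

```python
def select_relevant_projects(relevance_scores, threshold):
    """Return the reference projects whose relevance strictly exceeds `threshold`.

    `relevance_scores` maps a project name to its relevance to the user
    project (hypothetical values, for illustration only).
    """
    return sorted(
        name for name, score in relevance_scores.items() if score > threshold
    )


# Hypothetical relevance scores for three reference projects.
scores = {"project_a": 0.91, "project_b": 0.40, "project_c": 0.75}
subgroup = select_relevant_projects(scores, threshold=0.7)
```

A project whose score equals the threshold is excluded under this reading, because it does not exceed the threshold.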
Regarding claim 2 (Original), which is dependent on claim 1, the combination of Champlin-Scharf et al., Choetkiertikul, and Liu et al. discloses all the limitations in claim 1. Champlin-Scharf et al. further discloses wherein the operations comprise at least one of: 
extracting the first set of metadata from the user project after receiving the user project; or obtaining the first set of metadata associated with the user project from a software repository (Paragraph 0017, the present inventors have set out, in a first advantage of the present invention, to develop an improved software problem discussion search methodology and a tool that “pre-detects” potential coding issues using Natural Language Processing (NLP); Paragraph 0032, Prior to processing any search for similar software problem descriptions, a repository of textual descriptions of a history of changes to the software product would be created by gathering information (or a collection of links to information) from different sources such as from bug tracking databases, extracted comments in source code, and “one liner” comments captured in software version control systems; Paragraph 0033, After the repository has been created, a user would input a description of the changes he or she is going to make to the software source code into the comments of an enhanced bug tracking tool, then he or she would click a button, which would scan the repository while searching for similar changes made in the past, using NLP methods to search for matches rather than just prevalence of keywords and synonyms).
Regarding claim 3 (Currently Amended), which is dependent on claim 1, the combination of Champlin-Scharf et al., Choetkiertikul, and Liu et al. discloses all the limitations in claim 1. Champlin-Scharf et al. further discloses wherein the operations comprise at least one of: 
obtaining the plurality of reference projects from the plurality of epics from one or more collaboration tools and extracting the second set of metadata from the obtained plurality of reference projects, the plurality of epics each comprising a plurality of issues and a plurality of replies to the plurality of issues; or obtaining the second set of metadata associated with the plurality of reference projects from one or more collaboration tools (Paragraph 0032, Prior to processing any search for similar software problem descriptions, a repository of textual descriptions of a history of changes to the software product would be created by gathering information (or a collection of links to information) from different sources such as from bug tracking databases, extracted comments in source code, and “one liner” comments captured in software version control systems; Paragraph 0033, After the repository has been created, a user would input a description of the changes he or she is going to make to the software source code into the comments of an enhanced bug tracking tool, then he or she would click a button, which would scan the repository while searching for similar changes made in the past, using NLP methods to search for matches rather than just prevalence of keywords and synonyms; Examiner interprets the software version control system as the one or more collaboration tools. It can be noted that the claim language is written in alternative form. The limitation taught by Champlin-Scharf et al. is based on “obtaining the second set of metadata associated with the plurality of reference projects from one or more collaboration tools.”).
Regarding claim 6 (Previously Presented), which is dependent on claim 1, the combination of Champlin-Scharf et al., Choetkiertikul, and Liu et al. discloses all the limitations in claim 1. Champlin-Scharf et al. further discloses wherein the report further comprises at least one of: a predicted success rate of the user project; a predicted time duration of the user project; a predicted programming defect of the user project; or a recommendation to improve the user project (Paragraph 0057, As the work item is delivered and built into a software deliverable, the next phase of some embodiments of the present invention may be engaged. As the code has been previously run through the bug pre-detection tool, the Machine Learning models can be employed to predict a number of things including, but not limited to, defects, performance bottlenecks, and most importantly; It can be noted that the claim language is written in alternative form. The limitation taught by Champlin-Scharf et al. is based on “a predicted programming defect of the user project”).
Regarding claim 8 (Currently Amended), which is dependent on claim 1, the combination of Champlin-Scharf et al., Choetkiertikul, and Liu et al. discloses all the limitations in claim 1. Champlin-Scharf et al. further discloses wherein: the first set of metadata associated with the user project comprises processed data based on raw data of the user project; and the second set of metadata associated with the plurality of reference projects comprises processed data based on raw data of the plurality of reference projects, and the operations comprise: extracting text data from the user project and the plurality of reference projects (Paragraph 0032, Prior to processing any search for similar software problem descriptions, a repository of textual descriptions of a history of changes to the software product would be created by gathering information (or a collection of links to information) from different sources such as from bug tracking databases, extracted comments in source code, and “one liner” comments captured in software version control systems; Paragraph 0033, After the repository has been created, a user would input a description of the changes he or she is going to make to the software source code into the comments of an enhanced bug tracking tool, then he or she would click a button, which would scan the repository while searching for similar changes made in the past, using NLP methods to search for matches rather than just prevalence of keywords and synonyms);
determining numerical representations of the text data (Paragraph 0026, NLP searching is usually much broader than keyword searching for several reasons. First, it allows the user to express his or her needs in a manner more suitable for the user, and less constrained by system requirements. This increases the likelihood that the search query itself is accurately directed towards the desired information. Second, by extracting the symbols from the natural language input, the search and proceed not only on the symbols, but also on their aliases and synonyms. In the foregoing example, the term “stack” may be searched (a symbol found in the original user's input query), and the synonym “heap” may also be searched according to a synonym list. And, the term “overflow” may be searched as well as an antonym “underflow”. For a user to achieve the same search breadth, he or she would have to have expertise to craft a much more complicated structured query and would have to be diligent enough to look up many synonyms and antonyms, as well as to formulate similarly-meaningful alternative phrases; Paragraph 0027, Therefore, for the purposes of this disclosure, natural language searching shall mean receiving a natural language expression as an search query input, applying one or more NLP techniques including deductive logic, inductive logic, validity and soundness checks, rules of though (e.g., principle of contradiction, law of excluded middle, etc.), truth functionalities (e.g. Modus Ponens, Modus Tollens, hypothetical syllogism, denying an antecedent, affirming a consequent, etc.), predicate logic, sorites arguments, ethymemes, syntactic analysis, semantic analysis, and pragmatics. Syntactic analysis may contain one or more parsers, such as noise-disposal parsers, state-machine parsers, and definite clause parsers, and it may include pre-determined and updateable grammars (e.g. recognizable grammar structures), such as context-free notations (e.g. Backus-Naur forms, etc.). 
A semantic analyzer may, based on the results of the syntactic analysis or interactively operating with syntactic analysis, determines the meaning of a phrase, statement or sentence. Most semantic analyzers attempt to re-write the phrase, statement or sentence into a context-free form so that it can be more readily found in a lexicon and mapped to an intended or implicit meaning. Pragmatics then operates to further reduce remaining ambiguities by applying reasonable domain scope, resolving anaphoras, and using inferencing to generate alternative expressions for the same meaning (upon which the search can be performed, as well));
	determining a natural language processing (NLP) model based on the first set of metadata, the second set of metadata, and the numerical representations (Paragraph 0035, For example, a user could input into the enhanced bug tracking systems user interface that they are planning to add a hashmap to a software design or product. Such user input might include a natural language description of what functions would use the hashmap, and what data structures the hashmap would relate to each other, and to which particular method or code module the hashmap would be added. Such as com.my.package; Paragraph 0036, The resulting NLP-based search might find instances in the past, in which hashmaps were used in similar circumstances and caused performance issues); 
and determining the ontology using the NLP model that comprises machine learning (Paragraph 0027, Therefore, for the purposes of this disclosure, natural language searching shall mean receiving a natural language expression as an search query input, applying one or more NLP techniques including deductive logic, inductive logic, validity and soundness checks, rules of though (e.g., principle of contradiction, law of excluded middle, etc.), truth functionalities (e.g. Modus Ponens, Modus Tollens, hypothetical syllogism, denying an antecedent, affirming a consequent, etc.), predicate logic, sorites arguments, ethymemes, syntactic analysis, semantic analysis, and pragmatics. Syntactic analysis may contain one or more parsers, such as noise-disposal parsers, state-machine parsers, and definite clause parsers, and it may include pre-determined and updateable grammars (e.g. recognizable grammar structures), such as context-free notations (e.g. Backus-Naur forms, etc.). A semantic analyzer may, based on the results of the syntactic analysis or interactively operating with syntactic analysis, determines the meaning of a phrase, statement or sentence. Most semantic analyzers attempt to re-write the phrase, statement or sentence into a context-free form so that it can be more readily found in a lexicon and mapped to an intended or implicit meaning. Pragmatics then operates to further reduce remaining ambiguities by applying reasonable domain scope, resolving anaphoras, and using inferencing to generate alternative expressions for the same meaning (upon which the search can be performed, as well); Paragraph 0029, Natural language searching can be further enhanced by the addition of language models which employ Artificial Intelligence (AI) techniques so that their performance converges on a desired outcome throughout usage. 
For example, if in a first search, a user or administrator marks certain found items as “better” and certain other found items as “worse”, and AI engine can then adjust certain parametric weights to the NLP methods and processes to favor the rules and methods which generated the “better” outputs and to disfavor the rules and methods which generated the “worse” outputs. In a second search, the use may again mark some results as better and others as worse, and the AI engine can further tune the parametric weights, and so forth. Over time and usage, the NLP searching will become more and more accurate at finding and outputting the kind of results the user or administrator desires by “learning” the user's or administrator's preferences).
Champlin-Scharf et al. discloses all the limitations above and the use of a semantic analyzer to determine the meaning of a phrase (Paragraph 0026). Although it is known that the semantic analysis includes “determining numerical representations of the text data”, the combination of Champlin-Scharf et al. and Choetkiertikul does not specifically disclose how the semantic analyzer determines the numerical representations of the text data.
However, Liu et al. further discloses extracting text data from the … and the … (Paragraph 0076, The similarity calculator 126 is configured to, upon receiving the tokenized data entries, determine similarity between each pair of cleaned text based on sentence embedding. Here clean texts from any two of the data entries form a pair); 
determining numerical representations of the text data (Paragraph 0076, The similarity calculator 126 is configured to, upon receiving the tokenized data entries, determine similarity between each pair of cleaned text based on sentence embedding. Here clean texts from any two of the data entries form a pair. Each clean text of a data entry is represented by a vector. In certain embodiments, the word representation in vector space uses the method described by Mikolov, Thomas et al. (Mikolove, Tomas et al, efficient estimation of word representation in vector space, 2013, arxiv:1301.3781v3), which is incorporated herein by reference in its entirety. Through word embedding, the words in a cleaned text are mapped to vectors of real numbers. The vectors of one data entry text in the pair and the vector of the other data entry text in the pair are compared to determine a similarity or distance between them); 
determining a natural language processing (NLP) model based on the first set of metadata, the second set of metadata, and the numerical representations (Paragraph 0077, The NLP 128 is configured to, upon receiving the tokenized data entries, determine the syntactic structure of the text by analyzing its constituent words based on an underlying grammar. In certain embodiments, the syntactic features are part-of-speech tags. In certain embodiments, a pretrained model, for example, the Stanford parser (https://nlp.stanford.edu/software/lex-parser.shtml) is used, which is incorporated herein by reference in its entirety. In certain embodiments, the NLP 128 is further configured to process the initial parser output to provide certain statistic result. For example, after syntactic parsing, the NPL parser 128 may further count the number of nouns and the number of verbs in the output. When a data entry has a result of 3 and 1, the text of the data entry includes 3 nouns and 1 verb. This simple yet novel character of the text is useful for the following accurate ontology construction and update); 
and determining the ontology using the NLP model that comprises machine learning (Paragraph 0063, An ontology optimization module is designed to maintain and adjust the semantic relations between the concepts, and start the training process of the machine learning models according to analysis results and verification; Paragraph 0077, The NLP 128 is configured to, upon receiving the tokenized data entries, determine the syntactic structure of the text by analyzing its constituent words based on an underlying grammar. In certain embodiments, the syntactic features are part-of-speech tags. In certain embodiments, a pretrained model, for example, the Stanford parser (https://nlp.stanford.edu/software/lex-parser.shtml) is used, which is incorporated herein by reference in its entirety. In certain embodiments, the NLP 128 is further configured to process the initial parser output to provide certain statistic result. For example, after syntactic parsing, the NPL parser 128 may further count the number of nouns and the number of verbs in the output. When a data entry has a result of 3 and 1, the text of the data entry includes 3 nouns and 1 verb. This simple yet novel character of the text is useful for the following accurate ontology construction and update).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the natural language processing model for determining similarities between text data from the user project and the plurality of reference projects, which includes a semantic analyzer of the invention of Champlin-Scharf et al. to further specify how the semantic analyzer is determining numerical representations of the text data of the invention of Liu et al. because doing so would allow the system to determine a similarity score between each pair of cleaned text based on sentence embedding (see Liu et al., Paragraphs 0076 & 0109). Further, the claimed invention is merely a combination of old elements, and in combination each element would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.
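As a hedged illustration of the kind of numerical representation discussed above (words mapped to vectors of real numbers, with the vectors of two texts compared to determine a similarity), a minimal sketch follows. The two-dimensional toy embedding table, the averaging of word vectors into a text vector, and the choice of cosine similarity are all assumptions made for illustration; the quoted Liu et al. passage refers only to "a similarity or distance" and does not specify these details.

```python
import math

# Hypothetical two-dimensional word vectors, for illustration only.
TOY_EMBEDDINGS = {
    "stack": (1.0, 0.0),
    "heap": (0.8, 0.6),
    "overflow": (0.0, 1.0),
}


def text_vector(text, embeddings):
    """Represent a text as the average of the vectors of its known words."""
    vectors = [embeddings[w] for w in text.lower().split() if w in embeddings]
    if not vectors:
        raise ValueError("text contains no words with a known vector")
    dim = len(vectors[0])
    return tuple(sum(v[i] for v in vectors) / len(vectors) for i in range(dim))


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


first = text_vector("stack overflow", TOY_EMBEDDINGS)
second = text_vector("heap overflow", TOY_EMBEDDINGS)
sim_same = cosine_similarity(first, first)
sim_related = cosine_similarity(first, second)
```

Under these toy vectors, a text compared with itself scores approximately 1, and the two related descriptions score below 1 but well above 0, so a similarity score of this kind could be compared against a threshold.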
Regarding claim 9 (Previously Presented), which is dependent on claim 1, the combination of Champlin-Scharf et al., Choetkiertikul, and Liu et al. discloses all the limitations in claim 1. Champlin-Scharf et al. further discloses wherein the operations comprise: updating the plurality of reference projects; determining the second set of metadata after updating the plurality of reference projects; updating the … based on at least the first and second sets of metadata respectively associated with the user project and plurality of updated reference projects; and determining, based on updated …, the subgroup of reference projects from the plurality of updated reference projects prior to training the assessment model (Paragraph 0027, Therefore, for the purposes of this disclosure, natural language searching shall mean receiving a natural language expression as an search query input, applying one or more NLP techniques including deductive logic, inductive logic, validity and soundness checks, rules of though (e.g., principle of contradiction, law of excluded middle, etc.), truth functionalities (e.g. Modus Ponens, Modus Tollens, hypothetical syllogism, denying an antecedent, affirming a consequent, etc.), predicate logic, sorites arguments, ethymemes, syntactic analysis, semantic analysis, and pragmatics. Syntactic analysis may contain one or more parsers, such as noise-disposal parsers, state-machine parsers, and definite clause parsers, and it may include pre-determined and updateable grammars (e.g. recognizable grammar structures), such as context-free notations (e.g. Backus-Naur forms, etc.). A semantic analyzer may, based on the results of the syntactic analysis or interactively operating with syntactic analysis, determines the meaning of a phrase, statement or sentence. 
Most semantic analyzers attempt to re-write the phrase, statement or sentence into a context-free form so that it can be more readily found in a lexicon and mapped to an intended or implicit meaning. Pragmatics then operates to further reduce remaining ambiguities by applying reasonable domain scope, resolving anaphoras, and using inferencing to generate alternative expressions for the same meaning (upon which the search can be performed, as well); Paragraph 0049, Once the requirement or change to the requirement has been approved, the final content of the requirement will be sent through the pre-detection tool (104, 401, 402) and, once annotated (403), will be used to update the training data (102) for the Machine Learning Models, as illustrated in FIG. 4. This allows the approved intervention to be included in the Suggestions for similar requirements change proposed in the future; Examiner notes that ontologies of the grammar structure can be updated by adding or deleting synonyms).
	Although Champlin-Scharf et al. discloses the limitations above, including updating the plurality of reference projects and updatable grammars, the combination of Champlin-Scharf et al. and Trammell et al. does not specifically disclose updating the ontology based on at least the first and second sets of metadata.
	However, Liu et al. discloses wherein the operations comprise: updating the plurality of reference … (Paragraph 0082, The new concept verifier 140 is configured to, retrieve new themes from the new theme database 194, and verify if any of the new themes are new concept. Here we define the new themes as recognized topics detected from the recent data stream, that is, the clusters detected by the cluster classifier 132, while define the new concept as verified new themes. In other words, the new themes are candidates for new concepts, and the new concepts are verified new themes. The verified new themes then can be used to update the ontology. As shown in FIG. 2B, the new concept verifier 140 includes a new theme retrieving module 142, a near duplicate identification module 144, a concept comparing module 146, a concept proposing module 148, and a concept verification module 150); 
determining the second set of metadata after updating the plurality of reference … (Paragraph 0082, The new concept verifier 140 is configured to, retrieve new themes from the new theme database 194, and verify if any of the new themes are new concept. Here we define the new themes as recognized topics detected from the recent data stream, that is, the clusters detected by the cluster classifier 132, while define the new concept as verified new themes. In other words, the new themes are candidates for new concepts, and the new concepts are verified new themes. The verified new themes then can be used to update the ontology. As shown in FIG. 2B, the new concept verifier 140 includes a new theme retrieving module 142, a near duplicate identification module 144, a concept comparing module 146, a concept proposing module 148, and a concept verification module 150); 
updating the ontology based on at least the first and second sets of metadata respectively associated with the user … and plurality of updated reference … (Paragraph 0082, The new concept verifier 140 is configured to, retrieve new themes from the new theme database 194, and verify if any of the new themes are new concept. Here we define the new themes as recognized topics detected from the recent data stream, that is, the clusters detected by the cluster classifier 132, while define the new concept as verified new themes. In other words, the new themes are candidates for new concepts, and the new concepts are verified new themes. The verified new themes then can be used to update the ontology. As shown in FIG. 2B, the new concept verifier 140 includes a new theme retrieving module 142, a near duplicate identification module 144, a concept comparing module 146, a concept proposing module 148, and a concept verification module 150); 
and determining, based on updated ontology, the subgroup of reference projects from the plurality of updated reference projects … (Paragraph 0109, As shown in FIG. 4, the user generated data 190 is provided to or retrieved by the emerging theme detector 120. The user generated data 190 may include a large amount of historical data, and the emerging theme detector 120 may only process a batch of data at a time, such as the user feedbacks in an e-commerce website in the past week. The emerging detector 120 then processes the batch of the user generated data that include many data entries, to obtain relationships between any two of the data entries. The relationship may be represented by a semantic similarity score, where the higher the score, the more similar the two data entries. Based on the semantic similarity scores, the emerging theme detector 120 clusters the data entries into different groups. The data entries in the same group have high semantic similarity score between each other. The groups are regarded as new emerging themes. In certain embodiments, the emerging theme detector 120 may also use a threshold to filter the groups, and only the groups that have a number of data entries greater than the threshold number, such as 50 or 60, are regarded as the new emerging themes. In certain embodiments, the emerging theme detector 120 further compares the detected new themes with the new themes detected in the older time, such as in the three weeks previous to the passing week, and keep only the new themes that are not shown in those previous three weeks. The emerging theme detector 120 then sends the detected new themes to the new concept verifier 140; Paragraph 0110, The new concept verifier 140, upon receiving the new themes, compares the new themes with the nodes in the ontology, where each node in the ontology represent a concept. 
The new concept verifier 140 calculates the novelty score of each new themes by comparing the similarity between each of the new themes and each of the concepts. The novelty score may be computed using a set of classification models. The new concept verifier 140 defines the new themes having the high novelty scores as verified new concepts or simply verified concepts. The new concept verifier 140 then sends the verified concepts to the ontology adjusting module 160).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the determination of the subgroup of reference projects from the plurality of reference projects prior to training the assessment model, as taught by Champlin-Scharf et al., to further incorporate updating the subgroup based on an updated ontology, as taught by Liu et al., because doing so would allow the system to cluster data entries into different groups based on an updated ontology (see Liu et al., Paragraphs 0109-0110). Further, the claimed invention is merely a combination of old elements, and in combination each element would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.
Regarding claim 11 (Previously Presented), which is dependent of claim 9, the combination of Champlin-Scharf et al., Choetkiertikul, and Liu et al. discloses all the limitations in claim 9. Although Champlin-Scharf et al. discloses wherein determining, based on the ontology, the subgroup of reference projects comprises determining, from the framework of categories, a subset of categories associated with the user project based on the ontology (Paragraph 0017, the present inventors have set out, in a first advantage of the present invention, to develop an improved software problem discussion search methodology and a tool that “pre-detects” potential coding issues using Natural Language Processing (NLP); Paragraph 0032, Prior to processing any search for similar software problem descriptions, a repository of textual descriptions of a history of changes to the software product would be created by gathering information (or a collection of links to information) from different sources such as from bug tracking databases, extracted comments in source code, and “one liner” comments captured in software version control systems; Paragraph 0033, After the repository has been created, a user would input a description of the changes he or she is going to make to the software source code into the comments of an enhanced bug tracking tool, then he or she would click a button, which would scan the repository while searching for similar changes made in the past, using NLP methods to search for matches rather than just prevalence of keywords and synonyms), determining the subgroup of reference projects comprising: 
tagging text data of the predicted issue in the first set of metadata and text data of the reference projects in the second set of metadata to determine the plurality of categories associated with the reference project that matches one or more categories in the subset of categories associated with the user project (Paragraph 0035, For example, a user could input into the enhanced bug tracking systems user interface that they are planning to add a hashmap to a software design or product. Such user input might include a natural language description of what functions would use the hashmap, and what data structures the hashmap would relate to each other, and to which particular method or code module the hashmap would be added. Such as com.my.package; Paragraph 0036, The resulting NLP-based search might find instances in the past, in which hashmaps were used in similar circumstances and caused performance issues); 
determining, for each of the plurality of reference projects, a corresponding level of relevance based on the number of categories; selecting one or more reference projects to be included in the subgroup … (Paragraph 0031, search for similar software problem descriptions; Paragraph 0035, For example, a user could input into the enhanced bug tracking systems user interface that they are planning to add a hashmap to a software design or product. Such user input might include a natural language description of what functions would use the hashmap, and what data structures the hashmap would relate to each other, and to which particular method or code module the hashmap would be added. Such as com.my.package; Paragraph 0036, The resulting NLP-based search might find instances in the past, in which hashmaps were used in similar circumstances and caused performance issues);
and selecting one or more reference projects sharing a same subset of categories to be included in the subgroup, a number of the reference projects in the subgroup being smaller than a total number of the reference projects (Paragraph 0032, Prior to processing any search for similar software problem descriptions, a repository of textual descriptions of a history of changes to the software product would be created by gathering information (or a collection of links to information) from different sources such as from bug tracking databases, extracted comments in source code, and “one liner” comments captured in software version control systems; Paragraph 0033, After the repository has been created, a user would input a description of the changes he or she is going to make to the software source code into the comments of an enhanced bug tracking tool, then he or she would click a button, which would scan the repository while searching for similar changes made in the past, using NLP methods to search for matches rather than just prevalence of keywords and synonyms). 
Although Champlin-Scharf et al. discloses natural language processing to search for similar software problem descriptions (Paragraphs 0031 and 0035) based on an ontology (see Paragraphs 0026-0027, using syntactic analysis, semantic analysis, and pragmatics), Champlin-Scharf et al. does not specifically disclose wherein each reference project in the subgroup has a level of relevance to the user project exceeding a threshold.
However, Liu et al. discloses wherein determining, based on the ontology, the subgroup of reference … comprises determining, from the framework of categories, a subset of categories associated with the user … based on the ontology (Paragraph 0014, In certain embodiments, the step of clustering the data entries further includes: calculating a semantic similarity score for the two data entries using the sentiment similarity values, the text similarity values, and the syntactic similarity values; Figure 3A discloses a current ontology), determining the subgroup of reference projects comprising: 
tagging text data of the … in the first set of metadata and text data of the reference projects in the second set of metadata to determine the plurality of categories associated with the reference … that matches one or more categories in the subset of categories associated with the user … (Paragraph 0121, At procedure 510, the NLP 128, upon receiving the cleaned and tokenized data entries (text), determines the syntactic structure of the text by analyzing its constituent words based on an underlying grammar. In certain embodiments, the NLP 128 uses part-of-speech tagging. In certain embodiments, the NLP 128 evaluate the syntactic or grammar complexity of the data entry, and represents the complexity as a real number. After obtaining a number for each cleaned and tokenized data entry, the NLP 128 sends the numbers to the semantic scorer 130; Paragraph 0123, At procedure 512, the semantic scorer 130, upon receiving the sentiment polarity of each of the data entries from the sentiment analyzer 124, the similarity scores between any two of the data entries from the similarity calculator 126, and the NLP score of each of the data entries, calculates the semantic similarity score for each pair of data entries, i.e., for any two of the data entries); 
determining, for each of the plurality of reference …, a corresponding level of relevance based on the number of categories (Paragraph 0123, At procedure 512, the semantic scorer 130, upon receiving the sentiment polarity of each of the data entries from the sentiment analyzer 124, the similarity scores between any two of the data entries from the similarity calculator 126, and the NLP score of each of the data entries, calculates the semantic similarity score for each pair of data entries, i.e., for any two of the data entries); 
selecting one or more reference … to be included in the subgroup when the corresponding levels of relevance exceed the threshold (Paragraph 0126, At procedure 514, the cluster classifier 132, upon receiving the semantic similarity scores between each pair (any two) of the data entries, classifies the data entries based on the semantic similarity scores. Specifically, the cluster classifier 132 groups the data entries into clusters, the data entries in the same cluster have high semantic similarity scores. In certain embodiments, a threshold is defined for a the clusters, which means that any two data entries in the same cluster has the semantic similarity score greater than the threshold score);
and selecting one or more reference … sharing a same subset of categories to be included in the subgroup, a number of the reference … in the subgroup being smaller than a total number of the reference … (Paragraph 0126, At procedure 514, the cluster classifier 132, upon receiving the semantic similarity scores between each pair (any two) of the data entries, classifies the data entries based on the semantic similarity scores. Specifically, the cluster classifier 132 groups the data entries into clusters, the data entries in the same cluster have high semantic similarity scores. In certain embodiments, a threshold is defined for a the clusters, which means that any two data entries in the same cluster has the semantic similarity score greater than the threshold score).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the way the subgroup of reference projects is determined from the plurality of reference projects, as taught by Champlin-Scharf et al., to further incorporate wherein each reference data in the subgroup has a level of relevance to the user project exceeding a threshold, as taught by Liu et al., because doing so would allow the system to group the data entries into clusters in which the data entries in the same cluster have high semantic similarity scores (see Liu et al., Paragraph 0126). Further, the claimed invention is merely a combination of old elements, and in combination each element would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.
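For illustration only, the selection step recited in claim 11 (a level of relevance based on the number of matching categories, with reference projects included in the subgroup when that level exceeds a threshold) can be sketched as follows. This is a minimal sketch and is not taken from the application or from any cited reference; the function names, the project names, the category labels, and the threshold value are all hypothetical assumptions.

```python
from typing import Dict, List, Set


def relevance(user_categories: Set[str], reference_categories: Set[str]) -> int:
    """Level of relevance, assumed here to be the count of shared categories."""
    return len(user_categories & reference_categories)


def select_subgroup(
    user_categories: Set[str],
    references: Dict[str, Set[str]],
    threshold: int,
) -> List[str]:
    """Return the names of reference projects whose relevance exceeds the threshold."""
    return sorted(
        name
        for name, categories in references.items()
        if relevance(user_categories, categories) > threshold
    )


# Hypothetical category tags, for illustration only.
user = {"performance", "hashmap", "caching"}
refs = {
    "proj_a": {"performance", "hashmap", "logging"},
    "proj_b": {"ui", "localization"},
    "proj_c": {"performance", "hashmap", "caching"},
}

subgroup = select_subgroup(user, refs, threshold=1)
print(subgroup)
```

Under these assumed inputs, the subgroup contains fewer reference projects than the full set, which corresponds to the limitation that the number of reference projects in the subgroup is smaller than the total number of reference projects.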
Regarding claim 13 (Previously Presented), which is dependent of claim 1, the combination of Champlin-Scharf et al., Choetkiertikul, and Liu et al. discloses all the limitations in claim 1. Champlin-Scharf et al. further discloses wherein generating the assessment model comprises: determining parameters of a function between the third set of metadata and the at least one metric using the [machine learning] (Paragraph 0041, Before the bug pre-detection method and tool of the present invention is used by a user, training data will be ingested by the system originating from a variety of sources. This would likely include, but not limited to, defect tracking databases, requirements documents, work items, and test cases generated during prior releases, as well as comments from known defect-free sections of code; Paragraph 0046, Based on previous knowledge in the form of our trained models (102) and, if available, previous bug resolution notes (104), the bug pre-detection tool will offer suggestions in the form of automatically-generated discussion comments (301) or other suitable notices (e-mails, text messages, etc.) as to whether or not the proposals in the latest revision of the document will improve software quality in terms of the likelihood of introducing new defects, or triggering previously solved defects, which is illustrated in FIG. 3; Paragraph 0057, As the work item is delivered and built into a software deliverable, the next phase of some embodiments of the present invention may be engaged. As the code has been previously run through the bug pre-detection tool, the Machine Learning models can be employed to predict a number of things including, but not limited to, defects, performance bottlenecks, and most importantly; Examiner interprets the likelihood of introducing new defects as the output corresponding to the at least one metric).
Although Champlin-Scharf et al. discloses machine learning to generate an assessment outcome reflecting the at least one predicted metric of the user project, the combination of Champlin-Scharf et al. and Liu et al. does not specifically disclose wherein the assessment outcome is generated using a neural network.
However, Choetkiertikul discloses wherein generating the assessment model comprises: determining parameters of a function between the third set of metadata and the at least one metric using the neural network (Page 554, Fig. 3. An overview of our approach, see trained classifiers and predicted velocity; Page 554, 3.0 Overview of our approach, For prediction models, we employ three state-of-the-art randomized ensemble methods: Random Forests, Stochastic Gradient Boosting Machines, and Deep Neural Networks (DNNs) with Dropouts to build the predictive models. Our approach is able to make a prediction regarding the delivery capability in an iteration (i.e., the difference between the actual delivered velocity against the committed (target) velocity). Next we describe our approach in more detail).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the use of machine learning to generate an assessment outcome reflecting the at least one predicted metric of the user project, as taught by Champlin-Scharf et al., to further incorporate the use of a neural network, as taught by Choetkiertikul, because the neural network approach is a subclass of machine learning and is a well-known technique for predicting project outcomes. Further, the claimed invention is merely a combination of old elements, and in combination each element would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.
Regarding claim 14 (Original), which is dependent of claim 13, the combination of Champlin-Scharf et al., Choetkiertikul, and Liu et al. discloses all the limitations in claim 13. Champlin-Scharf et al. further discloses wherein: the third set of metadata comprises at least one of: a name of a user who opens an issue in at least one of the reference projects in the subgroup; a number of comments associated with the issue; a number of users contributing to the issue; a percentage of open tickets in the issue; a percentage of closed tickets in the issue; a number of comments closed per period of time; a capacity of the users contributing to the issue; a number of data sources a same issue is logged in; a length of description of each ticket; or a time of completion for each ticket (Paragraph 0032, Prior to processing any search for similar software problem descriptions, a repository of textual descriptions of a history of changes to the software product would be created by gathering information (or a collection of links to information) from different sources such as from bug tracking databases, extracted comments in source code, and “one liner” comments captured in software version control systems; Paragraph 0033, After the repository has been created, a user would input a description of the changes he or she is going to make to the software source code into the comments of an enhanced bug tracking tool, then he or she would click a button, which would scan the repository while searching for similar changes made in the past, using NLP methods to search for matches rather than just prevalence of keywords and synonyms; Paragraph 0057, As the work item is delivered and built into a software deliverable, the next phase of some embodiments of the present invention may be engaged. 
As the code has been previously run through the bug pre-detection tool, the Machine Learning models can be employed to predict a number of things including, but not limited to, defects, performance bottlenecks, and most importantly; Paragraph 0031, search for similar software problem descriptions; Paragraph 0035, For example, a user could input into the enhanced bug tracking systems user interface that they are planning to add a hashmap to a software design or product. Such user input might include a natural language description of what functions would use the hashmap, and what data structures the hashmap would relate to each other, and to which particular method or code module the hashmap would be added. Such as com.my.package; Paragraph 0036, The resulting NLP-based search might find instances in the past, in which hashmaps were used in similar circumstances and caused performance issues; It can be noted that the claim language is written in alternative form. The limitation taught by Champlin-Scharf et al. is based on “a number of data sources a same issue is logged in”); 
and the at least one metric comprises at least one of a success rate or a project completion duration (Paragraph 0041, Before the bug pre-detection method and tool of the present invention is used by a user, training data will be ingested by the system originating from a variety of sources. This would likely include, but not limited to, defect tracking databases, requirements documents, work items, and test cases generated during prior releases, as well as comments from known defect-free sections of code; Paragraph 0046, Based on previous knowledge in the form of our trained models (102) and, if available, previous bug resolution notes (104), the bug pre-detection tool will offer suggestions in the form of automatically-generated discussion comments (301) or other suitable notices (e-mails, text messages, etc.) as to whether or not the proposals in the latest revision of the document will improve software quality in terms of the likelihood of introducing new defects, or triggering previously solved defects, which is illustrated in FIG. 3; Paragraph 0057, As the work item is delivered and built into a software deliverable, the next phase of some embodiments of the present invention may be engaged. As the code has been previously run through the bug pre-detection tool, the Machine Learning models can be employed to predict a number of things including, but not limited to, defects, performance bottlenecks, and most importantly; It can be noted that the claim language is written in alternative form. The limitation taught by Champlin-Scharf et al. is based on “a success rate").
Regarding claim 41 (New), which is dependent of claim 14, the combination of Champlin-Scharf et al., Choetkiertikul, and Liu et al. discloses all the limitations in claim 14. Champlin-Scharf et al. further discloses wherein: the third set of metadata comprises each one of: a percentage of open tickets in the issue; a percentage of closed tickets in the issue; a number of comments closed per period of time; a capacity of the users contributing to the issue; a number of data sources a same issue is logged in; a length of description of each ticket; or a time of completion for each ticket (Paragraph 0032, Prior to processing any search for similar software problem descriptions, a repository of textual descriptions of a history of changes to the software product would be created by gathering information (or a collection of links to information) from different sources such as from bug tracking databases, extracted comments in source code, and “one liner” comments captured in software version control systems; Paragraph 0033, After the repository has been created, a user would input a description of the changes he or she is going to make to the software source code into the comments of an enhanced bug tracking tool, then he or she would click a button, which would scan the repository while searching for similar changes made in the past, using NLP methods to search for matches rather than just prevalence of keywords and synonyms; Paragraph 0057, As the work item is delivered and built into a software deliverable, the next phase of some embodiments of the present invention may be engaged. 
As the code has been previously run through the bug pre-detection tool, the Machine Learning models can be employed to predict a number of things including, but not limited to, defects, performance bottlenecks, and most importantly; Paragraph 0031, search for similar software problem descriptions; Paragraph 0035, For example, a user could input into the enhanced bug tracking systems user interface that they are planning to add a hashmap to a software design or product. Such user input might include a natural language description of what functions would use the hashmap, and what data structures the hashmap would relate to each other, and to which particular method or code module the hashmap would be added. Such as com.my.package; Paragraph 0036, The resulting NLP-based search might find instances in the past, in which hashmaps were used in similar circumstances and caused performance issues; It can be noted that the claim language is written in alternative form. The limitation taught by Champlin-Scharf et al. is based on “a number of data sources a same issue is logged in”); 
and the at least one metric comprises at least one of a success rate or a project completion duration (Paragraph 0041, Before the bug pre-detection method and tool of the present invention is used by a user, training data will be ingested by the system originating from a variety of sources. This would likely include, but not limited to, defect tracking databases, requirements documents, work items, and test cases generated during prior releases, as well as comments from known defect-free sections of code; Paragraph 0046, Based on previous knowledge in the form of our trained models (102) and, if available, previous bug resolution notes (104), the bug pre-detection tool will offer suggestions in the form of automatically-generated discussion comments (301) or other suitable notices (e-mails, text messages, etc.) as to whether or not the proposals in the latest revision of the document will improve software quality in terms of the likelihood of introducing new defects, or triggering previously solved defects, which is illustrated in FIG. 3; Paragraph 0057, As the work item is delivered and built into a software deliverable, the next phase of some embodiments of the present invention may be engaged. As the code has been previously run through the bug pre-detection tool, the Machine Learning models can be employed to predict a number of things including, but not limited to, defects, performance bottlenecks, and most importantly; It can be noted that the claim language is written in alternative form. The limitation taught by Champlin-Scharf et al. is based on “a success rate").
Regarding claim 16 (Original), which is dependent on claim 1, the combination of Champlin-Scharf et al., Choetkiertikul, and Liu et al. discloses all the limitations in claim 1. Champlin-Scharf et al. further discloses wherein the operations comprise: 
determining an explainability of the assessment model with respect to the at least one metric; evaluating the at least one metric based on the explainability and the assessment outcome; and determining a recommendation based on the evaluation of the at least one metric (Paragraph 0046, Based on previous knowledge in the form of our trained models (102) and, if available, previous bug resolution notes (104), the bug pre-detection tool will offer suggestions in the form of automatically-generated discussion comments (301) or other suitable notices (e-mails, text messages, etc.) as to whether or not the proposals in the latest revision of the document will improve software quality in terms of the likelihood of introducing new defects, or triggering previously solved defects, which is illustrated in FIG. 3; Paragraph 0047, An added advisory entry to a discussion thread might appear as follows: “ADVISORY: The bug pre-detector tool has found similar changes that resulted in one or more bug(s). Description(s) of the bug(s) and the corrective action(s) can be found at the following hyperlinks).
Regarding claim 37 (Previously Presented), which is dependent of claim 1, the combination of Champlin-Scharf et al., Choetkiertikul, and Liu et al. discloses all the limitations in claim 1. Champlin-Scharf et al. further discloses training the assessment model using at least one of a regression method or a tree-based method (Paragraph 0041, Before the bug pre-detection method and tool of the present invention is used by a user, training data will be ingested by the system originating from a variety of sources. This would likely include, but not limited to, defect tracking databases, requirements documents, work items, and test cases generated during prior releases, as well as comments from known defect-free sections of code; Paragraph 0046, Based on previous knowledge in the form of our trained models (102) and, if available, previous bug resolution notes (104), the bug pre-detection tool will offer suggestions in the form of automatically-generated discussion comments (301) or other suitable notices (e-mails, text messages, etc.) as to whether or not the proposals in the latest revision of the document will improve software quality in terms of the likelihood of introducing new defects, or triggering previously solved defects, which is illustrated in FIG. 3; Paragraph 0057, As the work item is delivered and built into a software deliverable, the next phase of some embodiments of the present invention may be engaged. As the code has been previously run through the bug pre-detection tool, the Machine Learning models can be employed to predict a number of things including, but not limited to, defects, performance bottlenecks, and most importantly).
Although Champlin-Scharf et al. discloses machine learning to generate an assessment outcome reflecting the at least one predicted metric of the user project, the combination of Champlin-Scharf et al. and Liu et al. does not specifically disclose wherein the assessment outcome is generated using at least one of a regression method or a tree-based method.
However, Choetkiertikul discloses training the assessment model using at least one of a regression method or a tree-based method (Page 554, Fig. 3. An overview of our approach, see trained classifiers and predicted velocity; Page 559, 5 Predictive Models, Our predictive models can predict the difference between the actual delivered velocity against the committed (target) velocity for an iteration, i.e., velocity(Difference). To do so, we employ regression methods (supervised learning) where the outputs reflect the deliverable capability in an iteration e.g., the predicting of velocity(Difference) will be equal to 12. The extracted features of the historical iterations (i.e., training set) are used to build the predictive models. Specifically, a feature vector of an iteration and an aggregated feature vector of issues assigned to the iteration are concatenated and fed into a regressor. We apply the currently most successful class of machine learning methods, namely randomized ensemble methods [26], [27], [28]. Ensemble methods refer to the use of many regressors to make their prediction [28]. Randomized methods create regressors by randomizing data, features, or internal model components [29]. Randomizations are powerful regularization techniques which reduce prediction variance, prevent overfitting, are robust against noisy data, and improve the overall predictive accuracy [30], [31]. We use the following high performing regressors that have frequently won recent data science competitions (e.g., Kaggle): Random Forests (RFs) [32], Stochastic Gradient Boosting Machines (GBMs) [33], [34] and Deep Neural Networks with Dropouts (DNNs) [35]. All of them are ensemble methods that use a divide-and-conquer approach to improve performance. The key principle behind ensemble methods is that a group of “weak learners” (e.g., classification and regression decision trees) can together form a “strong learner”. Details for these predictive models used are provided in Appendix A).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the use of machine learning to generate an assessment outcome reflecting the at least one predicted metric of the user project of the invention of Champlin-Scharf et al. to further incorporate the use of a regression method or a tree-based method of the invention of Choetkiertikul because those approaches are well-known techniques for predicting project outcomes. Further, the claimed invention is merely a combination of old elements, and in combination each element would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable.
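For illustration only, the regression approach quoted from Choetkiertikul above (a feature vector of an iteration concatenated with an aggregated feature vector of its issues and fed into an ensemble of tree-based regressors) may be sketched as follows. The sketch is not taken from the reference: the function names, the use of the mean as the aggregation, and the bagged one-split regression trees standing in for a Random Forest are all assumptions made for this example.

```python
import random
from statistics import mean


def velocity_difference(delivered, committed):
    """velocity(Difference): delivered minus committed (target) story points."""
    return delivered - committed


def concat_features(iteration_features, issue_features_list):
    """Concatenate iteration features with per-feature means over the issues.

    Using the mean as the aggregation is an assumption of this sketch.
    """
    width = len(issue_features_list[0])
    aggregated = [mean(f[i] for f in issue_features_list) for i in range(width)]
    return list(iteration_features) + aggregated


def fit_stump(X, y):
    """Fit a one-split regression tree by minimising squared error."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            lm, rm = mean(left), mean(right)
            err = sum((v - lm) ** 2 for v in left) + sum((v - rm) ** 2 for v in right)
            if best is None or err < best[0]:
                best = (err, j, t, lm, rm)
    if best is None:  # no valid split in this sample: always predict the mean
        m = mean(y)
        return (0, float("inf"), m, m)
    return best[1:]


def predict_stump(stump, row):
    j, t, left_mean, right_mean = stump
    return left_mean if row[j] <= t else right_mean


def fit_bagged(X, y, n_estimators=25, seed=0):
    """Randomised ensemble: each stump is trained on a bootstrap resample."""
    rng = random.Random(seed)
    stumps = []
    for _ in range(n_estimators):
        idx = [rng.randrange(len(X)) for _ in X]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps


def predict_bagged(stumps, row):
    """Average the predictions of all stumps in the ensemble."""
    return mean(predict_stump(s, row) for s in stumps)
```

In this sketch, a negative predicted value corresponds to an iteration expected to deliver fewer story points than committed, consistent with the reference's description of velocity(Difference).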
Regarding claim 39 (New), which is dependent on claim 1, the combination of Champlin-Scharf et al., Choetkiertikul, and Liu et al. discloses all the limitations in claim 1. Champlin-Scharf et al. discloses accessing a first set of metadata associated with the user project (e.g. problem posted by a user); accessing a second set of metadata associated with a plurality of reference projects (e.g. software repository); determining a subgroup of reference projects that are similar to the first set of metadata (e.g. data with similar software problem descriptions); and using the subgroup of data for generating an assessment for assessing at least one metric of the user project (Machine Learning models can be employed to predict a number of things including, but not limited to, defects, performance bottlenecks). Although Champlin-Scharf et al. discloses predicting performance bottlenecks based on similar software problem descriptions historical data, Champlin-Scharf et al. does not specifically disclose wherein the system tracks other features that are important for predicting a collaboration bottleneck (e.g. an epic and a plurality of replies to the epic).
	However, Choetkiertikul discloses wherein: the first set of metadata comprises data of team collaboration in the replies amongst the teams in the user project (see Figure 4 and related text in Page 555, Fig. 4 shows an example of an on-going iteration report (recorded in JIRA-Agile) of Mesosphere Sprint 34 in the Apache project. This iteration started from April 27, 2016 to May 11, 2016. This iteration has two issues in the Todo state (MESOS-5272 and MESOS-5222), three issues in the In-progress state - those are all in the reviewing process (MESOS-3739, MESOS-4781, and MESOS-4938), and one issue has been resolved (MESOS-5312). These issues have story points assigned to them. For each of those sets of issues, we compute the set cardinality and velocity, and use each of them as a feature. From our investigation, among the under-achieved iterations across all case studies, i.e., velocity(Difference) < 0, 30 percent of them have new issues added after passing 80 percent of their planned duration (e.g., after 8th day of a ten-day iterations), while those iterations deliver zero-issue. Specifically, teams added more velocity(Committed) while velocity(Delivered) was still zero. This reflects that adding and removing issues affects the deliverable capability of an on-going iteration. This can be a good indicator to determine the outcome of an iteration; Fig. 4. An example of an on-going iteration report; Pages 555-556, 4.2 Features of an issue, Fig. 7 shows an example of an issue report of issue AURORA-716 in the Apache project which the details of an issue are provided such as type, priority, description, and comments including a story points and an assigned iteration; Page 557, 4.2.1. Primitive Analysis of an Issue, Previous studies (e.g., [16]) have found that the number of comments on an issue indicates the degree of team collaboration, and thus may affect its resolving time; Examiner interprets the set of issues included in a sprint as epics. 
Further, Examiner notes that the reviewing process includes a plurality of replies contributed by the plurality of teams);
and the second set of metadata comprises data of team collaboration in the replies amongst the teams in the reference projects (Fig. 3. An overview of our approach; Page 555, 4.2. Features of an Issues, Fig. 7 shows an example of an issue report of issue AURORA-716 in the Apache project which the details of an issue are provided such as type, priority, description, and comments including a story points and an assigned iteration. Hence, we also extract a broad range of features representing an issue (see Table 2). The features cover different aspects of an issue including primitive attributes of an issue, issue dependency, changing of issue attributes, and textual features of an issue’s description. Some of the features of an issue (e.g., number of issue links) were also adopted from our previous work [14]; Page 560, 6.1. Data Collecting and Processing, we collected the data of past iterations (also referred to as sprints in those projects) and issues from five large open source projects which follow the agile Scrum methodology: Apache, JBoss, JIRA, MongoDB, and Spring. The project descriptions and their agile adoptions have been reported in Table 5; Examiner interprets the set of issues included in a sprint as epics. Further, Examiner notes that the historical data includes a plurality of features, wherein one of the features is the number of comments).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the assessment outcome reflecting the at least one predicted metric of the user project and the display of the invention of Champlin-Scharf et al. to further incorporate collaboration features in the analysis (e.g. epics and a plurality of replies to the epics) of the invention of Choetkiertikul because doing so would allow the system to predict outcomes of new and ongoing iterations in the execution phase (see Choetkiertikul, Page 554, 3 Overview of our approach), wherein team collaboration is an important factor in the prediction model (see Choetkiertikul, Page 567, 6.4.6 Important Features). Also, it would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the use of machine learning to generate an assessment outcome reflecting the at least one predicted metric of the user project of the invention of Champlin-Scharf et al. to further incorporate the use of a neural network of the invention of Choetkiertikul because the neural network approach is a subclass of machine learning and is a well-known technique for predicting project outcomes. 
Regarding claim 40 (New), which is dependent on claim 1, the combination of Champlin-Scharf et al., Choetkiertikul, and Liu et al. discloses all the limitations in claim 1. Although Champlin-Scharf et al. discloses one or more collaboration tools (e.g. software version control), Champlin-Scharf et al. does not specifically disclose wherein: the one or more collaboration tools comprise at least one of Github, JIRA, Slack, or Microsoft Project.
	However, Choetkiertikul discloses wherein: the one or more collaboration tools comprise at least one of Github, JIRA, Slack, or Microsoft Project (Page 560, 6.1 Data Collecting and Processing, We collected the data of past iterations (also referred to as sprints in those projects) and issues from five large open source projects which follow the agile Scrum methodology: Apache, JBoss, JIRA, MongoDB, and Spring. The project descriptions and their agile adoptions have been reported in Table 5).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the assessment outcome reflecting the at least one predicted metric of the user project by using a software version control collaboration tool to further incorporate other known collaboration tools of the invention of Choetkiertikul because doing so would allow the system to predict outcomes of new and ongoing iterations in the execution phase based on past iterations data from open source projects including JIRA (see Choetkiertikul, Page 554, 3 Overview of our approach; and Page 560, 6.1 Data Collecting and Processing). Further, the claimed invention is merely a combination of old elements, and in combination each element would have performed the same function as it did separately, and one of ordinary skill in the art would have recognized that the results of the combination were predictable. 
Regarding claim 42 (New), which is dependent on claim 1, the combination of Champlin-Scharf et al., Choetkiertikul, and Liu et al. discloses all the limitations in claim 1. Champlin-Scharf et al. further discloses wherein each of the plurality of issues comprises one or more problems posted by a team member (Paragraph 0016, Such events are usually discussed in “bug” tracking tools, software development planning tools, developer code comments, and other solutions. However, the present inventors have realized that the current search tools are inadequate to find problem reports, which discuss the same root problem but may use different terminology, situations and examples to do so; Paragraph 0032, Prior to processing any search for similar software problem descriptions, a repository of textual descriptions of a history of changes to the software product would be created by gathering information (or a collection of links to information) from different sources such as from bug tracking databases, extracted comments in source code, and “one liner” comments captured in software version control systems; Paragraph 0033, After the repository has been created, a user would input a description of the changes he or she is going to make to the software source code into the comments of an enhanced bug tracking tool, then he or she would click a button, which would scan the repository while searching for similar changes made in the past, using NLP methods to search for matches rather than just prevalence of keywords and synonyms).
Regarding claim 43 (New), which is dependent on claim 1, the combination of Champlin-Scharf et al., Choetkiertikul, and Liu et al. discloses all the limitations in claim 1. Although Champlin-Scharf et al. discloses predicting performance bottlenecks based on similar software problem descriptions historical data (Paragraph 0057) and analyzing comments in source code (Paragraph 0032), Champlin-Scharf et al. does not specifically disclose wherein the system tracks other features that are important for predicting a collaboration bottleneck (e.g. an epic and a plurality of replies to the epic).
	However, Choetkiertikul discloses wherein the collaboration bottleneck comprises at least one of areas of collaboration for the plurality of teams of the user project to improve, or hiring gaps for management to bridge (Fig. 4. An example of an on-going iteration report; Pages 555-556, 4.2 Features of an issue, Fig. 7 shows an example of an issue report of issue AURORA-716 in the Apache project which the details of an issue are provided such as type, priority, description, and comments including a story points and an assigned iteration; Page 557, 4.2.1. Primitive Analysis of an Issue, Previous studies (e.g., [16]) have found that the number of comments on an issue indicates the degree of team collaboration, and thus may affect its resolving time; Page 561, 6.2 Experimental Setting, As discussed in Section 2, we would like to predict the difference between the actual delivered velocity against the target velocity. For example, if the output of our model is -5, it predicts that the team will deliver 5 story points below the target. Table 7 shows the statistical descriptions of the difference between the actual delivered against the target velocity of the five projects in terms of the minimum, maximum, mean, median, mode, standard deviations (SD), and interquartile range (IQR); It can be noted that the claim language is written in alternative form. The limitation taught by Choetkiertikul is based on “wherein the collaboration bottleneck comprises at least one of areas of collaboration for the plurality of teams of the user project to improve." Examiner notes that one of the areas of the user project to improve is the velocity).
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the assessment outcome reflecting the at least one predicted metric of the user project and the display of the invention of Champlin-Scharf et al. to further incorporate collaboration features in the analysis (e.g. epics and a plurality of replies to the epics) of the invention of Choetkiertikul because doing so would allow the system to predict outcomes of new and ongoing iterations in the execution phase (see Choetkiertikul, Page 554, 3 Overview of our approach), wherein team collaboration is an important factor in the prediction model (see Choetkiertikul, Page 567, 6.4.6 Important Features). Also, it would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the use of machine learning to generate an assessment outcome reflecting the at least one predicted metric of the user project of the invention of Champlin-Scharf et al. to further incorporate the use of a neural network of the invention of Choetkiertikul because the neural network approach is a subclass of machine learning and is a well-known technique for predicting project outcomes. 
Regarding claim 44 (New), which is dependent on claim 1, the combination of Champlin-Scharf et al., Choetkiertikul, and Liu et al. discloses all the limitations in claim 1. Champlin-Scharf et al. further discloses wherein the assessment model comprises a function between an input and the at least one predicted metric of the user project (Paragraph 0057, Machine Learning models can be employed to predict a number of things including, but not limited to, defects, performance bottlenecks, and most importantly. These identified defects and bottlenecks may then be mapped to specific tests that would be most effective in testing these aspects of the new code). Although Champlin-Scharf et al. discloses all the limitations above and extracting comments in source code (paragraph 0032), Champlin-Scharf et al. does not specifically disclose wherein an input of the assessment model includes at least one of: a name of a team member that opened an issue; a number of comments in an epic; a number of team members contributing to the epic; a percentage of open tickets; a percentage of closed tickets; a number of tickets closed per period of time; a capacity of team members contributing to the epic; a number of data sources an epic is logged in; a length of a description in the epic; an expected time for epic completion; or an average time of epic completion.
However, Choetkiertikul discloses wherein the assessment model comprises a function between an input and the at least one predicted metric of the user project (Page 554, Fig. 3. An overview of our approach, see trained classifiers and predicted velocity; Page 554, 3.0 Overview of our approach, Our approach consists of two phases: the learning phase and the execution phase (see Fig. 3). The learning phase involves using historical iterations to build a predictive model (using machine learning techniques), which is then used to predict outcomes, i.e., velocity(Difference), of new and ongoing iterations in the execution phase. To apply machine learning techniques, we need to engineer features for the iteration. An iteration has a number of attributes (e.g., its duration, the participants, etc.) and a set of issues whose dependencies are described as a dependency graph. Each issue has its own attributes and derived features (e.g., from its textual description). Our approach separates the iteration-level features into three components: (i) iteration attributes, (ii) complexity descriptors of the dependency graph (e.g., the number of nodes, edges, fan-in, fan-out, etc.), and (iii) aggregated features from the set of issues that belong to the iteration. A more sophisticated approach would involve embedding all available information into an euclidean space, but we leave this for future work);
and an input of the assessment model includes at least one of: a name of a team member that opened an issue; a number of comments in an epic; a number of team members contributing to the epic; a percentage of open tickets; a percentage of closed tickets; a number of tickets closed per period of time; a capacity of team members contributing to the epic; a number of data sources an epic is logged in; a length of a description in the epic; an expected time for epic completion; or an average time of epic completion (Fig. 4. An example of an on-going iteration report; Pages 555-556, 4.2 Features of an issue, Fig. 7 shows an example of an issue report of issue AURORA-716 in the Apache project which the details of an issue are provided such as type, priority, description, and comments including a story points and an assigned iteration; Page 557, 4.2.1. Primitive Analysis of an Issue, Previous studies (e.g., [16]) have found that the number of comments on an issue indicates the degree of team collaboration, and thus may affect its resolving time; Page 561, 6.2 Experimental Setting, As discussed in Section 2, we would like to predict the difference between the actual delivered velocity against the target velocity. For example, if the output of our model is -5, it predicts that the team will deliver 5 story points below the target. Table 7 shows the statistical descriptions of the difference between the actual delivered against the target velocity of the five projects in terms of the minimum, maximum, mean, median, mode, standard deviations (SD), and interquartile range (IQR); It can be noted that the claim language is written in alternative form. The limitation taught by Choetkiertikul is based on “a number of comments in an epic.").
It would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the assessment outcome reflecting the at least one predicted metric of the user project of the invention of Champlin-Scharf et al. to further incorporate collaboration features in the analysis (e.g. number of comments) of the invention of Choetkiertikul because doing so would allow the system to predict outcomes of new and ongoing iterations in the execution phase (see Choetkiertikul, Page 554, 3 Overview of our approach), wherein team collaboration is an important factor in the prediction model (see Choetkiertikul, Page 567, 6.4.6 Important Features). Also, it would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify the use of machine learning to generate an assessment outcome reflecting the at least one predicted metric of the user project of the invention of Champlin-Scharf et al. to further incorporate the use of a neural network of the invention of Choetkiertikul because the neural network approach is a subclass of machine learning and is a well-known technique for predicting project outcomes. 
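For illustration only, the iteration-level input described in the Choetkiertikul passages cited for claim 44 (iteration attributes, complexity descriptors of the issue dependency graph, and features aggregated from the issues, including the number of comments) may be sketched as follows. The sketch is not taken from the reference: the dictionary keys, the function names, and the choice of mean and sum as aggregations are hypothetical and were chosen only for this example.

```python
from collections import defaultdict
from statistics import mean


def graph_descriptors(edges):
    """Complexity descriptors of a dependency graph given as (source, target) pairs."""
    nodes = set()
    fan_in = defaultdict(int)
    fan_out = defaultdict(int)
    for src, dst in edges:
        nodes.update((src, dst))
        fan_out[src] += 1
        fan_in[dst] += 1
    return {
        "nodes": len(nodes),
        "edges": len(edges),
        "max_fan_in": max(fan_in.values(), default=0),
        "max_fan_out": max(fan_out.values(), default=0),
    }


def iteration_feature_vector(iteration, issues, edges):
    """Build one input vector from the three feature components.

    (i) iteration attributes, (ii) dependency-graph descriptors,
    (iii) aggregates over the issues assigned to the iteration.
    """
    g = graph_descriptors(edges)
    return [
        iteration["duration_days"],             # (i) iteration attribute
        iteration["participants"],              # (i) iteration attribute
        g["nodes"],                             # (ii) graph descriptors
        g["edges"],
        g["max_fan_in"],
        g["max_fan_out"],
        mean(i["num_comments"] for i in issues),  # (iii) mean comments per issue
        sum(i["story_points"] for i in issues),   # (iii) total committed story points
    ]
```

A vector built this way could serve as the input to a regressor that outputs a predicted metric such as velocity(Difference); the particular features chosen here are only an example of the alternative "a number of comments in an epic."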
Regarding claim 45 (New), which is dependent on claim 1, the combination of Champlin-Scharf et al., Choetkiertikul, and Liu et al. discloses all the limitations in claim 1. Champlin-Scharf et al. further discloses wherein applying the assessment model to the first set of metadata to generate the assessment outcome comprises applying an input of the assessment model into the function to generate the assessment outcome (Paragraph 0057, Machine Learning models can be employed to predict a number of things including, but not limited to, defects, performance bottlenecks, and most importantly. These identified defects and bottlenecks may then be mapped to specific tests that would be most effective in testing these aspects of the new code).



Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Saha et al. (US 2019/0265970) – discloses analyzing textual or structural information of a subject project; determining one or more candidate target projects based on a similarity score; and predicting defects in the subject project (see at least Figure 3).
Burns et al. (US 2018/0349135 A1) – discloses predicting software project success (see at least Paragraph 0070).
Srivastava et al. (US 2018/0260760 A1) – discloses recommending a resolution for resolving an issue associated with the project (see at least Paragraphs 0003 & 0017).
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARJORIE PUJOLS-CRUZ whose telephone number is (571)272-4668. The examiner can normally be reached Mon-Thu 7:30 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Patricia H Munson, can be reached at (571)270-5396. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/M.P./ Examiner, Art Unit 3624
/PATRICIA H MUNSON/ Supervisory Patent Examiner, Art Unit 3624