DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 03/11/2021 have been fully considered but they are not persuasive.
Applicant Remarks:
Independent claims 8 and 15 have been similarly amended. Specifically, in amended independent claim 1, which is representative of amended independent claims 8 and 15, the WH-reliability score is based on the user; it is proportional to how many of the user's watched hypotheses have passed first-pass automatic vetting. In contrast, the confidence score in Byron is based on the "degree of matching" between an input question and a candidate answer. Byron, at paragraph [0071]. Byron is silent regarding a confidence score based on the user.
Examiner Response:
	The examiner respectfully disagrees. Byron, in paragraph [0035], states: “Content users input questions to the QA system which then answers the input questions using the content in the corpus of data by evaluating documents, sections of documents, portions of data in the corpus, or the like.” This teaches content users inputting questions to a QA system. Therefore, whenever an input question is received, a confidence score is provided regarding the evidence that the potential response, i.e., the candidate answer, is inferred by the question. It is therefore noted that a score is provided to the users in response to their input. The examiner suggests amending the claims
Applicant Remarks:
	As discussed above in Claim Rejections - 35 U.S.C. § 102, independent claims 1, 8, and 15 have been amended to include a limitation not taught or suggested by Byron. As a result of their dependency on independent claims 1, 8, and 15, claims 3, 6, 7, 10, 13, 14, 17, 19, and 20 also include the same limitation not taught or suggested by Byron. Specifically, in amended independent claim 1, which is representative of amended independent claims 8 and 15, the WH-reliability score is based on the user; it is proportional to how many of the user's watched hypotheses have passed first-pass automatic vetting. In contrast, the confidence score in Byron is based on the "degree of matching" between an input question and a candidate answer. Byron, at paragraph [0071]. Applicant submits that this deficiency is not cured by Zhang, Murdock, alone or in combination. 
Further, with respect to claims 3, 10, and 17, Applicant respectfully disagrees that Zhang teaches "if the to-be-vetted QA pair is comparable with the one or more existing QA pairs, declare the to-be-vetted QA pair as a duplicate pair; and discard the duplicate pair." The Office states that "[d]iscard the duplicate pair is taught as pruning or filtering the duplicate questions." Office Action, at page 18. Zhang does not teach pruning or filtering duplicate questions. Rather, Zhang teaches pruning or filtering the space in which the search for duplicates is performed. Zhang, at page 1223. To quote Zhang, "we prune existing questions . . ., thus narrowing the search space for candidate duplicate questions." Id. That space is narrowed more by "additionally filter[ing] out questions that have no answers." Id. Having pruned and filtered the space in which the search for duplicates is performed, that space, now smaller, is searched for duplicates. See Id. 
Byron does not explicitly disclose if the to-be-vetted QA pair is comparable with the one or more existing QA pairs, declare the to-be-vetted QA pair as a duplicate pair; and discard the duplicate pair.

Examiner Response:
Applicant’s arguments with respect to the limitations above have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
The newly cited Balen reference has been added to further teach duplicate detection and removal. Refer to the rejection below for further details.

Claim Objections
Claims 3, 10, and 17 are objected to because of the following informalities: "determine" should be "determination". Appropriate correction is required.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 03/11/2021 was filed after the mailing date of the Non-Final Rejection on 12/30/2020.  The submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 2, 4-9, 11-16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Byron (U.S. 20160180242) in view of Murdock (WO2012047541).

Regarding claim 8, Byron teaches a computer program product for harvesting training data for a training set for use by a system capable of answering questions (Byron: Paragraph [0004] “the QA system and processing, by the QA system, the training question to generate an answer to the training question, from a portion of content in a corpus of information. …generate at least one additional training question and at least one additional entry in a ground truth data structure to thereby expand a set of training questions and expand the ground truth data structure. Moreover, in some illustrative embodiments, the method comprises training the QA system using the expanded set of training questions and expanded ground truth data structure.” A computer program product for harvesting training data for a training set for use by a system capable of answering questions is taught as the computer program product that generates additional training questions and answers for use by the QA system to answer questions.), the computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor to (Byron: Paragraph [0005] “a computer program product comprising a computer useable or readable medium having a computer readable program is provided. The computer readable program, when executed on a computing device, causes the computing device to perform” The computer program product comprising a computer readable storage medium having program instructions embodied therewith, the program instructions executable by a processor to cause the processor, is taught as a computer program product comprising a computer useable or readable medium having a computer readable program which, when executed on a computing device, causes the computing device to perform.): 

receive, from a user, an input question (Byron: Paragraph [0036] “the QA system receives an input question, parses the question to extract the major features of the question,” Receive from a user an input question is taught as the QA system receives an input question.); 

process the input question and return, to the user (Byron: Paragraph [0069] “the question decomposition stage 330 to decompose the question into one or more queries that are applied to the corpora of data/information 345 in order to generate one or more hypotheses.” Processing the input question and returning, to the user, is taught as the question decomposition stage 330 decomposing the question into one or more queries.), a result set comprising one or more ranked hypotheses (Byron: Paragraph [0077] “The hypotheses/candidate answers are ranked according to these comparisons to generate a ranked listing of hypotheses/candidate answers (hereafter simply referred to as “candidate answers”).” A result set comprising one or more ranked hypotheses is taught as generating a ranked listing of hypotheses/candidate answers.) and one or more ranked evidence passages corresponding to the one or more ranked hypotheses (Byron: Paragraph [0003] “evidence scoring based on a retrieval of evidence from evidence sources, performs synthesis of the one or more hypothesis, and based on trained models, performs a final merging and ranking to output an answer to the input question” One or more ranked evidence passages corresponding to the one or more ranked hypotheses is taught as evidence scoring based on a retrieval of evidence from evidence sources, synthesis of the one or more hypotheses, and, based on trained models, a final merging and ranking to output an answer to the input question.); 

receive, from the user, an indication that one of the one or more ranked hypotheses is to be designated a watched hypothesis (Byron: Paragraph [0079] “the candidate answers and final answer may further be output to the ground truth generation engine 390 that presents the candidate answers to a Subject Matter Expert (SME) who then verifies, through user input, which candidate answer(s) is/are correct candidate answers for the input question” Paragraph [0077] “The hypotheses/candidate answers are ranked according to these comparisons to generate a ranked listing of hypotheses/candidate answers” Receiving, from the user, an indication that one of the one or more ranked hypotheses is to be designated a watched hypothesis is taught as the candidate answers and final answer being output to the ground truth generation engine, which presents the candidate answers to a Subject Matter Expert (SME) who then verifies, through user input [i.e., “indication that one of the one or more ranked hypotheses is to be designated a watched hypothesis”], which candidate answer(s) is/are correct.); 

add the input question and the watched hypothesis to a to-be-vetted question/answer (QA) pair set comprising one or more to-be-vetted QA pairs (Byron: Paragraph [0078] “a set of training questions are submitted to the QA system pipeline 300 and are processed by the QA system pipeline 300 to generate a set of candidate answers and/or final answer with corresponding confidence measure values.” Add the input question and the watched hypothesis to a to-be-vetted question/answer (QA) pair set comprising one or more to-be-vetted QA pairs is taught as a set of training questions are submitted to the QA system pipeline and are processed by the QA system pipeline to generate a set of candidate answers and/or final answer with corresponding confidence measure values.); 

vetting each of the one or more to-be-vetted QA pairs in the to-be-vetted QA pair set through a first-pass automatic vetting procedure (Byron: Paragraph [0078] “a set of training questions are submitted to the QA system pipeline 300 and are processed by the QA system pipeline 300 to generate a set of candidate answers and/or final answer with corresponding confidence measure values. The candidate answers and/or final answer are compared to a ground truth data structure to identify whether the candidate answers and/or final answer match corresponding entries in the ground truth data structure that correspond to the training question being processed. Based on the comparison, a trained model is generated that includes the weights to be applied to various annotators implemented in the QA system pipeline 300 to thereby adjust their operation or the values generated by the annotators for evaluating the candidate answers and generating their corresponding confidence measure values.” Vetting each of the one or more to-be-vetted QA pairs in the to-be-vetted QA pair set through a first-pass automatic vetting procedure is taught as taking the submitted training questions paired with the generated candidate answers [i.e., hypotheses] and comparing them to ground truth data [i.e., previously stored training data or QA pairs]. The vetting process is taught as using the trained model to apply weights and confidence measure values to the QA candidate pairs.), wherein the first-pass automatic vetting procedure comprises: 

assigning a threshold value for determining whether the one or more to-be-vetted QA pair passes the first-pass automatic vetting procedure (Byron: Paragraph [0077] “The resulting confidence scores or measures are processed by a final confidence merging and ranking stage 370 which compares the confidence scores and measures to each other, compares them against predetermined thresholds, or performs any other analysis on the confidence scores to determine which hypotheses/candidate answers are the most likely to be the correct answer to the input question.” Assigning a threshold value for determining whether the one or more to-be-vetted QA pair passes the first-pass automatic vetting procedure is taught as comparing the confidence scores against predetermined thresholds to determine which hypotheses/candidate answers are most likely to be correct.), 

assigning to the user a WH-reliability score proportional to a percentage of QA pairs obtained from the user's previous indications of watched hypotheses (Byron: Paragraph [0077] “The resulting confidence scores or measures are processed by a final confidence merging and ranking stage 370 which compares the confidence scores and measures to each other, compares them against predetermined thresholds, or performs any other analysis on the confidence scores to determine which hypotheses/candidate answers are the most likely to be the correct answer to the input question.” Assigning the user a WH-reliability score proportional to a percentage of QA pairs is taught as the confidence scores used to determine which hypotheses are the most likely to be correct. The confidence scores [i.e., WH-reliability score] are evaluated in comparison with the other QA pair answers [i.e., a percentage of QA pairs harvested from the user's previous indications of watched hypotheses]. The examiner notes that a WH-reliability score is taught as a confidence score.) that have successfully passed the first-pass automatic vetting (Byron: Paragraph [0078] “a set of training questions are submitted to the QA system pipeline 300 and are processed by the QA system pipeline 300 to generate a set of candidate answers and/or final answer with corresponding confidence measure values. The candidate answers and/or final answer are compared to a ground truth data structure to identify whether the candidate answers and/or final answer match corresponding entries in the ground truth data structure that correspond to the training question being processed. Based on the comparison, a trained model is generated that includes the weights to be applied to various annotators implemented in the QA system pipeline 300 to thereby adjust their operation or the values generated by the annotators for evaluating the candidate answers and generating their corresponding confidence measure values.” Harvested from the user's previous indications of watched hypotheses that have successfully cleared first-pass automatic vetting is taught as taking the submitted training questions paired with the generated candidate answers [i.e., hypotheses] and comparing them to ground truth data [i.e., previously stored training data or QA pairs]. The vetting process is taught as using the trained model to apply weights and confidence measure values to the QA candidate pairs.), … based on at least the WH-reliability score (Byron: Paragraph [0077] “…the confidence scores to determine which hypotheses/candidate answers are the most likely to be the correct answer to the input question.” Assigning the user a WH-reliability score proportional to a percentage of QA pairs is taught as the confidence scores used to determine which hypotheses are the most likely to be correct. The confidence scores [i.e., WH-reliability score] are evaluated in comparison with the other QA pair answers [i.e., a percentage of QA pairs harvested from the user's previous indications of watched hypotheses]. The examiner notes that a WH-reliability score is taught as a confidence score.)…adding the vetted QA pair to the training set (Byron: Paragraph [0089] “the identified sibling question/answer pairs that follow the same repeatable pattern may be automatically or semi-automatically added to the ground truth table data structure 398 and used to expand the set of training questions 399 for training the QA system pipeline 300.” If a vetted QA pair passes the first-pass automatic vetting procedure, adding the vetted QA pair to the training set is taught as the QA pairs that follow the same repeatable pattern being automatically added to the ground truth table data structure and used to expand the set of training questions/answers in the QA pipeline.)…

and retrain one or more ranking models based upon the training set (Byron: Paragraph [0089] “Thus, additional question/answer pairs are generated and added to the ground truth table data structure 398 in an automated or semi-automated manner rather than having to have a SME manually enter, or identify within the corpus, all of the question/answer pairs that may be used for training the QA system pipeline 300. Thus, the ground truth table data structure 398 is automatically expanded with additional question/answer pairs which also expands the training question set 399 that may be utilized to train the QA system.” Retraining one or more ranking models based upon the training set is taught as expanding the training question set that may be utilized to train the QA system [i.e., retraining the models]. The QA system trained on the expanded training set corresponds to the claimed “one or more ranking models” [see Paragraph [0037] “The statistical model… generates a final answer, or ranked set of answers”]). 
Byron does not explicitly disclose and producing a final weighing of the to-be-vetted QA pair …, wherein the to-be-vetted QA pair becomes a vetted QA pair following the producing of the final weighing of the to-be-vetted QA pair; and determining that the final weighing meets or exceeds the threshold value;… based on the determining that the final weighing meets or exceeds the threshold value.
Murdock further teaches and producing a final weighing of the to-be-vetted QA pair …, wherein the to-be-vetted QA pair becomes a vetted QA pair following the producing of the final weighing of the to-be-vetted QA pair (Murdock: Paragraph [0105] “This component selects, for a given question, one of a plurality of models 582 to which to send the candidate answers for final scoring. For a given question, all the candidate answers are scored by that one model.” The final weighing of the to-be-vetted QA pair is taught as the candidate answers for the given question being scored by the final scoring model. Paragraph [0056] “it implements methods that weight the scores of candidate answers based on the trained model.” The prior art teaches a method of scoring a question-candidate pair based on two scores, which are used to determine the final scoring and weighting. Paragraph [0012] “determine a candidate answer classification score for the candidate answer based on the plurality of scores for that candidate answer.” The aggregated results are based on the classification score and the feature scores that are combined.); and determining that the final weighing meets or exceeds the threshold value (Murdock: Paragraph [0058] “classification scores with the classification scores used as a measure of answer confidence, that is, possible candidate answers are compared and evaluated by applying the prediction function to the complete feature set or subset thereof. If the classification score is higher than a threshold, this answer is deemed as an acceptable answer.” Determining that the final weighing meets or exceeds the threshold value is taught as follows: if the classification score, which is used as a measure of answer confidence, is higher than a threshold, the answer is deemed acceptable.);… based on the determining that the final weighing meets or exceeds the threshold value (Murdock: Paragraph [0058] “classification scores with the classification scores used as a measure of answer confidence, that is, possible candidate answers are compared and evaluated by applying the prediction function to the complete feature set or subset thereof. If the classification score is higher than a threshold, this answer is deemed as an acceptable answer.” Determining that the final weighing meets or exceeds the threshold value is taught as follows: if the classification score, which is used as a measure of answer confidence, is higher than a threshold, the answer is deemed acceptable.).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of expanding training questions of Byron with the scoring methods of Murdock in order to produce a plurality of scores for each of the candidate answers for a given question, thereby allowing the use of the scores to select the best answer and/or rank the list of answers (Murdock: Paragraph [0019] “For a given question, all answers are scored by that model, and those scores are used to select a best answer and/or rank the list of answers.”).
Claims 1 and 15 are similarly rejected; refer to claim 8 for further analysis.
Regarding claim 9, Byron in view of Murdock teaches the computer program product of claim 8, wherein the program instructions executable by the processor further cause the processor to: determining that the final weighing does not meet or exceed the threshold value (Byron: Paragraph [0077] “The resulting confidence scores or measures are processed by a final confidence merging and ranking stage 370 which compares the confidence scores and measures to each other, compares them against predetermined thresholds, or performs any other analysis on the confidence scores to determine which hypotheses/candidate answers are the most likely to be the correct answer to the input question.” Determining that the final weighing does not meet or exceed the threshold value is taught as the confidence scores or measures being processed by a final confidence merging and ranking stage, which compares the confidence scores and measures to each other and against predetermined thresholds. The examiner notes that the final weighing not exceeding the threshold is taught as the hypotheses/candidates that do not exceed the predetermined thresholds.); and vetting the vetted QA pair through additional human vetting (Byron: Paragraph [0078] “The candidate answers and/or final answer are compared to a ground truth data structure to identify whether the candidate answers and/or final answer match corresponding entries in the ground truth data structure that correspond to the training question being processed.” If the question and candidate answers do not match corresponding entries in the ground truth data, they are further verified by an SME [subject matter expert]. Paragraph [0079] “(SME) who then verifies, through user input, which candidate answer(s) is/are correct candidate answers for the input question, thereby generating one or more question/answer pairs.”) based on the determining that the final weighing does not meet or exceed the threshold value (Byron: Paragraph [0077] “The resulting confidence scores or measures are processed by a final confidence merging and ranking stage 370 which compares the confidence scores and measures to each other, compares them against predetermined thresholds, or performs any other analysis on the confidence scores to determine which hypotheses/candidate answers are the most likely to be the correct answer to the input question.” Determining that the final weighing does not meet or exceed the threshold value is taught as the confidence scores or measures being processed by a final confidence merging and ranking stage, which compares the confidence scores and measures to each other and against predetermined thresholds. The examiner notes that the final weighing not exceeding the threshold is taught as the hypotheses/candidates that do not exceed the predetermined thresholds.). 
Claims 2 and 16 are similarly rejected; refer to claim 9 for further analysis.
Regarding claim 11, Byron in view of Murdock teaches the computer program product of claim 8, wherein the to-be-vetted QA pair is determined to be comparable with the one or more existing QA pairs (Byron: Paragraph [0046] “That is, as will be described in greater detail hereafter, given a seed training question and corresponding question/answer pair in the ground truth data structure 122, the pipeline training system 120 implements ground truth and training question generation logic that looks for repeatable patterns in the content of a corpus upon which the QA system pipeline 108 operates to thereby identify additional question/answer pairs that may be automatically or semi-automatically added to the ground truth data structure” The to-be-vetted QA pair being determined to be comparable with the one or more existing QA pairs is taught as identifying the repeatable patterns between the seed training question and answer and the corresponding question/answer pair in the ground truth data.) on the basis of at least one of string matches (Byron: Paragraph [0084] “The question string of the input question 310 may be searched for additional relation terms that are also present within the document context where the original marked answer appears. The structure of the document at the location of the original marked answer may be analyzed by the training question generation logic” On the basis of at least one of string matches is taught as the question string of the input question being searched.), spelling variants, and known synonyms.

Claim 4 is similarly rejected; refer to claim 11 for further analysis.
Regarding claim 13, Byron in view of Murdock teaches the computer program product of claim 8. Byron further teaches wherein the program instructions executable by the processor further cause the processor to (Byron: Paragraph [0021] “a computer program product, the logic represented by computer code or instructions embodied in or on the computer program product is executed by one or more hardware devices in order to implement the functionality or perform the operations” The program instructions executable by the processor further causing the processor is taught as computer code or instructions embodied in or on the computer program product being executed by one or more hardware devices.):… the input question and the watched hypothesis (Byron: Paragraph [0078] “a set of training questions are submitted to the QA system pipeline 300 and are processed by the QA system pipeline 300 to generate a set of candidate answers and/or final answer with corresponding confidence measure values.” The input question and the watched hypothesis are taught as a set of training questions [i.e., input question] that are submitted to the QA system pipeline and processed by the QA system pipeline to generate a set of candidate answers [i.e., hypotheses] and/or final answer with corresponding confidence measure values.)

Byron does not explicitly disclose calculate a QA consistency score based upon one or more features …
Murdock further teaches calculate a QA consistency score based upon one or more features of the … (Murdock: Paragraph [0057] “each question-candidate pair comprises an instance, and scores are obtained from a wide range of features, e.g., co-occurrence of answer and query terms, whether a candidate matches answer type, and search engine rank. Thus, for an example question,” Calculate a QA consistency score based upon one or more features of the input question and the watched hypothesis is taught as the scores obtained between a wide variety of features based on the question-candidate pair [i.e. a QA consistency score].). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of expanding training questions of Byron with the scoring methods of Murdock in order to produce a plurality of scores for each of the candidate answers for a given question, thereby allowing the use of the scores to select the best answer and/or rank the list of answers (Murdock: Paragraph [0019] “For a given question, all answers are scored by that model, and those scores are used to select a best answer and/or rank the list of answers.”).

Claims 6 and 19 are similarly rejected; refer to claim 13 for further analysis.
Regarding claim 14, Byron in view of Murdock teaches the computer program product of claim 8, wherein the program instructions executable by the processor further cause the processor to (Byron: Paragraph [0021] “a computer program product, the logic represented by computer code or instructions embodied in or on the computer program product is executed by one or more hardware devices in order to implement the functionality or perform the operations” The program instructions executable by the processor further causing the processor is taught as computer code or instructions embodied in or on the computer program product being executed by one or more hardware devices.): … the WH-reliability score (Byron: Paragraph [0077] “…the confidence scores to determine which hypotheses/candidate answers are the most likely to be the correct answer to the input question.” Assigning the user a WH-reliability score proportional to a percentage of QA pairs is taught as the confidence scores used to determine which hypotheses are the most likely to be correct. The confidence scores [i.e., WH-reliability score] are evaluated in comparison with the other QA pair answers [i.e., a percentage of QA pairs harvested from the user's previous indications of watched hypotheses]. The examiner notes that a WH-reliability score is taught as a confidence score.)…
Byron does not explicitly disclose …combine [confidence score]… and a QA consistency score to produce a final weighing of the to-be-vetted QA pair.
Murdock further teaches …combine [confidence score] (Murdock: Paragraph [0100] “normalize and merge candidate answers, merge feature scores produced by the same answer scorer across multiple instances of the candidate answer, and aggregate the results.” Combining the scores is taught as merging the scores to form an aggregate result. Paragraph [0058] “the classification scores used as a measure of answer confidence”)… and the QA consistency score (Murdock: Paragraph [0057] “each question candidate pair comprises an instance, and scores are obtained from a wide range of features, e.g., co-occurrence of answer and query terms, whether a candidate matches answer type, and search engine rank. Thus, for an example question,” Calculating a QA consistency score based upon one or more features of the input question and the watched hypothesis is taught as the scores obtained from a wide variety of features based on the question-candidate pair [i.e., a QA consistency score].) to produce the final weighing of the to-be-vetted QA pair (Murdock: Paragraph [0105] “This component selects, for a given question, one of a plurality of models 582 to which to send the candidate answers for final scoring. For a given question, all the candidate answers are scored by that one model.” The final weighing of the to-be-vetted QA pair is taught as the candidate answers for the given question being scored by the final scoring model. Paragraph [0056] “it implements methods that weight the scores of candidate answers based on the trained model.” The prior art teaches a method of scoring a question-candidate pair based on two scores, which are used to determine the final scoring and weighting. Paragraph [0012] “determine a candidate answer classification score for the candidate answer based on the plurality of scores for that candidate answer.” The aggregated results are based on the classification score and the feature scores that are combined.).

	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method of expanding training questions of Byron with the scoring methods of Murdock in order to produce a plurality of scores for each of the candidate answers for a given question, thereby allowing the scores to be used to select the best answer and/or rank the list of answers (Murdock: Paragraph [0019] “For a given question, all answers are scored by that model, and those scores are used to select a best answer and/or rank the list of answers.”).

Claims 7 and 20 are similarly rejected; refer to claim 14 for further analysis.

Claims 3, 10, 17 and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Byron (U.S.20160180242) in view of Murdock (WO2012047541) and Balen (Topic and duplicate detection in QA data).
Regarding claim 10, Byron in view of Murdock teaches the computer program product of claim 8, wherein the program instructions executable by the processor further cause the processor to: compare a parse structure (Byron: Paragraph [0045] “QA system receives an input question which it then parses to extract the major features of the question” Comparing a parse structure is taught as parsing the input question to extract its major features.) of the to-be-vetted QA pair with one or more existing QA pairs in the training set (Byron: Paragraph [0046] “124. That is, as will be described in greater detail hereafter, given a seed training question and corresponding question/answer pair in the ground truth data structure 122, the pipeline training system 120 implements ground truth and training question generation logic that looks for repeatable patterns in the content of a corpus upon which the QA system pipeline 108 operates to thereby identify additional question/answer pairs that may be automatically or semi-automatically added to the ground truth data structure” Comparing the to-be-vetted QA pair with one or more existing QA pairs in the training set is taught as identifying repeatable patterns between the seed training question and answer and the corresponding question/answer pair in the ground truth data.);…
Byron does not explicitly disclose determine that the to-be-vetted QA pair is comparable with the one or more existing QA pairs; declare the to-be-vetted QA pair as a duplicate pair based on the determine that the to-be-vetted QA pair is comparable with the one or more existing QA pairs; and discard the duplicate pair.
Balen teaches determine that the to-be-vetted QA pair is comparable with the one or more existing QA pairs (Balen: Section 5.3 “We iterate over the set of QA pairs (the first for loop) and for every QA pair which question has not been classified, i.e. does not have a qgroup, we give it a newly generated qgroup and then we iterate within every other pair which question hasn’t been classified and compare them with q” Determining that the to-be-vetted QA pair is comparable with the one or more existing QA pairs is taught as the comparison of the set of QA pairs with the QA pairs whose questions have not been classified.); declare the to-be-vetted QA pair as a duplicate pair based on the determination that the to-be-vetted QA pair is comparable with the one or more existing QA pairs (Balen: Section 6.2 Duplicate Detection “For evaluation of the duplicate detection module we manually constructed a test set of 320 questions. Starting from 80 original questions we created three types of duplicates for each. The three duplicate classes chosen are near duplicate, similar and semantically similar” Declaring the to-be-vetted QA pair as a duplicate pair based on the determination that the to-be-vetted QA pair is comparable with the one or more existing QA pairs is taught as the duplicate detection module, by which the QA pairs are sorted into groups that identify them as duplicates.); and discard the duplicate pair (Balen: Section 3 “The grouping module aims to remove duplicates and group similar QA pairs into clusters.” Discarding the duplicate pair is taught as removing duplicate QA pairs and grouping similar QA pairs into clusters.).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Byron and Murdock with the duplicate detection of Balen in order to implement a duplicate detection method based on a similarity measure, thereby improving the system by utilizing a combination of detection methods, which yields better results than individually optimally tuned detection methods (Balen: Section 7 Conclusion “We also showed that an OR-based composition approach of all three detection methods with custom parameter tuning over the detector combination as a whole yielded better results from individually optimally tuned detection methods, because each method detects different subsets.”).

Claims 3 and 17 are similarly rejected; refer to claim 10 for further analysis.

Regarding claim 23 (New), Byron in view of Murdock and Balen teaches the method as recited in claim 3. Byron further teaches wherein the assigning to the user a WH-reliability score (Byron: Paragraph [0077] “…the confidence scores to determine which hypotheses/candidate answers are the most likely to be the correct answer to the input question.” Assigning the user a WH-reliability score proportional to a percentage of QA pairs is taught as using the confidence scores to determine which hypotheses are the most likely to be correct. The confidence scores [i.e. WH-reliability score] are determined relative to the other QA pair answers [i.e. a percentage of QA pairs harvested from the user's previous indications of watched hypotheses]. The examiner notes that a WH-reliability score is taught as a confidence score.) is …
Balen further teaches calculated as if the duplicate pair is not discarded (Balen: Section 5. “The goal is to mark duplicate questions and answers so that search interface can remove the duplicates, or at least group these together” Calculated as if the duplicate pair is not discarded is taught as the duplicates being grouped instead of removed.).
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Byron and Murdock with the duplicate detection of Balen in order to implement a duplicate detection method based on a similarity measure, thereby improving the system by utilizing a combination of detection methods, which yields better results than individually optimally tuned detection methods (Balen: Section 7 Conclusion “We also showed that an OR-based composition approach of all three detection methods with custom parameter tuning over the detector combination as a whole yielded better results from individually optimally tuned detection methods, because each method detects different subsets.”).

Allowable Subject Matter
Claims 21 and 22 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AHSIF A. SHEIKH whose telephone number is (571)272-2607.  The examiner can normally be reached on Mon-Fri 7:30-5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alexey Shmatov, can be reached on 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/A.A.S./Examiner, Art Unit 2123

/ALEXEY SHMATOV/Supervisory Patent Examiner, Art Unit 2123