Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is responsive to the Application filed in the U.S. on 6/10/2020. Claims 1-20 are pending in the case. Claims 1, 8, and 15 are written in independent form.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 5, 8, 9, 12, 15, 16, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Dodel et al. (U.S. Pre-Grant Publication No. 2021/0157845, hereinafter referred to as Dodel), and further in view of Non-Patent Literature Sakata et al., "FAQ Retrieval using Query-Question Similarity and BERT-Based Query-Answer Relevance", May 24, 2019, arXiv:1905.02851v2, hereinafter referred to as Sakata.

Regarding Claim 1:
Dodel teaches a system comprising a processor to:
receive a query;
Dodel teaches receiving a search query (Para. [0052]).
retrieve ranked candidates from an index based on the query;
Dodel teaches “the query is fired against the FAQ questions index 107C at circle 7 and the top set matching questions are sent to the FAQ model 212C from the text/document storage 109” where “the FAQ model 212C re-ranks the top set of questions and returns the most relevant questions along with their answers 308” and “the FAQ model 212 is a BERT-based model” (Para. [0056]). Therefore, Dodel teaches retrieving ranked candidate questions and their corresponding answers from an FAQ questions index.
re-rank the ranked candidates using a Bidirectional Encoder Representations from Transformers (BERT) query-question (Q-q) model
Dodel teaches “the query is fired against the FAQ questions index 107C at circle 7 and the top set matching questions are sent to the FAQ model 212C from the text/document storage 109” where “the FAQ model 212C re-ranks the top set of questions and returns the most relevant questions along with their answers 308” and “the FAQ model 212 is a BERT-based model” (Para. [0056]). Therefore, Dodel teaches re-ranking the candidate matching questions that have corresponding answers.
the BERT Q-q model trained to match queries to questions of a frequently asked question (FAQ) dataset,
Dodel teaches “the query is fired against the FAQ questions index 107C at circle 7 and the top set matching questions are sent to the FAQ model 212C from the text/document storage 109” where “the FAQ model 212C re-ranks the top set of questions and returns the most relevant questions along with their answers 308” and “the FAQ model 212 is a BERT-based model” (Para. [0056]), thereby teaching using the BERT-based model to match the query to questions of a frequently asked question dataset.
return the re-ranked candidates in response to the query.
Dodel teaches “the FAQ model 212C re-ranks the top set of questions and returns the most relevant questions along with their answers 308” (Para. [0056]).

Dodel explicitly teaches all of the limitations as stated above except:
wherein the BERT Q-q model is fine-tuned using paraphrases generated for the questions in the FAQ dataset; and

However, in the related field of endeavor of FAQ retrieval, Sakata, when combined with Dodel, teaches:
wherein the BERT Q-q model is fine-tuned using paraphrases generated for the questions in the FAQ dataset; and
Dodel teaches “one example language ML model is a transformer model (e.g., a GPT-2 model) that is first trained on (e.g., a very large amount of) data in an unsupervised manner using language modeling as a training signal, and is second fine-tuned on much smaller supervised datasets (e.g., known questions and their corresponding known answers) to help it solve specific tasks” (Para. [0117]).
Sakata teaches an FAQ where “each Q has paraphrase queries” (Page 2 Section 3). Therefore, Dodel in combination with Sakata teaches fine-tuning using paraphrases of questions in the FAQ.

Thus it would have been obvious to a person having ordinary skill in the art, having the teachings of Sakata and Dodel at the time that the claimed invention was effectively filed, to have combined the use of paraphrase queries for each Q, as taught by Sakata, with the system and method for processing a document search query to return top ranked passages and the proper subset of documents, as taught by Dodel.
One would have been motivated to make such combination because Sakata teaches using paraphrases of questions in the FAQ (Page 2 Section 3) and it would have been obvious to a person having ordinary skill in the art that using paraphrases for questions would expand the ability for a vague query to be matched to a relevant question.
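For illustration only, the retrieve-then-re-rank flow recited in Claim 1 can be sketched as below. The sketch is not taken from Dodel, Sakata, or the application: the toy FAQ entries and the names FAQ, retrieve, qq_score, rerank, and answer are hypothetical, and a simple Jaccard token overlap stands in for the fine-tuned BERT Q-q model.

```python
from collections import Counter

# Hypothetical toy FAQ; these entries do not come from any cited reference.
FAQ = [
    ("How do I reset my password?", "Use the reset link on the sign-in page."),
    ("How do I change my email address?", "Edit it under account settings."),
    ("Where can I download my invoice?", "Invoices are listed under billing."),
]


def tokens(text):
    """Lower-case whitespace tokens with surrounding punctuation stripped."""
    return [t.strip("?.,!").lower() for t in text.split() if t.strip("?.,!")]


def retrieve(query, faq, k=2):
    """First stage: rank FAQ entries by term overlap between query and question."""
    q = Counter(tokens(query))
    scored = []
    for i, (question, _answer) in enumerate(faq):
        overlap = sum((q & Counter(tokens(question))).values())
        scored.append((overlap, i))
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for _overlap, i in scored[:k]]


def qq_score(query, question):
    """Assumed stand-in for a Q-q model: Jaccard similarity of token sets."""
    a, b = set(tokens(query)), set(tokens(question))
    return len(a & b) / len(a | b) if a | b else 0.0


def rerank(query, faq, candidates):
    """Second stage: re-order the retrieved candidates by the Q-q score."""
    return sorted(candidates, key=lambda i: -qq_score(query, faq[i][0]))


def answer(query, faq=FAQ, k=2):
    """Return the re-ranked (question, answer) candidates for the query."""
    return [faq[i] for i in rerank(query, faq, retrieve(query, faq, k))]
```

Calling answer("reset my password") returns the two retrieved question-answer pairs in re-ranked order. In an actual system the qq_score function would be replaced by a trained model; that substitution is outside the scope of this sketch.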

Regarding Claim 5:
Dodel and Sakata further teach:
wherein the processor is to perform a final re-ranking of the candidates by combining a plurality of re-rankers using an unsupervised late-fusion,
Dodel teaches combining a plurality of re-rankings by teaching “one or more of the top ranked passage, the top ranked documents, and the top ranked FAQ are returned at 423” where “in some embodiments, what is returned is subject to an access control list and/or confidence score threshold” (Para. [0065]).
wherein the plurality of re-rankers comprise the BERT Q-q model (Fig 3 - 212C), a BERT query-answer (Q-a) model (Fig 3 – 212B), and a passage-based re-ranker (Fig 3 – 212A).

Regarding Claim 8:
All of the limitations herein are similar to a subset of the limitations of Claim 1.

Regarding Claim 9:
All of the limitations herein are similar to a subset of the limitations of Claim 1.

Regarding Claim 12:
Dodel and Sakata further teach:
wherein generating the question paraphrases comprises using only the FAQ dataset as training input.
Sakata teaches “in each validation, all queries were split into training (60%), development (20%) and test (20%)” and conducting the experiments and evaluation “on two datasets, localgovFAQ and StackExchange” where “each Q has paraphrase queries” (Page 2 Section 3).  Therefore, Sakata teaches generating the question paraphrases comprises using only the FAQ dataset as training input when experimenting on the localgovFAQ dataset.

Regarding Claim 15:
All of the limitations herein are similar to a subset of the limitations of Claim 1.

Dodel and Sakata further teach:
a computer program product for ranking query candidates, the computer program product comprising a computer-readable storage medium having program code embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program code executable by a processor (Dodel – Para. [0057]).

Regarding Claim 16:
All of the limitations herein are similar to a subset of the limitations of Claim 1.

Regarding Claim 19:
All of the limitations herein are similar to a subset of the limitations of Claim 12.


Claims 2, 3, 10, 11, 13, 14, 17, 18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Dodel and Sakata, and further in view of Zhou et al. (U.S. Pre-Grant Publication No. 2008/0040339, hereinafter referred to as Zhou).

Regarding Claim 2:
Dodel and Sakata explicitly teach all of the limitations as stated above except:
wherein the paraphrases are generated based on the FAQ dataset via a generative pretrained transformer trained on question-answer pairs of the FAQ dataset and fine-tuned using randomly sampled sequences.

However, in the related field of endeavor of Question-answer systems, Zhou teaches:
wherein the paraphrases are generated based on the FAQ dataset via a generative pretrained transformer trained on question-answer pairs of the FAQ dataset and fine-tuned using randomly sampled sequences.
Zhou teaches a question paraphrase generation system “provides as an output sets of associated question paraphrases 106 having essentially the same meaning” based on an input set of questions (Para. [0011]).  Zhou further teaches “classifier 310 can be a Support Vector Machine (SVM) classifier that is trained for each set using the words as features” (Para. [0020]).

Thus it would have been obvious to a person having ordinary skill in the art, having the teachings of Zhou, Sakata, and Dodel at the time that the claimed invention was effectively filed, to have combined the question paraphrase generation system, as taught by Zhou, with the use of paraphrase queries for each Q, as taught by Sakata, and the system and method for processing a document search query to return top ranked passages and the proper subset of documents, as taught by Dodel.
One would have been motivated to make such combination because while Sakata teaches using paraphrases for queries (Page 2 Section 3), it is silent on how the paraphrases are generated, and Zhou fills this gap by teaching, in further depth, the generation of paraphrases and the clustering of paraphrases that answer the same question (Para. [0011]).

Regarding Claim 3:
Dodel, Sakata, and Zhou further teach:
wherein the paraphrases are filtered to match the same FAQ as their generation questions using the index.
Zhou teaches “question paraphrases are questions in different formats that actually mean the same thing, and thus, have the same answer” (Para. [0003]) where a question paraphrase generation system “provides as an output sets of associated question paraphrases 106 having essentially the same meaning” (Para. [0011]).
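For illustration only, the filtering limitation recited in Claim 3 can be sketched as below. The sketch reflects the claim language rather than the method of any cited reference: the names top_match and filter_paraphrases are hypothetical, and a token-overlap lookup stands in for a query against the index.

```python
def tokens(text):
    """Lower-case whitespace tokens with surrounding punctuation stripped."""
    return [t.strip("?.,!").lower() for t in text.split() if t.strip("?.,!")]


def top_match(text, questions):
    """Assumed stand-in for an index lookup: position of the FAQ question
    sharing the most tokens with the text (lowest position wins ties)."""
    q = set(tokens(text))
    return max(
        range(len(questions)),
        key=lambda i: (len(q & set(tokens(questions[i]))), -i),
    )


def filter_paraphrases(source_idx, paraphrases, questions):
    """Keep only paraphrases whose top match is the question they came from."""
    return [p for p in paraphrases if top_match(p, questions) == source_idx]
```

A generated paraphrase that drifts toward a different FAQ entry is dropped, while one that still retrieves its source question is kept.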

Regarding Claim 10:
All of the limitations herein are similar to some or all of the limitations of Claim 2.

Regarding Claim 11:
Dodel, Sakata, and Zhou further teach:
wherein generating the question paraphrases comprises fine-tuning the generative pretrained transformer using randomly sampled sequences of a concatenated FAQ dataset with special tokens.
Dodel teaches “one example language ML model is a transformer model (e.g., a GPT-2 model) that is first trained on (e.g., a very large amount of) data in an unsupervised manner using language modeling as a training signal, and is second fine-tuned on much smaller supervised datasets (e.g., known questions and their corresponding known answers) to help it solve specific tasks” (Para. [0117]). Dodel further teaches questions comprising tokens by teaching “the language ML model is also trained to detect an end of question (EOQ) token” (Para. [0121]). Zhou teaches “question paraphrases are questions in different formats that actually mean the same thing, and thus, have the same answer” (Para. [0003]) where a question paraphrase generation system “provides as an output sets of associated question paraphrases 106 having essentially the same meaning” (Para. [0011]).
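For illustration only, the preparation of randomly sampled sequences from a concatenated FAQ dataset with special tokens, as recited in Claim 11, can be sketched as below. The marker strings <BOQ> and <EOA> and the names concatenate and sample_sequences are hypothetical; of the markers shown, only an end of question (EOQ) token is named in the cited text. The sketch covers data preparation only and does not perform any model training.

```python
import random

# Hypothetical marker strings; only an EOQ token is named in the cited text.
BOQ, EOQ, EOA = "<BOQ>", "<EOQ>", "<EOA>"


def concatenate(faq):
    """Flatten (question, answer) pairs into one token stream with markers."""
    stream = []
    for question, answer in faq:
        stream += [BOQ] + question.split() + [EOQ] + answer.split() + [EOA]
    return stream


def sample_sequences(stream, length, count, seed=0):
    """Draw `count` contiguous windows of `length` tokens at random offsets."""
    if length > len(stream):
        raise ValueError("window is longer than the token stream")
    rng = random.Random(seed)
    starts = [rng.randrange(0, len(stream) - length + 1) for _ in range(count)]
    return [stream[s:s + length] for s in starts]
```

Each sampled window may begin or end in the middle of a question or answer, which is why the markers are kept in the stream.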

Regarding Claim 13:
Dodel, Sakata, and Zhou further teach:
wherein filtering the question paraphrases comprises selecting question paraphrases that match a question-answer pair of the FAQ dataset comprising a question that was used to generate the selected question paraphrases.
Zhou teaches “question paraphrases are questions in different formats that actually mean the same thing, and thus, have the same answer” (Para. [0003]) where a question paraphrase generation system “provides as an output sets of associated question paraphrases 106 having essentially the same meaning” (Para. [0011]).

Regarding Claim 14:
Dodel, Sakata, and Zhou further teach:
wherein the question paraphrases comprise title paraphrases, the question-answer pairs comprise title-abstract pairs, and wherein a BERT Q-t model is fine-tuned based on filtered title paraphrases.
Zhou teaches “the question type is an important attribute of a question, which usually indicates the category of its answer” and “using a widely accepted question type taxonomy [including]…. 1: abbreviation, explanation 2: animal, body, color, creative, currency, disease, event, food, instrument, language, letter, other, plant, product, religion, sport, substance, symbol, technique, term, vehicle, word 3: definition, description, manner, reason 4: group, individual, title, human-description 5: city, country, mountain, other, state 6: code, count, date, distance, money, order, other, period, percent, speed, temperature, size, weight” (Para. [0019]). Therefore, question paraphrases can be title paraphrases and question-answer pairs comprise titles or descriptions related to the answers.

Regarding Claim 17:
All of the limitations herein are similar to some or all of the limitations of Claim 2.

Regarding Claim 18:
All of the limitations herein are similar to a subset of the limitations of Claim 11.

Regarding Claim 20:
All of the limitations herein are similar to a subset of the limitations of Claim 14.


Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over Dodel and Sakata, and further in view of Zhang et al. (U.S. Pre-Grant Publication No. 2021/0049213, hereinafter referred to as Zhang).

Regarding Claim 4:
Dodel and Sakata explicitly teach all of the limitations as stated above except:
wherein the Q-q BERT model is trained using triplets comprising a question, a positive paraphrase, and a negative paraphrase.

However, in the related field of endeavor of question answering systems, Zhang teaches:
wherein the Q-q BERT model is trained using triplets comprising a question, a positive paraphrase, and a negative paraphrase.
Zhang teaches “to train the model, the exemplary methods first apply a ranking triplet loss function to learn the relative rank between positive samples…and negative samples” (Para. [0051]).

Thus it would have been obvious to a person having ordinary skill in the art, having the teachings of Zhang, Sakata, and Dodel at the time that the claimed invention was effectively filed, to have combined the triplet loss function for training, as taught by Zhang, with the use of paraphrase queries for each Q, as taught by Sakata, and the system and method for processing a document search query to return top ranked passages and the proper subset of documents, as taught by Dodel.
One would have been motivated to make such combination because Zhang teaches training a model using both positive and negative samples (Para. [0051]), and it would have been obvious to a person having ordinary skill in the art that being able to train on positive and negative samples would create a more informed and robust model that not only would learn positive associations, but also negative associations.
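For illustration only, training on triplets of a question, a positive paraphrase, and a negative paraphrase can be sketched with a margin-based ranking loss as below. The margin form shown is a common formulation and is an assumption; it is not asserted to be the exact loss of Zhang. The names make_triplets and triplet_ranking_loss are hypothetical.

```python
def make_triplets(question, positives, negatives):
    """Pair every positive paraphrase with every negative paraphrase."""
    return [(question, p, n) for p in positives for n in negatives]


def triplet_ranking_loss(pos_score, neg_score, margin=1.0):
    """Zero once the positive outscores the negative by at least the margin."""
    return max(0.0, margin - pos_score + neg_score)
```

The loss is zero when the model already ranks the positive paraphrase above the negative paraphrase by the margin, and grows as the negative paraphrase approaches or overtakes the positive one.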

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Dodel and Sakata, and further in view of Kirshenbaum (U.S. Pre-Grant Publication No. 2005/0097067).

Regarding Claim 6:
Dodel and Sakata explicitly teach all of the limitations as stated above except:
wherein the unsupervised late-fusion comprises summing candidate scores assigned for each candidate by the BERT Q-q model, a BERT query-answer (Q-a) model, and a passage-based re-ranker.

However, in the related field of endeavor of assessing multiple evaluators for choices, Kirshenbaum teaches:
wherein the unsupervised late-fusion comprises summing candidate scores assigned for each candidate by the BERT Q-q model, a BERT query-answer (Q-a) model, and a passage-based re-ranker.
Kirshenbaum teaches “the combining operation may be performed by simply summing the evaluators’ scores for each choice” (Para. [0051]).

Thus it would have been obvious to a person having ordinary skill in the art, having the teachings of Kirshenbaum, Sakata, and Dodel at the time that the claimed invention was effectively filed, to have combined the method for merging scores of the same choice from multiple evaluators, as taught by Kirshenbaum, with the use of paraphrase queries for each Q, as taught by Sakata, and the system and method for processing a document search query to return top ranked passages and the proper subset of documents, as taught by Dodel.
One would have been motivated to make such combination because Kirshenbaum teaches “a decision making method that combines the opinions of different classification routines may provide significantly improved accuracy and a corresponding improvement in the ability to cater to a customer’s needs” (Para. [0019]).
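For illustration only, combining re-rankers by summing the score each one assigns to a candidate can be sketched as below. The name fuse_by_sum and the example scores are hypothetical, and the sketch assumes the scores of the different re-rankers are already on comparable scales.

```python
def fuse_by_sum(score_maps):
    """Sum per-candidate scores across re-rankers and rank by the total.

    score_maps: list of dicts mapping candidate id -> score, one per re-ranker.
    A candidate missing from a re-ranker contributes nothing for that re-ranker.
    """
    totals = {}
    for scores in score_maps:
        for candidate, score in scores.items():
            totals[candidate] = totals.get(candidate, 0.0) + score
    # Highest total first; candidate id breaks ties deterministically.
    return sorted(totals, key=lambda c: (-totals[c], c))
```

For example, with Q-q scores {"faq1": 0.5, "faq2": 0.25}, Q-a scores {"faq1": 0.25, "faq2": 0.75}, and passage scores {"faq1": 0.25, "faq2": 0.5}, the totals are 1.0 for faq1 and 1.5 for faq2, so faq2 is ranked first. No training is needed to combine the scores, which is what makes the fusion unsupervised.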

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Dodel and Sakata, and further in view of Non-Patent Literature Xiong, Chenjan, "Knowledge Based Text Representations for Information Retrieval", May 2016, Language Technologies Institute School of Computer Science Carnegie Mellon University, hereinafter referred to as Xiong.


Regarding Claim 7:
Dodel and Sakata explicitly teach all of the limitations as stated above except:
wherein the unsupervised late-fusion comprises applying an unsupervised query expansion step for re-ranking a candidate pool of the summed candidate scores.

However, in the related field of endeavor of knowledge based text representations for information retrieval, Xiong teaches:
wherein the unsupervised late-fusion comprises applying an unsupervised query expansion step for re-ranking a candidate pool of the summed candidate scores.
Xiong teaches “we use the selected expansion terms and their scores to re-rank the retrieved documents with the RM Model” (Page 19 Section 3.1.2.3).
Xiong further teaches “queries are usually short and not written carefully, which makes it more difficult to understand the intent behind a query and retrieve relevant documents” where “a common solution is query expansion, which uses a larger set of related terms to represent the user’s intent and improve the document ranking” and an example of query expansion techniques including “RM model selects expansion terms based on their term frequency in top retrieved documents, and weights them by documents’ ranking scores” (Page 14 Section 3.1.1).

Thus it would have been obvious to a person having ordinary skill in the art, having the teachings of Xiong, Sakata, and Dodel at the time that the claimed invention was effectively filed, to have combined the query expansion to re-rank retrieved documents, as taught by Xiong, with the use of paraphrase queries for each Q, as taught by Sakata, and the system and method for processing a document search query to return top ranked passages and the proper subset of documents, as taught by Dodel.
One would have been motivated to make such combination because Xiong teaches that queries are sometimes “short and not written carefully, which makes it more difficult to understand the intent behind a query and retrieve relevant documents” and proposes a solution to mitigate the problem of a poorly written query by applying query expansion to the top retrieved documents (Page 14 Section 3.1.1).
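For illustration only, an unsupervised query expansion step of the kind quoted from Xiong (expansion terms chosen by term frequency in the top retrieved documents, weighted by the documents' ranking scores, then used to re-rank) can be sketched as below. This is a simplified reading and not Xiong's RM implementation: the names expansion_terms and rerank_with_expansion, the length normalization, and the mixing weight alpha are assumptions.

```python
from collections import defaultdict


def tokens(text):
    return text.lower().split()


def expansion_terms(ranked_docs, query, n_terms=2):
    """Weight each non-query term by its length-normalized frequency in the
    retrieved documents, scaled by each document's ranking score."""
    query_terms = set(tokens(query))
    weights = defaultdict(float)
    for text, score in ranked_docs:
        toks = tokens(text)
        for t in toks:
            if t not in query_terms:
                weights[t] += score / len(toks)
    best = sorted(weights, key=lambda t: (-weights[t], t))[:n_terms]
    return {t: weights[t] for t in best}


def rerank_with_expansion(ranked_docs, query, n_terms=2, alpha=0.5):
    """Add an expansion-term bonus to each score and re-sort the pool."""
    terms = expansion_terms(ranked_docs, query, n_terms)
    rescored = []
    for text, score in ranked_docs:
        toks = tokens(text)
        bonus = sum(w * toks.count(t) / len(toks) for t, w in terms.items()) if toks else 0.0
        rescored.append((text, score + alpha * bonus))
    return sorted(rescored, key=lambda d: -d[1])
```

In the claimed arrangement the input scores would be the summed candidate scores from the late fusion; here they are arbitrary example values supplied by the caller.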


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Katzman et al. (U.S. Pre-Grant Publication No. 2021/0157861) teaches receiving an ingestion request to ingest a document; extracting text from the document; pre-processing the extracted text to generate pre-processed text that is predictable and analyzable; generating an index entry for the extracted text, the index entry to map the extracted text to a reserved field of a plurality of reserved fields; and storing the extracted text, index entry, and pre-processed text in at least one data storage location.
Tiwari et al. (U.S. Pre-Grant Publication No. 2021/0133264) teaches example data processing systems and methods. In one implementation, a system accesses a corpus of data and analyzes the data contained in the corpus of data to identify multiple documents. The system generates vector indexes for the multiple documents such that the vector indexes allow a computing system to quickly access the plurality of documents and identify an answer to a question associated with the corpus of data.
Non-Patent Literature Padaki et al. "Rethinking Query Expansion for BERT Reranking", April 2020, Advances in Information Retrieval: 42nd European Conference on IR Research, ECIR 2020 teaches recent studies have shown promising results of using BERT for Information Retrieval with its advantages in understanding the text content of documents and queries. Compared to short, keywords queries, higher accuracy of BERT were observed on long, natural language queries, demonstrating BERT’s ability in extracting rich information from complex queries. These results show the potential of using query expansion to generate better queries for BERT-based rankers. In this work, we explore BERT’s sensitivity to the addition of structure and concepts. We find that traditional word-based query expansion is not entirely applicable, and provide insight into methods that produce better experimental results.
Non-Patent Literature Liu et al., "Multi-Task Deep Neural Networks for Natural Language Understanding", May 30, 2019, arXiv:1901.11504v2 teaches a Multi-Task Deep Neural Network (MT-DNN) for learning representations across multiple natural language understanding (NLU) tasks. MT-DNN not only leverages large amounts of cross-task data, but also benefits from a regularization effect that leads to more general representations to help adapt to new tasks and domains.




Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROBERT F MAY whose telephone number is (571)272-3195. The examiner can normally be reached Monday-Friday 9:30am to 6pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Hosain Alam can be reached on 571-272-3978. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ROBERT F MAY/Examiner, Art Unit 2154    6/16/2022

/HOSAIN T ALAM/Supervisory Patent Examiner, Art Unit 2154