Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a mental process of determining probabilities that a token will be relevant to an answer without significantly more. 
The independent claims 1, 11, and 17 recite tokenizing a text, extracting features from a token sequence, determining a highest probability of a label sequence for the token sequence to assign a label to each token, and determining whether the highest probability label sequence indicates that a token is relevant to a question. If the token is relevant to a question, the token is provided as an answer. If no token is relevant to the question, a marginal probability is used to determine a forced answer. This appears to be a mental process because each of the claimed steps is a data receiving, data extraction, data evaluation, or data output step. A human being equipped with a generic computer is capable of tokenizing data and following an algorithm to label data and, based on determined probabilities associated with the labels, deciding if a token is relevant to a question. 
This judicial exception is not integrated into a practical application because none of the claimed steps appear to improve the processing of a computer or require the use of a specific machine. While claims 11 and 17 each recite elements of hardware in the form of a processor, medium, and non-transitory media, the claimed hardware is recited as generic components. It is noted that the claimed invention is never actually used to respond to user questions. 
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the claimed invention includes no elements beyond the data retrieval, analysis, and output steps. No additional elements are claimed that improve the processing of a computer or require the use of a specific machine. 
Dependent claims 2-10, 12-16, and 18-20 similarly recite further data analysis and output steps that appear to be mental processes lacking a practical application or elements that are significantly more than the judicial exception. Each of the dependent claims simply provides further probability analysis, further defines how the analysis occurs, or produces an output of the analysis. As such, none of the dependent claims appears to improve the functioning of a computer or require the use of a specific machine and thus is directed towards an abstract idea. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 6, 11-14, and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Tan et al. (US Pre-Grant Publication 2020/0327381) in view of Barborak et al. (US Pre-Grant Publication 2013/0017524). 

As to claim 1, Tan et al. teaches a computer-implemented method of extracting structured data from unstructured or semi-structured text in an electronic document, the method comprising: 
tokenizing the text as a token sequence (see paragraph [0031]. Tan shows a text classification service. The text classification service parses the text to determine a label for different parts of a text); 
extracting features for each token of the token sequence (see paragraphs [0031]-[0034]. Words are extracted from the text. Weights are assigned to each word); 
applying a data extraction model to the extracted features to determine a highest-probability label sequence for the token sequence, wherein the label sequence assigns a label to each token … (see paragraphs [0034]-[0036]. Labels are assigned to each word. A text classification may be based on a weighted sum of a combination of all the extracted features. The labels indicate the relevance of the text to different topics. The topics are potential questions); 
Tan does not disclose: 
… a label which indicates if that token is relevant to a question;
wherein if the highest-probability label sequence indicates that at least one token is relevant to the question, then that token(s) is provided as an answer to the question, the extracted structured data comprising the answer in that event; 
wherein if highest-probability label sequence indicates that no token is relevant to the question, then an answer forcing process is applied by: 
determining, for each token of the token sequence, a marginal probability of that token being relevant, wherein the at least one of the marginal probabilities is used to determine a forced answer.  
Barborak teaches: 
… a label which indicates if that token is relevant to a question (see paragraph [0056] and Abstract. Barborak is directed towards tagging pieces of information as evidence to generate candidate answers);
wherein if the highest-probability label sequence indicates that at least one token is relevant to the question, then that token(s) is provided as an answer to the question, the extracted structured data comprising the answer in that event (see paragraph [0056]. Good evidence may be identified as directly related to an input question. Also see paragraph [0057], where each candidate answer is assigned a confidence score based on an aggregation of evidence scores. Good evidence provides high confidence); 
wherein if highest-probability label sequence indicates that no token is relevant to the question (see paragraphs [0056]-[0057]. Marginal evidence may be identified. The marginal evidence is considered as missing information relevant to a question), then an answer forcing process is applied by: 
determining, for each token of the token sequence, a marginal probability of that token being relevant, wherein the at least one of the marginal probabilities is used to determine a forced answer (see paragraphs [0056]-[0057]. A marginal probability of tokens may be determined in order to generate an answer).  
It would have been obvious to one of ordinary skill in the art before the earliest filing date of the invention to have modified Tan by the teachings of Barborak because both references are directed towards parsing and labeling information from text data. Barborak provides Tan with further utility by providing a question and answer system that will allow users to make use of the tagging of Tan to answer user queries relevant to those tags. 

As to claim 2, Tan as modified by Barborak teaches the method of claim 1, wherein, if the highest marginal probability meets a probability threshold, the forced answer comprises at least the token having the highest marginal probability, and if not, the forced answer is a null result (see Barborak paragraphs [0057]-[0059]. Answers that surpass a confidence threshold are provided. Answers that do not are not provided, see paragraph [0047]. Marginal answers may be refined to increase their confidence, or probability).  
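
For illustration only, the threshold behavior recited in claim 2 can be sketched as follows. This is a minimal sketch and is not taken from the application or from any cited reference; the function name select_answer, the binary label encoding (1 meaning relevant), and the default threshold of 0.5 are all assumptions made for the example.

```python
from typing import List, Optional, Sequence


def select_answer(
    tokens: Sequence[str],
    best_labels: Sequence[int],    # highest-probability label sequence, 1 = relevant (assumed encoding)
    marginals: Sequence[float],    # per-token marginal probability of being relevant
    threshold: float = 0.5,        # hypothetical probability threshold
) -> Optional[List[str]]:
    """Return the answer tokens, a forced answer, or None for a null result."""
    if not tokens:
        return None

    # Ordinary case: the best label sequence marks at least one token as relevant.
    relevant = [tok for tok, lab in zip(tokens, best_labels) if lab == 1]
    if relevant:
        return relevant

    # Answer forcing: fall back to the token with the highest marginal probability.
    best = max(range(len(tokens)), key=lambda i: marginals[i])
    if marginals[best] >= threshold:
        return [tokens[best]]

    # The highest marginal does not meet the threshold: null result.
    return None
```

Under these assumptions, a token sequence whose best label sequence marks no token as relevant yields a forced answer only when the largest marginal meets the threshold; otherwise the result is null.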

As to claim 3, Tan as modified by Barborak teaches the method of claim 1, wherein dynamic programming is used to determine the label sequence having the highest computed probability, without computing the probability of every possible label sequence (see Barborak paragraph [0056]. Answers are determined based on a question. Every possible label sequence is not described as being considered), and 
to compute the marginal probability for each token, without requiring the probability of every possible label sequence to be computed (see Barborak paragraphs [0056]-[0059]. Marginal confidence is determined for a question. Every possible label sequence is not described as being considered).  

As to claim 4, Tan as modified by Barborak teaches the method of claim 1, wherein a first dynamic programming algorithm may be used to determine the highest-probability label sequence, and a second dynamic programming algorithm may be used to determine the marginal probability of each token (see paragraphs [0056]-[0059]. Marginal probability may be calculated and determined using a different algorithm from good probability).  

As to claim 6, Tan as modified teaches the method of claim 1, wherein the data extraction model is applied to the extracted features to compute a probability of each label sequence of multiple candidate label sequences, wherein each label sequence assigns a label to each token (see Tan paragraphs [0034]-[0036]. Labels are assigned to each sequence based on probabilities), 
which indicates if that token is relevant to a question (see Barborak paragraph [0056] and Abstract); 
wherein, for each token of the token sequence, a marginal probability of that token being relevant is computed as a sum of the probabilities computed for the subset of all candidate label sequences for which that token is relevant, wherein the at least one of the marginal probabilities is used to determine a forced answer (see Barborak paragraphs [0056]-[0059]. Only a single set of probabilities is used to determine marginal probabilities of tokens in response to the questions. This marginal probability is improved, or summed, based on follow up inquiries). 
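
For illustration only, the summation recited in claim 6 (a token's marginal probability as the sum of the probabilities of all candidate label sequences in which that token is relevant) can be sketched by direct enumeration. This is a minimal sketch and is not taken from the application or from any cited reference; the function name, the binary label encoding, and the toy sequence weights are assumptions.

```python
from itertools import product
from typing import Callable, List, Tuple


def marginals_by_enumeration(
    n_tokens: int,
    weight: Callable[[Tuple[int, ...]], float],  # hypothetical non-negative score per label sequence
) -> Tuple[Tuple[int, ...], List[float]]:
    """Enumerate every binary label sequence; return the best one and per-token marginals."""
    sequences = list(product((0, 1), repeat=n_tokens))
    weights = [weight(seq) for seq in sequences]
    total = sum(weights)
    probs = [w / total for w in weights]  # normalize weights into probabilities

    # Highest-probability label sequence.
    best = sequences[max(range(len(sequences)), key=lambda k: probs[k])]

    # Marginal for token i: sum over the sequences that label token i as relevant (1).
    marginals = [
        sum(p for seq, p in zip(sequences, probs) if seq[i] == 1)
        for i in range(n_tokens)
    ]
    return best, marginals


# Toy weights (assumed): the single most probable sequence marks no token as relevant,
# yet the second token is relevant in half of the total probability mass.
toy = {(0, 0): 4.0, (0, 1): 3.0, (1, 0): 1.0, (1, 1): 2.0}
best, marginals = marginals_by_enumeration(2, lambda seq: toy[seq])
```

Enumeration grows exponentially with the number of tokens, which is why claims 3 through 5 recite dynamic programming as the way to obtain the same quantities without computing the probability of every label sequence.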

As to claims 11 and 17, see the rejection of claim 1. 
As to claims 12 and 18, see the rejection of claim 2. 
As to claims 13 and 19, see the rejection of claim 3. 
As to claims 14 and 20, see the rejection of claim 4. 
As to claim 16, see the rejection of claim 6. 

Claims 5 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Tan et al. (US Pre-Grant Publication 2020/0327381) in view of Barborak et al. (US Pre-Grant Publication 2013/0017524), and further in view of Hoffberg (US Pre-Grant Publication 2010/0317420). 

As to claim 5, Tan as modified teaches the method of claim 4. 
Tan as modified does not teach wherein the first dynamic programming algorithm is the Viterbi algorithm, and the second dynamic programming algorithm is the forward-backward algorithm.  
Hoffberg teaches wherein the first dynamic programming algorithm is the Viterbi algorithm, and the second dynamic programming algorithm is the forward-backward algorithm (see paragraphs [0121]-[0122]. The techniques discussed in Hoffberg are directed towards identifying a statistical certainty that a fact is true. See paragraphs [0189]-[0190] for the use of the Viterbi algorithm to analyze data. See paragraphs [0213]-[0215] for the usage of the Baum-Welch algorithm, or Forward-Backward algorithm, to use occurrence counting to determine truth). 
It would have been obvious to one of ordinary skill in the art before the earliest filing date of the invention to have made use of these algorithms when determining the truth of facts in unstructured text documents in Tan as modified by Barborak because both of these algorithms represent well known choices for determining the probability associated with a fact (see Hoffberg [0121]-[0122]). Because Tan as modified by Barborak is directed towards identifying the confidence of a fact (see Barborak paragraphs [0056]-[0057]), one of ordinary skill in the art would have thought to use well known and efficient methods of calculating probabilities (see Hoffberg [0121]-[0122] and [0190] and [0215]). 
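
For illustration only, the two algorithms named in claim 5 can be sketched on a small linear-chain model. This is a minimal sketch and is not taken from the application, Hoffberg, or any other cited reference; the model form (per-token potentials `unary` and label-to-label potentials `trans`), the function names, and the toy numbers are assumptions. The weight of a label sequence is assumed to be the product of its unary and transition potentials.

```python
from typing import List

Matrix = List[List[float]]


def viterbi(unary: Matrix, trans: Matrix) -> List[int]:
    """Highest-weight label sequence, found without enumerating all sequences."""
    n, k = len(unary), len(unary[0])
    delta = [list(unary[0])]        # best weight of any path ending in each label
    back: List[List[int]] = []      # back-pointers to the best previous label
    for t in range(1, n):
        row, ptr = [], []
        for cur in range(k):
            prev = max(range(k), key=lambda p: delta[-1][p] * trans[p][cur])
            row.append(delta[-1][prev] * trans[prev][cur] * unary[t][cur])
            ptr.append(prev)
        delta.append(row)
        back.append(ptr)
    path = [max(range(k), key=lambda c: delta[-1][c])]
    for ptr in reversed(back):      # trace back-pointers from the last token
        path.append(ptr[path[-1]])
    return path[::-1]


def forward_backward(unary: Matrix, trans: Matrix) -> Matrix:
    """Per-token, per-label marginal probabilities via forward and backward passes."""
    n, k = len(unary), len(unary[0])
    alpha = [list(unary[0])]        # forward sums over all prefixes
    for t in range(1, n):
        alpha.append([
            unary[t][c] * sum(alpha[-1][p] * trans[p][c] for p in range(k))
            for c in range(k)
        ])
    beta = [[1.0] * k for _ in range(n)]  # backward sums over all suffixes
    for t in range(n - 2, -1, -1):
        beta[t] = [
            sum(trans[c][nx] * unary[t + 1][nx] * beta[t + 1][nx] for nx in range(k))
            for c in range(k)
        ]
    z = sum(alpha[-1])              # normalizing constant
    return [[alpha[t][c] * beta[t][c] / z for c in range(k)] for t in range(n)]


# Toy potentials (assumed): two tokens, labels 0 = not relevant and 1 = relevant.
unary = [[3.0, 1.0], [2.0, 1.5]]
trans = [[1.0, 0.5], [0.5, 1.0]]
best_path = viterbi(unary, trans)
marginals = forward_backward(unary, trans)
```

In this toy model the Viterbi pass returns the single best sequence, while the forward-backward pass returns, for each token, the probability mass of all sequences in which that token carries a given label; the two results answer different questions and need not agree token by token.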

As to claim 15, see the rejection of claim 5. 

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Tan et al. (US Pre-Grant Publication 2020/0327381) in view of Barborak et al. (US Pre-Grant Publication 2013/0017524), and further in view of Ikeda et al. (US Patent 6,243,723). 

As to claim 7, Tan as modified teaches the method of claim 1. 
Tan does not clearly show: 
displaying on a graphical user interface a tabulated format comprising multiple rows and columns, each row associated with a respective document and each column associated with a respective question; 
inserting an answer or forced answer corresponding to a respective document and a respective question into a cell corresponding to the respective document and the respective question.  
Ikeda teaches: 
displaying on a graphical user interface a tabulated format comprising multiple rows and columns, each row associated with a respective document and each column associated with a respective question (see Figure 8 and 10:1-29. Notably, each row is associated with at least one column and each column is associated with metadata, or a question); 
inserting an answer or forced answer corresponding to a respective document and a respective question into a cell corresponding to the respective document and the respective question (see Figure 8 and 10:1-29. An answer in the form of a reference to a document is associated into a cell corresponding to the document and the question). 
It would have been obvious to one of ordinary skill in the art before the earliest filing date of the invention to have modified Tan by the teachings of Ikeda because both references are directed towards parsing and labeling information from text data. Ikeda provides Tan with further utility by providing a way for a user of Tan to navigate documents based on which questions the documents answer. 

Claims 8-9 are rejected under 35 U.S.C. 103 as being unpatentable over Tan et al. (US Pre-Grant Publication 2020/0327381) in view of Barborak et al. (US Pre-Grant Publication 2013/0017524), and further in view of Ikeda et al. (US Patent 6,243,723), and further in view of Baughman et al. (US Pre-Grant Publication 2018/0285449). 

As to claim 8, Tan as modified by Barborak teaches the method of claim 7, wherein if the highest-probability label sequence indicates that at least one token is relevant to the question and is provided as the answer, the method comprising assigning the answer a high confidence tag (see Barborak paragraph [0056]. Answers are assigned tags of good, bad, or marginal), 
wherein if highest-probability label sequence indicates that no token is relevant to the question, assigning the forced answer a low confidence tag (see Barborak paragraph [0056]. Answers may be assigned a marginal tag). 
Tan as modified does not show the method further comprising displaying the high confidence tag and low confidence tag with the answer in the tabulated format. 
Baughman et al. shows the method further comprising displaying the high confidence tag and low confidence tag with the answer in the tabulated format (see paragraph [0028]. A table is calculated to display prominence, or confidence, for each particular term, or answer, occurring within a document. The prominence would indicate a high confidence tag or a low confidence tag for a document with a particular answer). 
It would have been obvious to one of ordinary skill in the art before the earliest filing date of the invention to have modified Tan by the teachings of Baughman because both references are directed towards parsing and labeling information from text data. Baughman provides Tan with further utility by providing a way for a user of Tan to navigate documents based on which questions the documents answer. 

As to claim 9, Tan as modified teaches the method of claim 8, further comprising: 
displaying an overall confidence on the graphical user interface  (see Baughman paragraph [0028]. Overall confidences for each word are displayed in the cell for the document and the word), 
wherein displaying the overall confidence comprises displaying a percentage of answers assigned the high confidence tag and a percentage of answers assigned the low confidence tag (see Baughman paragraph [0028]. A percentage of the confidences displayed will have a high confidence and a percentage will have a low confidence or no confidence number). 

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Tan et al. (US Pre-Grant Publication 2020/0327381) in view of Barborak et al. (US Pre-Grant Publication 2013/0017524), and further in view of Ikeda et al. (US Patent 6,243,723), and further in view of Miller et al. (US Pre-Grant Publication 2014/0149403).

As to claim 10, Tan as modified teaches the method of claim 7. 
Tan as modified does not teach the method comprising: 
receiving a selection of an answer or forced answer in the tabulated format, and 
in response, displaying a portion of the respective document comprising the answer or forced answer on the graphical user interface.
Miller teaches the method comprising: 
receiving a selection of an answer or forced answer in the tabulated format (see Figure 2 and paragraphs [0051] and [0056]), and 
in response, displaying a portion of the respective document comprising the answer or forced answer on the graphical user interface (see Figure 2 and paragraphs [0051] and [0056]. Users may select a paragraph containing answers to be shown the answers).
It would have been obvious to one of ordinary skill in the art before the earliest filing date of the invention to have modified Tan by the teachings of Miller because both references are directed towards parsing and labeling information from text data. Miller provides Tan with further utility by providing a user interface that will allow a user of Tan to more easily select and view portions of documents based on which questions the documents answer. 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHARLES D ADAMS whose telephone number is (571)272-3938. The examiner can normally be reached M-F, 9-5:30 EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Neveen Abel-Jalil, can be reached on 571-270-0474. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/CHARLES D ADAMS/           Primary Examiner, Art Unit 2152