Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
2. 	The information disclosure statement (IDS) submitted on 11/28/2017 was filed and the submission is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Drawings
The drawings are objected to because the rectangles detailing Low-level Concept, High-level Concept, and Understanding are not clear in figure 5.    Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.

Specification
The disclosure is objected to because of the following informalities: In paragraph 51 of the specification the following is stated: “To avoid two large matrices interacting directly, some aspects may constrain the matrix $U^T V$ to be symmetric, which is equivalent to $S_{ij} = (h_i^x)^T U^T V h_j^y$, where $U \in \mathbb{R}^{k \times d_h}$, $D \in \mathbb{R}^{k \times k}$ and D is a diagonal matrix.” The diagonal matrix D is not listed in the equation $S_{ij} = (h_i^x)^T U^T V h_j^y$.
Appropriate correction is required.
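For illustration only: if the applicant intended the usual factorization of a symmetric matrix into a diagonal matrix D placed between the transpose of U and U itself, the equation would be expected to take a form such as the one below. This form is an assumption inferred from the stated dimensions of U and D. It is not taken from the specification, and the applicant should supply the intended equation.

```latex
S_{ij} = \left(h_i^{x}\right)^{T} U^{T} D \, U \, h_j^{y},
\qquad U \in \mathbb{R}^{k \times d_h}, \quad D \in \mathbb{R}^{k \times k} \text{ diagonal}
```

Under this assumed form the product $U^{T} D\, U$ is symmetric for any diagonal D, which would be consistent with the symmetry constraint quoted from paragraph 51.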
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1-4, 8-11, and 15-18 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
The terms "low-level" and "high-level" in claims 1-4, 8-11, and 15-18 are relative terms which render the claims indefinite. The terms "low-level" and "high-level" are not defined by the claims, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention.


Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
	IN step 1, claim 1 falls within one of the four statutory categories since it is a machine claim.
IN step 2A prong 1, claim 1 sets forth: accessing a context of text and a question related to the context, the context of text comprising a listing of words, each word having a position; determining a low-level meaning of the question and a low-level meaning of the context, the low-level meaning corresponding to words or phrases; determining a high-level meaning of the question and a high-level meaning of the context, the high-level meaning corresponding to sentences or paragraphs;…for each position i in the context, a first probability that an answer to the question starts at the position i, the first probability being based on the low-level meaning of the question, the low-level meaning of the context, the high-level meaning of the question, and the high-level meaning of the context;… for each position j in the context, a second probability that the answer to the question ends at the position j, the second probability being based on the low-level meaning of the question, the low-level meaning of the context, the high-level meaning of the question, and the high-level meaning of the context; determining the answer to the question based on the computed first probabilities and the computed second probabilities, the answer to the question comprising a contiguous sub-listing of the words in the context; and providing an output representing the answer to the question. The claim recites processes that can be practically performed in the human mind. Thus, the claim is an abstract idea in the "mental process" grouping.
IN step 2A prong 2, this judicial exception is not integrated into a practical application. In particular, the claim only recites the following additional elements of: processing circuitry, a memory storing instructions and computing. All these elements are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components. See MPEP 2106.05(f). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
IN step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of processing circuitry, a memory storing instructions and computing amount to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. And thus, the claim is not patent eligible.
Regarding dependent claim 2, the rejection of claim 1 is incorporated and further, 
IN step 2A prong 1 it recites the low-level meaning corresponds to…one or more words or phrases.  All of these processes can be defined as mental processes.  Accordingly, the claim recites a mental process and thus is an abstract idea. 

IN step 2A prong 2, this judicial exception is not integrated into a practical application. In particular, the claim only recites the following additional element of: a classification of. This element as recited is not a meaningful limitation, as it only describes the low-level meaning and does not impart any functionality to the claimed process. See MPEP 2106.05(e). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
IN step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of: a classification of only describes the low-level meaning and does not impart any functionality to the claimed process. This additional element does not integrate the abstract idea into something significantly more than the judicial exception. And thus, the claim is not patent eligible. Accordingly, this claim is also ineligible for the reasons set forth in connection with claim 1 above.
Regarding dependent claim 3, the rejection of claim 1 is incorporated and further, 
IN step 2A prong 1 it recites the high-level meaning corresponds to…one or more words or phrases.  All of these processes can be defined as mental processes.  Accordingly, the claim recites a mental process and thus is an abstract idea. 
IN step 2A prong 2, this judicial exception is not integrated into a practical application. In particular, the claim only recites the following additional element of: a classification of. This element as recited is not a meaningful limitation, as it only describes the high-level meaning and does not impart any functionality to the claimed process. See MPEP 2106.05(e). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
IN step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of: a classification of only describes the high-level meaning and does not impart any functionality to the claimed process. This additional element does not integrate the abstract idea into something significantly more than the judicial exception. And thus, the claim is not patent eligible. Accordingly, this claim is also ineligible for the reasons set forth in connection with claim 1 above.
Regarding dependent claim 4, the rejection of claim 1 is incorporated and further, 
IN step 2A prong 1 it recites ….for each position in the context, a history, the history representing one or more low-level meanings associated with a word at the position and one or more high-level meanings associated with the word, wherein the first probability and the second probability are….based on the history. All of these processes can be defined as mental processes.  Accordingly, the claim recites a mental process and thus is an abstract idea.
IN step 2A prong 2, this judicial exception is not integrated into a practical application. In particular, the claim only recites the following additional elements of: storing and computed. These elements are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components. See MPEP 2106.05(f). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
IN step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of storing and computed amounts to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. And thus, the claim is not patent eligible. Accordingly, this claim is also ineligible for the reasons set forth in connection with claim 1 above.
Regarding dependent claim 5, the rejection of claim 1 is incorporated and further,
IN step 2A prong 1, it recites …meaning of the question and…meaning of the context, wherein the history represents one or more additional-level meanings associated with the word at the position. All of these processes can be defined as mental processes.  Accordingly, the claim recites a mental process and thus is an abstract idea. 
IN step 2A prong 2, this judicial exception is not integrated into a practical application. In particular, the claim only recites the following additional element of: determining an additional-level. This element as recited is not meaningful limitations, as it only describes the operations and does not impart any functionality to the claimed process. See MPEP 2106.05(e).  Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
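IN step 2B the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of: determining an additional-level amounts to only describing the operations and does not impart any functionality to the claimed process. This additional element does not integrate the abstract idea into something significantly more than the judicial exception. And thus, the claim is not patent eligible.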
Regarding dependent claim 6, the rejection of claim 1 is incorporated and further,
IN step 2A prong 1 it recites…meaning of the question and….meaning of the context, wherein the first probability and the second probability are…based on the additional-level meaning. All of these processes can be defined as mental processes.  Accordingly, the claim recites a mental process and thus is an abstract idea. 
IN step 2A prong 2, this judicial exception is not integrated into a practical application. In particular, the claim only recites the following additional elements of: determining an additional-level and computed. Determining an additional-level as recited is not meaningful limitations, as it only describes the operations and does not impart any functionality to the claimed process. See MPEP 2106.05(e). Computed is recited at a high-level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components. See MPEP 2106.05(f). Accordingly, these additional elements do not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.  
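IN step 2B the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of: determining an additional-level and computed. Determining an additional-level amounts to only describing the operations and does not impart any functionality to the claimed process. And computed amounts to no more than mere instructions to apply the exception using generic computer components. These additional elements do not integrate the abstract idea into something significantly more than the judicial exception. And thus, the claim is not patent eligible.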
Regarding dependent claim 7, the rejection of claim 1 is incorporated and further,
 IN step 2A prong 1, it recites…where the first probability is maximized;…where the second probability is maximized; determining that the answer to the question comprises the contiguous sub-listing of the words in the context between the first position and the second position. All of these processes can be defined as mental processes.  Accordingly, the claim recites a mental process and thus is an abstract idea. 
IN step 2A prong 2, this judicial exception is not integrated into a practical application. In particular, the claim only recites the following additional elements of: determining a first position and determining a second position. These elements as recited are not meaningful limitations, as it only describes the determining the answer to the question and does not impart any functionality to the claimed process. See MPEP 2106.05(e).  Accordingly, these additional elements do not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
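IN step 2B the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of determining a first position and determining a second position amount to only describing determining the answer to the question and do not impart any functionality to the claimed process. These additional elements do not integrate the abstract idea into something significantly more than the judicial exception. And thus, the claim is not patent eligible. Accordingly, this claim is also ineligible for the reasons set forth in connection with claim 1 above.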
IN step 1, claim 8 falls within one of the four statutory categories since it is a machine claim.
IN step 2A prong 1, the claim 8 sets forth: accessing a context of text and a question related to the context, the context of text comprising a listing of words, each word having a position; determining a low-level meaning of the question and a low-level meaning of the context, the low-level meaning corresponding to words or phrases; determining a high-level meaning of the question and a high-level meaning of the context, the high-level meaning corresponding to sentences or paragraphs;…for each position i in the context, a first probability that an answer to the question starts at the position i, the first probability being based on the low- level meaning of the question, the low-level meaning of the context, the high-level meaning of the question, and the high-level meaning of the context;… for each position j in the context, a second probability that the answer to the question ends at the position j, the second probability being based on the low-level meaning of the question, the low-level meaning of the context, the high-level meaning of the question, and the high-level meaning of the context; determining the answer to the question based on the computed first probabilities and the computed second probabilities, the answer to the question comprising a contiguous sub-listing of the words in the context; and providing an output representing the answer to the question. The claim recites processes that can be practically performed in the human mind. Thus, the claim is an abstract idea in the "mental process" grouping. 
IN step 2A prong 2, this judicial exception is not integrated into a practical application. In particular, the claim only recites the following additional element of: computing. This element is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components. See MPEP 2106.05(f). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
IN step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of computing amounts to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. And thus, the claim is not patent eligible.
Regarding dependent claim 9, the rejection of claim 8 is incorporated and further, 
IN step 2A prong 1 it recites the low-level meaning corresponds to…one or more words or phrases.  All of these processes can be defined as mental processes.  Accordingly, the claim recites a mental process and thus is an abstract idea. 
IN step 2A prong 2, this judicial exception is not integrated into a practical application. In particular, the claim only recites the following additional element of: a classification of. This element as recited is not a meaningful limitation, as it only describes the low-level meaning and does not impart any functionality to the claimed process. See MPEP 2106.05(e). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
IN step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of: a classification of only describes the low-level meaning and does not impart any functionality to the claimed process. This additional element does not integrate the abstract idea into something significantly more than the judicial exception. And thus, the claim is not patent eligible. Accordingly, this claim is also ineligible for the reasons set forth in connection with claim 8 above.
Regarding dependent claim 10, the rejection of claim 8 is incorporated and further, 
IN step 2A prong 1 it recites the high-level meaning corresponds to…one or more words or phrases.  All of these processes can be defined as mental processes.  Accordingly, the claim recites a mental process and thus is an abstract idea. 
IN step 2A prong 2, this judicial exception is not integrated into a practical application. In particular, the claim only recites the following additional element of: a classification of. This element as recited is not meaningful limitations, as it only describes the high-level meaning and does not impart any functionality to the claimed process. See MPEP 2106.05(e).  Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.  
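IN step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of: a classification of only describes the high-level meaning and does not impart any functionality to the claimed process. This additional element does not integrate the abstract idea into something significantly more than the judicial exception. And thus, the claim is not patent eligible. Accordingly, this claim is also ineligible for the reasons set forth in connection with claim 8 above.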
Regarding dependent claim 11, the rejection of claim 8 is incorporated and further, 
IN step 2A prong 1 it recites ….for each position in the context, a history, the history representing one or more low-level meanings associated with a word at the position and one or more high-level meanings associated with the word, wherein the first probability and the second probability are….based on the history. All of these processes can be defined as mental processes.  Accordingly, the claim recites a mental process and thus is an abstract idea.
IN step 2A prong 2, this judicial exception is not integrated into a practical application. In particular, the claim only recites the following additional elements of: storing and computed. These elements are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components. See MPEP 2106.05(f). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
IN step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of storing and computed amount to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. And thus, the claim is not patent eligible. Accordingly, this claim is also ineligible for the reasons set forth in connection with claim 8 above.
Regarding dependent claim 12, the rejection of claim 8 is incorporated and further,
IN step 2A prong 1, it recites …meaning of the question and…meaning of the context, wherein the history represents one or more additional-level meanings associated with the word at the position. All of these processes can be defined as mental processes.  Accordingly, the claim recites a mental process and thus is an abstract idea. 
IN step 2A prong 2, this judicial exception is not integrated into a practical application. In particular, the claim only recites the following additional element of: determining an additional-level. This element as recited is not meaningful limitations, as it only describes the operations and does not impart any functionality to the claimed process. See MPEP 2106.05(e).  Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
IN step 2B the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of: determining an additional-level amounts to only describing the operations and does not impart any functionality to the claimed process. This additional element does not integrate the abstract idea into something significantly more than the judicial exception. And thus, the claim is not patent eligible. 
Regarding dependent claim 13, the rejection of claim 8 is incorporated and further,
IN step 2A prong 1 it recites…meaning of the question and….meaning of the context, wherein the first probability and the second probability are…based on the additional-level meaning. All of these processes can be defined as mental processes.  Accordingly, the claim recites a mental process and thus is an abstract idea. 
IN step 2A prong 2, this judicial exception is not integrated into a practical application. In particular, the claim only recites the following additional elements of: determining an additional-level and computed. Determining an additional-level as recited is not meaningful limitations, as it only describes the operations and does not impart any functionality to the claimed process. See MPEP 2106.05(e). Computed is recited at a high-level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components. See MPEP 2106.05(f). Accordingly, these additional elements do not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.  
IN step 2B the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of: determining an additional-level and computed. Determining an additional-level amounts to only describing the operations and does not impart any functionality to the claimed process. And computed amounts to no more than mere instructions to apply the exception using generic computer components. These additional elements do not integrate the abstract idea into something significantly more than the judicial exception. And thus, the claim is not patent eligible.
Regarding dependent claim 14, the rejection of claim 8 is incorporated and further,
In step 2A prong 1, it recites …where the first probability is maximized; …where the second probability is maximized; determining that the answer to the question comprises the contiguous sub-listing of the words in the context between the first position and the second position. All of these processes can be defined as mental processes. Accordingly, the claim recites a mental process and thus is an abstract idea.
In step 2A prong 2, this judicial exception is not integrated into a practical application. In particular, the claim only recites the following additional elements: determining a first position and determining a second position. These elements as recited are not meaningful limitations, as they only describe determining the answer to the question and do not impart any functionality to the claimed process. See MPEP 2106.05(e). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
In step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of determining a first position and determining a second position amount to only describing determining the answer to the question and do not impart any functionality to the claimed process. These additional elements do not integrate the abstract idea into something significantly more than the judicial exception. And thus, the claim is not patent eligible. Accordingly, this claim is also ineligible for the reasons set forth in connection with claim 8 above.

In step 2A prong 1, claim 15 sets forth: accessing a context of text and a question related to the context, the context of text comprising a listing of words, each word having a position; determining a low-level meaning of the question and a low-level meaning of the context, the low-level meaning corresponding to words or phrases; determining a high-level meaning of the question and a high-level meaning of the context, the high-level meaning corresponding to sentences or paragraphs; …for each position i in the context, a first probability that an answer to the question starts at the position i, the first probability being based on the low-level meaning of the question, the low-level meaning of the context, the high-level meaning of the question, and the high-level meaning of the context; …for each position j in the context, a second probability that the answer to the question ends at the position j, the second probability being based on the low-level meaning of the question, the low-level meaning of the context, the high-level meaning of the question, and the high-level meaning of the context; determining the answer to the question based on the computed first probabilities and the computed second probabilities, the answer to the question comprising a contiguous sub-listing of the words in the context; and providing an output representing the answer to the question. The claim recites processes that can be practically performed in the human mind. Thus, the claim is an abstract idea in the "mental process" grouping.
In step 2A prong 2, this judicial exception is not integrated into a practical application. In particular, the claim only recites the following additional element: computing. This element is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components. See MPEP 2106.05(f).
In step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of computing amounts to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. And thus, the claim is not patent eligible.
Regarding dependent claim 16, the rejection of claim 15 is incorporated and further, 
In step 2A prong 1, it recites the low-level meaning corresponds to …one or more words or phrases. All of these processes can be defined as mental processes. Accordingly, the claim recites a mental process and thus is an abstract idea.
In step 2A prong 2, this judicial exception is not integrated into a practical application. In particular, the claim only recites the following additional element: a classification of. This element as recited is not a meaningful limitation, as it only describes the low-level meaning and does not impart any functionality to the claimed process. See MPEP 2106.05(e). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
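In step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of a classification of amounts to only describing the low-level meaning and does not impart any functionality to the claimed process. This additional element does not integrate the abstract idea into something significantly more than the judicial exception. And thus, the claim is not patent eligible.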
Regarding dependent claim 17, the rejection of claim 15 is incorporated and further, 
In step 2A prong 1, it recites the high-level meaning corresponds to …one or more words or phrases. All of these processes can be defined as mental processes. Accordingly, the claim recites a mental process and thus is an abstract idea.
In step 2A prong 2, this judicial exception is not integrated into a practical application. In particular, the claim only recites the following additional element: a classification of. This element as recited is not a meaningful limitation, as it only describes the high-level meaning and does not impart any functionality to the claimed process. See MPEP 2106.05(e). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
In step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of a classification of amounts to only describing the high-level meaning and does not impart any functionality to the claimed process. This additional element does not integrate the abstract idea into something significantly more than the judicial exception. And thus, the claim is not patent eligible.
Regarding dependent claim 18, the rejection of claim 15 is incorporated and further, 
In step 2A prong 1, it recites …for each position in the context, a history, the history representing one or more low-level meanings associated with a word at the position and one or more high-level meanings associated with the word, wherein the first probability and the second probability are …based on the history. All of these processes can be defined as mental processes. Accordingly, the claim recites a mental process and thus is an abstract idea.
In step 2A prong 2, this judicial exception is not integrated into a practical application. In particular, the claim only recites the following additional elements: storing and computed. These elements are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic computer components. See MPEP 2106.05(f). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
In step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of storing and computed amount to no more than mere instructions to apply the exception using generic computer components. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. And thus, the claim is not patent eligible. Accordingly, this claim is also ineligible for the reasons set forth in connection with claim 15 above.

In step 2A prong 1, it recites …meaning of the question and …meaning of the context, wherein the history represents one or more additional-level meanings associated with the word at the position. All of these processes can be defined as mental processes. Accordingly, the claim recites a mental process and thus is an abstract idea.
In step 2A prong 2, this judicial exception is not integrated into a practical application. In particular, the claim only recites the following additional element: determining an additional-level. This element as recited is not a meaningful limitation, as it only describes the operations and does not impart any functionality to the claimed process. See MPEP 2106.05(e). Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
In step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of determining an additional-level amounts to only describing the operations and does not impart any functionality to the claimed process. This additional element does not integrate the abstract idea into something significantly more than the judicial exception. And thus, the claim is not patent eligible. Accordingly, this claim is also ineligible for the reasons set forth in connection with claims 15 and 18 above.
Regarding dependent claim 20, the rejection of claim 15 is incorporated and further,
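In step 2A prong 1, it recites …meaning of the question and …meaning of the context, wherein the first probability and the second probability are …based on the additional-level meaning. All of these processes can be defined as mental processes. Accordingly, the claim recites a mental process and thus is an abstract idea.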
In step 2A prong 2, this judicial exception is not integrated into a practical application. In particular, the claim only recites the following additional elements: determining an additional-level and computed. Determining an additional-level as recited is not a meaningful limitation, as it only describes the operations and does not impart any functionality to the claimed process. See MPEP 2106.05(e). Computed is recited at a high level of generality such that it amounts to no more than mere instructions to apply the exception using generic computer components. See MPEP 2106.05(f). Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
In step 2B, the claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements are determining an additional-level and computed. Determining an additional-level amounts to only describing the operations and does not impart any functionality to the claimed process, and computed amounts to no more than mere instructions to apply the exception using generic computer components. These additional elements do not integrate the abstract idea into something significantly more than the judicial exception. And thus, the claim is not patent eligible. Accordingly, this claim is also ineligible for the reasons set forth in connection with claim 15 above.


Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Seo et al. "Bidirectional attention flow for machine comprehension." (2016, “Seo”).
Regarding claim 1, Seo teaches a system comprising: processing circuitry (Seo, pg. 5 sec. 4 Question Answering Experiments, “The training process takes roughly 20 hours on a single Titan X GPU.”); and a memory storing instructions which, when executed by the processing circuitry, cause the processing circuitry to perform operations comprising: accessing a context of text and a question related to the context, the context of text comprising a listing of words, each word having a position (Seo, pg. 6, sec. 4 Question Answering Experiments, fig.1, “The model architecture used for this task is depicted in Figure 1. Each paragraph [/context] and question [/query] are tokenized by a regular-expression-based word tokenizer (PTB Tokenizer) and fed into the model [of figure 1].” Note: It is being interpreted that each paragraph/context represents a context of text and each question/query represents a question related to the context.); determining a low-level meaning of the question and a low-level meaning of the context, the low-level meaning corresponding to words or phrases (Seo, pg. 2, sec. 2 Model, fig.1, “Our machine comprehension model is a hierarchical multi-stage process and consists of six layers… [The first layer is the] Character Embedding Layer [which] maps each word to a vector space using character-level CNNs. [The second layer is the] Word Embedding Layer [which] maps each word to a vector space using a pre-trained word embedding model. [And the third layer is the] Contextual Embedding Layer [which] utilizes contextual cues from surrounding words to refine the embedding of the words. These first three layers are applied to both the query and context [independently].”
Note: It is interpreted that the first three layers represent determining a low-level meaning of the question and a low-level meaning of the context, the low-level meaning corresponding to words or phrases)1; determining a high-level meaning of the question and a high-level meaning of the context, the high-level meaning corresponding to sentences or paragraphs (Seo, pg. 2, sec. 2 Model, fig.1, “Our machine comprehension model is a hierarchical multi-stage process and consists of six layers… [the fourth layer is the] Attention Flow Layer [which] couples the query and context vectors and produces a set of query aware feature vectors for each word in the context.” Note: It is interpreted that the Attention flow layer represents determining a high-level meaning of the question and a high-level meaning of the context, the high-level meaning corresponding to sentences or paragraphs)2; computing, for each position i in the context, a first probability that an answer to the question starts at the position i, the first probability being based on the low-level meaning of the question, the low-level meaning of the context, the high-level meaning of the question, and the high-level meaning of the context (Seo, pg. 4, fig.1(output layer), “The QA task requires the model to find a sub-phrase of the paragraph to answer the query… We obtain the probability distribution of the start index over the entire paragraph by $p^1 = \mathrm{softmax}(w_{(p^1)}^\top [G; M])$, where $w_{(p^1)}^\top \in \mathbb{R}^{10d}$
 is a trainable weight vector.”); computing, for each position j in the context, a second probability that the answer to the question ends at the position j, the second probability being based on the low-level meaning of the question, the low-level meaning of the context, the high-level meaning of the question, and the high-level meaning of the context (Seo, pg. 4, fig.1(output layer), “For the end index of the answer phrase, we pass M to another bidirectional LSTM layer and obtain $M^2 \in \mathbb{R}^{2d \times T}$. Then we use $M^2$ to obtain the probability distribution of the end index in a similar manner: $p^2 = \mathrm{softmax}(w_{(p^2)}^\top [G; M^2])$”); determining the answer to the question based on the computed first probabilities and the computed second probabilities, the answer to the question comprising a contiguous sub-listing of the words in the context; and providing an output representing the answer to the question (Seo, pg. 4, “The answer span (k, l) where $k \le l$ with the maximum value of $p^1_k \, p^2_l$ is chosen [and outputted, where k and l represent the k-th and l-th value of the vector p].”).
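For illustration only, the span selection quoted above (choose (k, l) with k ≤ l maximizing the product of the start probability at k and the end probability at l) can be sketched in a few lines of Python. This is a minimal sketch and not code from Seo or from the application: the function names `softmax` and `best_span`, the toy context, and the raw scores are all hypothetical, and the scores stand in for the quantities that the quoted formulas compute from [G; M] and [G; M^2].

```python
import math

def softmax(scores):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def best_span(p_start, p_end):
    """Return (k, l) with k <= l that maximizes p_start[k] * p_end[l]."""
    best = (0, 0)
    best_score = -1.0
    best_k = 0  # index of the largest start probability seen so far
    for l in range(len(p_end)):
        if p_start[l] > p_start[best_k]:
            best_k = l
        score = p_start[best_k] * p_end[l]
        if score > best_score:
            best_score = score
            best = (best_k, l)
    return best

# Hypothetical toy data: one raw score per word position in the context.
context = ["the", "cat", "sat", "on", "the", "mat"]
p_start = softmax([0.1, 0.2, 0.1, 2.5, 0.3, 0.4])  # start-position probabilities
p_end = softmax([0.1, 0.1, 0.2, 0.3, 0.5, 3.0])    # end-position probabilities

k, l = best_span(p_start, p_end)
answer = context[k:l + 1]  # contiguous sub-listing of the context words
```

With these toy scores the selected span covers positions 3 through 5. The single pass works because, for each end position l, the best admissible start is simply the largest start probability at or before l.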
Regarding claim 2, Seo teaches the system of claim 1, wherein the low-level meaning corresponds to a classification of one or more words or phrases (Seo, pg. 3, sec. 2 Model, fig.1, “[The] [c]haracter embedding layer is responsible for mapping each word to a high-dimensional vector space… we obtain the character level embedding of each word using Convolutional Neural Networks (CNN)…[the] word embedding layer also maps each word to a high-dimensional vector space. We use pre-trained word vectors, GloVe… to obtain the fixed word $H \in \mathbb{R}^{2d \times T}$ from the context word vectors X, and $U \in \mathbb{R}^{2d \times J}$ from query word vectors Q.”).
Regarding claim 3, Seo teaches the system of claim 1, wherein the high-level meaning corresponds to a classification of one or more sentences or a paragraph (Seo, pgs. 3-4, sec. 2 Model, fig.1, “[The] Attention flow layer is responsible for linking and fusing information from the context and the query words… The similarity matrix is computed by $S_{tj} = \alpha(H_{:t}, U_{:j}) \in \mathbb{R}$ where $\alpha$ is a trainable scalar function that encodes the similarity between its two input vectors, $H_{:t}$ is the t-th column vector of H, and $U_{:j}$ is the j-th column vector of U…The attention weight [for the query words] is computed by $a_t = \mathrm{softmax}(S_{t:}) \in \mathbb{R}^{J}$ and subsequently each attended query vector is $\tilde{U}_{:t} = \sum_{j} a_{tj} U_{:j}$…We obtain the attention weights on the context words by $b = \mathrm{softmax}(\max_{\mathrm{col}}(S)) \in \mathbb{R}^{T}$…Then the attended context vector is $\tilde{h} = \sum_{t} b_t H_{:t} \in \mathbb{R}^{2d}$. This vector indicates the weighted sum of the most important words in the context with respect to the query. $\tilde{h}$ is tiled T times across the column, thus giving $\tilde{H} \in \mathbb{R}^{2d \times T}$. Finally, the contextual embeddings and the attention vectors are combined together to yield G [defined as]
                    
                        
                            G
                        
                        
                            :
                            t
                        
                    
                    =
                    β
                    
                        
                            
                                
                                    H
                                
                                
                                    :
                                    t
                                
                            
                            ,
                             
                            
                                
                                    
                                        
                                            U
                                        
                                        ~
                                    
                                
                                
                                    :
                                    t
                                
                            
                            ,
                             
                            
                                
                                    
                                        
                                            H
                                        
                                        ~
                                    
                                
                                
                                    :
                                    t
                                
                            
                        
                    
                    ∈
                    
                        
                            R
                        
                        
                            
                                
                                    d
                                
                                
                                    G
                                
                            
                        
                    
                
            where                
                    
                        
                             
                            G
                        
                        
                            :
                            t
                        
                    
                
             is the t-th column vector (corresponding to t-th context word),                
                    β
                
             is a trainable vector function that fuses its (three) input vectors, and                
                     
                    
                        
                            d
                        
                        
                            G
                        
                    
                
            is the output dimension of the                 
                    β
                
             function.”). 
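For illustration only, the attended context vector and its tiling, as quoted from Seo above, can be sketched in a few lines of Python. The function names, the list-of-columns layout for H, and the toy numbers are assumptions made for this sketch and do not appear in Seo or in the application.

```python
def attended_context(b, h_cols):
    """Return h~ = sum_t b[t] * H[:, t] as a list of floats.

    b is the attention weight per context position; h_cols[t] is the
    t-th column of H (hypothetical list-of-columns layout).
    """
    if len(b) != len(h_cols):
        raise ValueError("need one attention weight per context column")
    h_tilde = [0.0] * len(h_cols[0])
    for weight, column in zip(b, h_cols):
        for i, value in enumerate(column):
            h_tilde[i] += weight * value
    return h_tilde


def tile_columns(vector, count):
    """Repeat `vector` as `count` identical columns, giving H~."""
    return [list(vector) for _ in range(count)]


# Toy example: T = 2 context positions, column length 2.
b = [0.25, 0.75]
h_cols = [[1.0, 0.0], [0.0, 2.0]]
h_tilde = attended_context(b, h_cols)
H_tilde = tile_columns(h_tilde, len(h_cols))
```

In this toy case every column of H_tilde is the same weighted sum, which mirrors the statement that the vector is tiled T times across the column.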
Regarding claim 4, Seo teaches the system of claim 1, the operations further comprising: storing, for each position in the context, a history, the history representing one or more low-level meanings associated with a word at the position (Seo, pg. 2, sec. 2 Model, fig.1, “Our machine comprehension model is a hierarchical multi-stage process and consists of six layers… [The first layer is the] Character Embedding Layer [which] maps each word to a vector space using  Note: It is interpreted that the first three layers form a low level history representing one or more low-level meanings associated with a word at the position) and one or more high-level meanings associated with the word (Seo, pg. 2, sec. 2 Model, fig.1, “Our machine comprehension model is a hierarchical multi-stage process and consists of six layers… [the fourth layer is the] Attention Flow Layer [which] couples the query and context vectors and produces a set of query aware feature vectors for each word in the context.” Note: It is interpreted that the Attention flow layer forms a high level history that represents one or more high-level meanings associated with the word), wherein the first probability and the second probability are computed based on the history (Seo, pg. 4, fig.1 (output layer), “The QA task requires the model to find a sub-phrase of the paragraph to answer the query… We obtain the probability distribution of the start index over the entire paragraph by $p^1 = \mathrm{softmax}(w_{(p^1)}^{T}[G; M])$, where $w_{(p^1)}^{T} \in \mathbb{R}^{10d}$ is a trainable weight vector. For the end index of the answer phrase, we pass M to another bidirectional LSTM layer and obtain $M^2 \in \mathbb{R}^{2d \times T}$. Then we use $M^2$ to obtain the probability distribution of the end index in a similar manner: $p^2 = \mathrm{softmax}(w_{(p^2)}^{T}[G; M^2])$.” Note: It is being interpreted that the vector G contains the history of the first four layers).
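As a minimal sketch of the output layer quoted from Seo above, the start and end distributions can be computed as a softmax over one score per context position, where each score is the dot product of a weight vector with the stacked column of G and M (or G and M^2). The helper names, the list-of-columns layout, and the toy dimensions below are assumptions for illustration; they are much smaller than the 10d weight vector in the quoted passage.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def index_distribution(w, g_cols, m_cols):
    """Return softmax(w^T [G; M]) with one probability per context position.

    g_cols[t] and m_cols[t] are the t-th columns of G and M (hypothetical
    list-of-columns layout); w has length len(g_cols[t]) + len(m_cols[t]).
    """
    scores = []
    for g, m in zip(g_cols, m_cols):
        stacked = list(g) + list(m)  # [G; M] restricted to column t
        if len(stacked) != len(w):
            raise ValueError("w must match the stacked column length")
        scores.append(sum(wi * xi for wi, xi in zip(w, stacked)))
    return softmax(scores)


# Toy sizes only: T = 3 positions, G columns of length 2, M columns of length 1.
g_cols = [[0.0, 1.0], [2.0, 1.0], [1.0, 0.0]]
m_cols = [[0.0], [1.0], [0.5]]
m2_cols = [[0.2], [0.1], [2.0]]
p1 = index_distribution([1.0, 0.0, 1.0], g_cols, m_cols)   # start index
p2 = index_distribution([0.0, 0.0, 1.0], g_cols, m2_cols)  # end index
```

Because both distributions take G as an input, each depends on whatever the earlier layers encoded in G, which is the point relied on in the interpretation above.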
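Regarding claim 5, Seo teaches the system of claim 4, the operations further comprising: determining an additional-level meaning of the question and an additional-level meaning of the context, wherein the history represents one or more additional-level meanings associated with the word at the position (Seo, pg. 4, sec. 2 Model, fig.1, “The input to the modeling layer is G, which encodes the query-aware representations of context words. The output of the modeling layer captures the interaction among the context words conditioned on the query. This is different from the contextual embedding layer, which captures the interaction among context words independent of the query. We use two layers of bi-directional LSTM, with the output size of d for each direction. Hence we obtain a matrix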
$M \in \mathbb{R}^{2d \times T}$, which is passed onto the output layer to predict the answer. Each column vector of M is expected to contain contextual information about the word with respect to the entire context paragraph and the query.” Note: It is interpreted that the modeling layer represents an additional-level meaning of the question and an additional-level meaning of the context and the matrix M contains the history that represents one or more additional-level meanings associated with the word at the position).
Regarding claim 6, Seo teaches the system of claim 1, the operations further comprising: determining an additional-level meaning of the question and an additional-level meaning of the context (Seo, pg. 4, sec. 2 Model, fig.1, “The input to the modeling layer is G, which encodes the query-aware representations of context words. The output of the modeling layer captures the interaction among the context words conditioned on the query. This is different from the contextual embedding layer, which captures the interaction among context words independent of the query. We use two layers of bi-directional LSTM, with the output size of d for each direction. Hence we obtain a matrix
$M \in \mathbb{R}^{2d \times T}$, which is passed onto the output layer to predict the answer. Each column vector of M is expected to contain contextual information about the word with respect to the entire context paragraph and the query.” Note: It is interpreted that the modeling layer represents an additional-level meaning of the question and an additional-level meaning of the context), wherein the first probability and the second probability are computed based on the additional-level meaning (Seo, pg. 4, fig.1 (output layer), “The QA task requires the model to find a sub-phrase of the paragraph to answer the query… We obtain the probability distribution of the start index over the entire paragraph by $p^1 = \mathrm{softmax}(w_{(p^1)}^{T}[G; M])$, where $w_{(p^1)}^{T} \in \mathbb{R}^{10d}$ is a trainable weight vector. For the end index of the answer phrase, we pass M to another bidirectional LSTM layer and obtain $M^2 \in \mathbb{R}^{2d \times T}$. Then we use $M^2$ to obtain the probability distribution of the end index in a similar manner: $p^2 = \mathrm{softmax}(w_{(p^2)}^{T}[G; M^2])$.” Note: It is being interpreted that the vector G contains the history of the first four layers).
Regarding claim 7, Seo teaches the system of claim 1, wherein determining the answer to the question comprises: determining a first position where the first probability is maximized; determining a second position where the second probability is maximized; determining that the answer to the question comprises the contiguous sub-listing of the words in the context between the first position and the second position (Seo, pg. 4, “The answer span (k, l) where $k \leq l$ with the maximum value of $p_k^1 p_l^2$ is chosen which can be computed in linear time with dynamic programming.” Note: It is interpreted that k and its probability $p_k^1$ represent a first position where the first probability is maximized, l and its probability $p_l^2$ represent a second position where the second probability is maximized, and the answer span (k, l) where $k \leq l$ represents the answer to the question comprising the contiguous sub-listing of the words in the context between the first position and the second position).
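The span selection quoted from Seo above (the pair (k, l) with k ≤ l that maximizes the product of the start probability at k and the end probability at l, found in linear time) can be sketched as a single pass that tracks the best start position seen so far. This is one possible linear-time procedure written for illustration; the function name and the example probabilities are assumptions and are not taken from Seo or the application.

```python
def best_span(p_start, p_end):
    """Return (k, l, score) with k <= l maximizing p_start[k] * p_end[l].

    One left-to-right pass: for each candidate end l, the best start is the
    largest start probability at any position up to and including l.
    """
    if not p_start or len(p_start) != len(p_end):
        raise ValueError("p_start and p_end must be non-empty and equal length")
    best_k, best_l, best_score = 0, 0, -1.0
    argmax_start = 0  # index of the largest start probability seen so far
    for l, end_prob in enumerate(p_end):
        if p_start[l] > p_start[argmax_start]:
            argmax_start = l
        score = p_start[argmax_start] * end_prob
        if score > best_score:
            best_k, best_l, best_score = argmax_start, l, score
    return best_k, best_l, best_score


# Hypothetical distributions over three context positions.
k, l, score = best_span([0.1, 0.6, 0.3], [0.5, 0.1, 0.4])
```

In this example the end probability is largest at position 0 while the start probability is largest at position 1, so taking the two maxima independently would violate k ≤ l; the joint search returns the span from position 1 to position 2 instead.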
Regarding claim 8, Seo teaches a non-transitory machine-readable medium storing instructions which, when executed by processing circuitry of one or more machines, cause the processing circuitry to perform operations comprising: accessing a context of text and a question related to the context, the context of text comprising a listing of words, each word having a position (Seo, pg. 6, sec. 4 Question Answering Experiments, fig.1, “The model architecture used for this task is depicted in Figure 1. Each paragraph [/context] and question [/query] are tokenized by a regular-expression-based word tokenizer (PTB Tokenizer) and fed into the model [of figure Note: It is being interpreted that each paragraph/context represents a context of text and each question/query represents a question related to the context, and the tokenization by a regular-expression-based word tokenizer (PTB Tokenizer) represents the context of text comprising a listing of words, each word having a position); determining a low-level meaning of the question and a low-level meaning of the context, the low-level meaning corresponding to words or phrases (Seo, pg. 2, sec. 2 Model, fig.1, “Our machine comprehension model is a hierarchical multi-stage process and consists of six layers… [The first layer is the] Character Embedding Layer [which] maps each word to a vector space using character-level CNNs. [The second layer is the] Word Embedding Layer [which] maps each word to a vector space using a pre-trained word embedding model. [And the third layer is the] Contextual Embedding Layer [which] utilizes contextual cues from surrounding words to refine the embedding of the words. These first three layers are applied to both the query and context [independently]. 
Note: It is interpreted that the first three layers represent determining a low-level meaning of the question and a low-level meaning of the context, the low-level meaning corresponding to words or phrases)3; determining a high-level meaning of the question and a high-level meaning of the context, the high-level meaning corresponding to sentences or paragraphs (Seo, pg. 2, sec. 2 Model, fig.1, “Our machine comprehension model is a hierarchical multi-stage process and consists of six layers… [the fourth layer is the] Attention Flow Layer [which] couples the query and context vectors and produces a set of query aware feature vectors for each word in the context.” Note: It is interpreted that the Attention flow layer represents determining a high-level meaning of the question and a high-level meaning of the context, the high-level meaning corresponding to sentences or paragraphs)4; computing, for each position i in the context, a first probability that an answer to the question starts at the position i, the first probability being based on the low-level meaning of the question, the low-level meaning of the context, the high-level meaning of the question, and the high-level meaning of the context (Seo, pg. 4, fig.1 (output layer), “The QA task requires the model to find a sub-phrase of the paragraph to answer the query… We obtain the probability distribution of the start index over the entire paragraph by $p^1 = \mathrm{softmax}(w_{(p^1)}^{T}[G; M])$, where $w_{(p^1)}^{T} \in \mathbb{R}^{10d}$ is a trainable weight vector.”); computing, for each position j in the context, a second probability that the answer to the question ends at the position j, the second probability being based on the low-level meaning of the question, the low-level meaning of the context, the high-level meaning of the question, and the high-level meaning of the context (Seo, pg. 4, fig.1 (output layer), “For the end index of the answer phrase, we pass M to another bidirectional LSTM layer and obtain $M^2 \in \mathbb{R}^{2d \times T}$. Then we use $M^2$ to obtain the probability distribution of the end index in a similar manner: $p^2 = \mathrm{softmax}(w_{(p^2)}^{T}[G; M^2])$”); determining the answer to the question based on the computed first probabilities and the computed second probabilities, the answer to the question comprising a contiguous sub-listing of the words in the context; and providing an output representing the answer to the question (Seo, pg. 4, “The answer span (k, l) where $k \leq l$ with the maximum value of $p_k^1 p_l^2$ is chosen [and outputted, where k and l represent the k-th and l-th value of the vector p].”).
Regarding claim 9, Seo teaches the machine-readable medium of claim 8, wherein the low-level meaning corresponds to a classification of one or more words or phrases (Seo, pg. 3, sec. 2 Model, fig.1, “[The] [c]haracter embedding layer is responsible for mapping each word to a high-dimensional vector space… we obtain the character level embedding of each word using Convolutional Neural Networks (CNN)…[the] word embedding layer also maps each word to a $H \in \mathbb{R}^{2d \times T}$ from the context word vectors X, and $U \in \mathbb{R}^{2d \times J}$ from query word vectors Q.”).
Regarding claim 10, Seo teaches the machine-readable medium of claim 8, wherein the high-level meaning corresponds to a classification of one or more sentences or a paragraph (Seo, pgs. 3-4, sec. 2 Model, fig.1, “[The] Attention flow layer is responsible for linking and fusing information from the context and the query words… The similarity matrix is computed by $S_{tj} = \alpha(H_{:t}, U_{:j}) \in \mathbb{R}$ where $\alpha$ is a trainable scalar function that encodes the similarity between its two input vectors, $H_{:t}$ is t-th column vector of H, and $U_{:j}$
is the j-th column vector of U…The attention weight [for the query words] is computed by $a_t = \mathrm{softmax}(S_{t:}) \in \mathbb{R}^J$ and subsequently each attended query vector is $\tilde{U}_{:t} = \sum_j a_{tj} U_{:j}$ …We obtain the attention weights on the context words by $b = \mathrm{softmax}(\max_{\mathrm{col}}(S)) \in \mathbb{R}^T$ …Then the attended context vector is $\tilde{h} = \sum_t b_t H_{:t} \in \mathbb{R}^{2d}$. This vector indicates the weighted sum of the most important words in the context with respect to the query. $\tilde{h}$ is tiled T times across the column, thus giving $\tilde{H} \in \mathbb{R}^{2d \times T}$.
Finally, the contextual embeddings and the attention vectors are combined together to yield G [defined as] $G_{:t} = \beta(H_{:t}, \tilde{U}_{:t}, \tilde{H}_{:t}) \in \mathbb{R}^{d_G}$ where $G_{:t}$ is the t-th column vector (corresponding to t-th context word), $\beta$ is a trainable vector function that fuses its (three) input vectors, and $d_G$ is the output dimension of the $\beta$ function.”).
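Illustrative note (not part of the cited reference or of the rejection): the attention flow computation quoted above can be sketched in plain Python as follows. This is a minimal sketch under stated assumptions. The quoted passage describes $\alpha$ and $\beta$ only as trainable functions, so a dot product stands in for $\alpha$ and simple concatenation stands in for $\beta$. The function names `attention_flow`, `weighted_sum`, and `dot` are hypothetical, and matrices are stored as lists of column vectors.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    """Stand-in for the trainable similarity function alpha (assumption)."""
    return sum(a * b for a, b in zip(u, v))

def weighted_sum(weights, columns):
    """Return sum_i weights[i] * columns[i] for equal-length column vectors."""
    dim = len(columns[0])
    return [sum(w * col[i] for w, col in zip(weights, columns)) for i in range(dim)]

def attention_flow(H, U, alpha=dot):
    """H holds T context columns, U holds J query columns (each of length 2d)."""
    # Similarity matrix: S[t][j] = alpha(H[:, t], U[:, j]).
    S = [[alpha(h, u) for u in U] for h in H]
    # Attended query vectors: a_t = softmax(S[t, :]), U_tilde[:, t] = sum_j a_tj U[:, j].
    U_tilde = [weighted_sum(softmax(row), U) for row in S]
    # Context weights: b = softmax over t of max_j S[t][j]; h_tilde = sum_t b_t H[:, t].
    b = softmax([max(row) for row in S])
    h_tilde = weighted_sum(b, H)
    H_tilde = [list(h_tilde) for _ in H]  # h_tilde tiled T times
    # Stand-in for beta (assumption): concatenate the three input vectors.
    G = [h + u_t + h_t for h, u_t, h_t in zip(H, U_tilde, H_tilde)]
    return S, b, G
```

With T context columns of length 2d, this sketch returns T columns of G, each of length 6d. That output dimension follows only from the concatenation assumed here for $\beta$ and is not taken from the reference.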
Regarding claim 11, Seo teaches the machine-readable medium of claim 8, the operations further comprising: storing, for each position in the context, a history, the history representing one or more low-level meanings associated with a word at the position (Seo, pg. 2, sec. 2 Model, Note: It is interpreted that the first three layers form a low level history representing one or more low-level meanings associated with a word at the position) and one or more high-level meanings associated with the word (Seo, pg. 2, sec. 2 Model, fig.1, “Our machine comprehension model is a hierarchical multi-stage process and consists of six layers… [the fourth layer is the] Attention Flow Layer [which] couples the query and context vectors and produces a set of query aware feature vectors for each word in the context.” Note: It is interpreted that the Attention flow layer forms a high level history that represents one or more high-level meanings associated with the word), wherein the first probability and the second probability are computed based on the history (Seo, pg. 4, fig.1 (output layer), “The QA task requires the model to find a sub-phrase of the paragraph to answer the query… We obtain the probability distribution of the start index over the entire paragraph by $p^1 = \mathrm{softmax}(w_{p^1}^{\top}[G; M])$, where $w_{p^1}^{\top} \in \mathbb{R}^{10d}$
is a trainable weight vector. For the end index of the answer phrase, we pass M to another bidirectional LSTM layer and obtain $M^2 \in \mathbb{R}^{2d \times T}$. Then we use $M^2$ to obtain the probability distribution of the end index in a similar manner: $p^2 = \mathrm{softmax}(w_{p^2}^{\top}[G; M^2])$.” Note: It is being interpreted that the vector G contains the history of the first four layers).
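Illustrative note (not part of the cited reference or of the rejection): the start-index and end-index distributions quoted above both have the form softmax(w^T [G; X]), with X = M for the start index and X = M^2 for the end index. The following is a minimal sketch of that form in plain Python. The function name `index_distribution` is hypothetical, matrices are stored as lists of column vectors, and the weight vector is passed in rather than trained.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def index_distribution(weight, G, M):
    """Return softmax(w^T [G; M]) as a distribution over the T context positions."""
    scores = []
    for g_col, m_col in zip(G, M):
        stacked = g_col + m_col  # column t of the stacked matrix [G; M]
        scores.append(sum(w * x for w, x in zip(weight, stacked)))
    return softmax(scores)
```

Calling the same function with `M2` in place of `M` and a second weight vector gives the end-index distribution in the same manner.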
Regarding claim 12, Seo teaches the machine-readable medium of claim 11, the operations further comprising: determining an additional-level meaning of the question and an additional-level meaning of the context, wherein the history represents one or more additional-level meanings associated with the word at the position (Seo, pg. 4, sec. 2 Model, fig.1, “The input to the modeling layer is G, which encodes the query-aware representations of context words. The output of the modeling layer captures the interaction among the context words conditioned on the query. This is different from the contextual embedding layer, which captures the interaction among context words independent of the query. We use two layers of bi-directional LSTM, with the output size of d for each direction. Hence we obtain a matrix $M \in \mathbb{R}^{2d \times T}$, which is passed onto the output layer to predict the answer. Each column vector of M is expected to contain contextual information about the word with respect to the entire context paragraph and the query.” Note: It is interpreted that the modeling layer represents an additional-level meaning of the question and an additional-level meaning of the context and the matrix M contains the history that represents one or more additional-level meanings associated with the word at the position).
Regarding claim 13, Seo teaches the machine-readable medium of claim 8, the operations further comprising: determining an additional-level meaning of the question and an additional-level meaning of the context (Seo, pg. 4, sec. 2 Model, fig.1, “The input to the modeling layer is G, which encodes the query-aware representations of context words. The output of the modeling layer captures the interaction among the context words conditioned on the query. This is different from the contextual embedding layer, which captures the interaction among context words independent of the query. We use two layers of bi-directional LSTM, with the output size of d for each direction. Hence we obtain a matrix $M \in \mathbb{R}^{2d \times T}$,
which is passed onto the output layer to predict the answer. Each column vector of M is expected to contain contextual information about the word with respect to the entire context paragraph and the query.” Note: It is interpreted that the modeling layer represents an additional-level meaning of the question and an additional-level meaning of the context), wherein the first probability and the second probability are computed based on the additional-level meaning (Seo, pg. 4, fig.1 (output layer), “The QA task requires the model to find a sub-phrase of the paragraph to answer the query… We obtain the probability distribution of the start index over the entire paragraph by $p^1 = \mathrm{softmax}(w_{p^1}^{\top}[G; M])$, where $w_{p^1}^{\top} \in \mathbb{R}^{10d}$
is a trainable weight vector. For the end index of the answer phrase, we pass M to another bidirectional LSTM layer and obtain $M^2 \in \mathbb{R}^{2d \times T}$. Then we use $M^2$ to obtain the probability distribution of the end index in a similar manner: $p^2 = \mathrm{softmax}(w_{p^2}^{\top}[G; M^2])$.” Note: It is being interpreted that the vector G contains the history of the first four layers).
Regarding claim 14, Seo teaches the machine-readable medium of claim 8, wherein determining the answer to the question comprises: determining a first position where the first probability is maximized; determining a second position where the second probability is maximized; determining that the answer to the question comprises the contiguous sub-listing of the words in the context between the first position and the second position (Seo, pg. 4, “The answer span (k, l) where $k \le l$ with the maximum value of $p^1_k p^2_l$ is chosen, which can be computed in linear time with dynamic programming.” Note: It is interpreted that k and its probability $p^1_k$ represents a first position where the first probability is maximized, l and its probability $p^2_l$ represents a second position where the second probability is maximized, and the answer span (k, l) where $k \le l$ represents determining that the answer to the question comprises the contiguous sub-listing of the words in the context between the first position and the second position).
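Illustrative note (not part of the cited reference or of the rejection): the quoted passage states that the span (k, l) with $k \le l$ maximizing $p^1_k p^2_l$ can be found in linear time with dynamic programming, but does not give the procedure. The following is one possible sketch, assuming the distributions are plain Python lists. It tracks the best start position seen so far while scanning candidate end positions. The function name `best_span` is hypothetical.

```python
def best_span(p1, p2):
    """Return ((k, l), score) maximizing p1[k] * p2[l] subject to k <= l."""
    best_start = 0          # index of the largest p1 value seen so far
    best_pair = (0, 0)
    best_score = -1.0
    for end in range(len(p2)):
        # Only start positions at or before the current end are eligible.
        if p1[end] > p1[best_start]:
            best_start = end
        score = p1[best_start] * p2[end]
        if score > best_score:
            best_score = score
            best_pair = (best_start, end)
    return best_pair, best_score
```

For example, with `p1 = [0.1, 0.6, 0.3]` and `p2 = [0.5, 0.1, 0.4]`, taking each maximum independently would give start 1 and end 0, which is not a valid span. The constrained search instead selects the span (1, 2).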
Regarding claim 15, Seo teaches a method comprising: accessing a context of text and a question related to the context, the context of text comprising a listing of words, each word having a position(Seo, pg. 6, sec. 4 Question Answering Experiments, fig.1, “The model Note:   It is being interpreted that each paragraph/context represents a context of text and each question/query represents a question related to the context, and the tokenization by a regular-expression-based word tokenizer (PTB Tokenizer) represents the context of text comprising a listing of words, each word having a position); determining a low-level meaning of the question and a low-level meaning of the context, the low-level meaning corresponding to words or phrases(Seo, pg. 2, sec. 2 Model,  fig.1,  “Our machine comprehension model is a hierarchical multi-stage process and consists of six layers…  [The first layer is the ]Character Embedding Layer [which] maps each word to a vector space using character-level CNNs. [The second layer is the] Word Embedding Layer [which] maps each word to a vector space using a pre-trained word embedding model. [And the third layers is the] Contextual Embedding Layer [which] utilizes contextual cues from surrounding words to refine the embedding of the words. These first three layers are applied to both the query and context[independently]. Note: It is interpreted that the first three layers represent determining a low-level meaning of the question and a low-level meaning of the context, the low-level meaning corresponding to words or phrases)5; determining a high-level meaning of the question and a high-level meaning of the context, the high-level meaning corresponding to sentences or paragraphs(Seo, pg. 2, sec. 
2 Model, fig. 1, “Our machine comprehension model is a hierarchical multi-stage process and consists of six layers… [the fourth layer is the] Attention Flow Layer [which] couples the query and context vectors and produces a set of query aware feature vectors for each word in the context.” Note: It is interpreted that the Attention Flow Layer represents determining a high-level meaning of the question and a high-level meaning of the context, the high-level meaning corresponding to sentences or paragraphs); computing, for each position i in the context, a first probability that an answer to the question starts at the position i, the first probability being based on the low-level meaning of the question, the low-level meaning of the context, the high-level meaning of the question, and the high-level meaning of the context (Seo, pg. 4, fig. 1 (output layer), “The QA task requires the model to find a sub-phrase of the paragraph to answer the query… We obtain the probability distribution of the start index over the entire paragraph by $p^1 = \mathrm{softmax}(w_{(p^1)}^{\top}[G; M])$, where $w_{(p^1)}^{\top} \in \mathbb{R}^{10d}$
is a trainable weight vector.”); computing, for each position j in the context, a second probability that the answer to the question ends at the position j, the second probability being based on the low-level meaning of the question, the low-level meaning of the context, the high-level meaning of the question, and the high-level meaning of the context (Seo, pg. 4, fig. 1 (output layer), “For the end index of the answer phrase, we pass M to another bidirectional LSTM layer and obtain $M^2 \in \mathbb{R}^{2d \times T}$. Then we use $M^2$
to obtain the probability distribution of the end index in a similar manner: $p^2 = \mathrm{softmax}(w_{(p^2)}^{\top}[G; M^2])$”); determining the answer to the question based on the computed first probabilities and the computed second probabilities, the answer to the question comprising a contiguous sub-listing of the words in the context; and providing an output representing the answer to the question (Seo, pg. 4, “The answer span $(k, l)$ where $k \le l$ with the maximum value of $p^1_k p^2_l$ is chosen [and outputted, where k and l represent the k-th and l-th value of the vector p].”).
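For illustration only, the span selection quoted above (choosing the span (k, l) with k ≤ l that maximizes $p^1_k p^2_l$) can be sketched as follows. This is a minimal sketch and not Seo's implementation; the function names `best_span` and `answer_words` are hypothetical, and the probabilities are assumed to be non-negative lists of equal length.

```python
def best_span(p_start, p_end):
    """Return (k, l, score) maximizing p_start[k] * p_end[l] subject to k <= l.

    Assumption: both inputs are non-negative probability lists of equal length.
    """
    if not p_start or len(p_start) != len(p_end):
        raise ValueError("p_start and p_end must be non-empty and of equal length")
    best_k, best_l, best_score = 0, 0, float("-inf")
    best_start = 0  # index of the largest start probability seen so far
    for end, end_prob in enumerate(p_end):
        # The best start for this end position is the largest p_start[k] with k <= end.
        if p_start[end] > p_start[best_start]:
            best_start = end
        score = p_start[best_start] * end_prob
        if score > best_score:
            best_k, best_l, best_score = best_start, end, score
    return best_k, best_l, best_score


def answer_words(context_words, p_start, p_end):
    """Return the contiguous sub-listing of context words for the best span."""
    k, l, _ = best_span(p_start, p_end)
    return context_words[k:l + 1]
```

The single pass keeps a running maximum of the start probabilities, so the constraint k ≤ l is enforced without comparing every pair of positions.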
Regarding claim 16, Seo teaches the method of claim 15, wherein the low-level meaning corresponds to a classification of one or more words or phrases (Seo, pg. 3, sec. 2 Model, fig. 1, “[The] [c]haracter embedding layer is responsible for mapping each word to a high-dimensional vector space… we obtain the character level embedding of each word using Convolutional Neural Networks (CNN)… [the] word embedding layer also maps each word to a high-dimensional vector space. We use pre-trained word vectors, GloVe… to obtain the fixed word embedding of each word… [then] [t]he concatenation of the character and word embedding vectors is passed to a two-layer Highway Network… [for the contextual embedding layer] We place an LSTM in both directions, and concatenate the outputs of the two LSTMs. Hence we obtain $H \in \mathbb{R}^{2d \times T}$ from the context word vectors X, and $U \in \mathbb{R}^{2d \times J}$ from query word vectors Q.”).
Regarding claim 17, Seo teaches the method of claim 15, wherein the high-level meaning corresponds to a classification of one or more sentences or a paragraph (Seo, pgs. 3-4, sec. 2 Model, fig. 1, “[The] Attention flow layer is responsible for linking and fusing information from the context and the query words… The similarity matrix is computed by $S_{tj} = \alpha(H_{:t}, U_{:j}) \in \mathbb{R}$
where $\alpha$ is a trainable scalar function that encodes the similarity between its two input vectors, $H_{:t}$ is t-th column vector of H, and $U_{:j}$ is j-th column vector of U… The attention weight [for the query words] is computed by $a_t = \mathrm{softmax}(S_{t:}) \in \mathbb{R}^{J}$
and subsequently each attended query vector is $\tilde{U}_{:t} = \sum_{j} a_{tj} U_{:j}$
… We obtain the attention weights on the context words by $b = \mathrm{softmax}(\max_{\mathrm{col}}(S)) \in \mathbb{R}^{T}$
… Then the attended context vector is $\tilde{h} = \sum_{t} b_t H_{:t} \in \mathbb{R}^{2d}$. This vector indicates the weighted sum of the most important words in the context with respect to the query. $\tilde{h}$ is tiled T times across the column, thus giving $\tilde{H} \in \mathbb{R}^{2d \times T}$.
Finally, the contextual embeddings G [defined as] $G_{:t} = \beta(H_{:t}, \tilde{U}_{:t}, \tilde{H}_{:t}) \in \mathbb{R}^{d_G}$ where $G_{:t}$ is the t-th column vector (corresponding to t-th context word), $\beta$ is a trainable vector function that fuses its (three) input vectors, and $d_G$ is the output dimension of the $\beta$ function.”).
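For illustration only, the attention flow computation quoted above can be sketched as follows. This is a sketch under stated assumptions, not Seo's implementation: the trainable functions α and β are replaced by hypothetical placeholders (a plain dot product for α and simple concatenation for β), H and U are represented as lists of column vectors, and the names `attention_flow`, `dot`, and `weighted_sum` are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Placeholder for the trainable similarity function alpha (assumption)."""
    return sum(a * b for a, b in zip(u, v))


def weighted_sum(weights, vectors):
    """Return sum_i weights[i] * vectors[i] as a list."""
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]


def attention_flow(H, U, alpha=dot):
    """H: list of T context column vectors; U: list of J query column vectors."""
    # Similarity matrix: S[t][j] = alpha(H[:, t], U[:, j])
    S = [[alpha(h, u) for u in U] for h in H]
    # a_t = softmax(S[t][:]); attended query vector U~[:, t] = sum_j a_tj U[:, j]
    U_tilde = [weighted_sum(softmax(row), U) for row in S]
    # b = softmax(max over query words of S); h~ = sum_t b_t H[:, t]
    b = softmax([max(row) for row in S])
    h_tilde = weighted_sum(b, H)
    # beta is replaced by concatenation (assumption); h~ is tiled across all T columns
    G = [h + u_t + h_tilde for h, u_t in zip(H, U_tilde)]
    return S, U_tilde, b, h_tilde, G
```

With concatenation standing in for β, each column of G has dimension 6d (three vectors of size 2d); the output dimension $d_G$ in the quoted passage depends on the actual choice of β.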
Regarding claim 18, Seo teaches the method of claim 15, the operations further comprising: storing, for each position in the context, a history, the history representing one or more low-level meanings associated with a word at the position (Seo, pg. 2, sec. 2 Model, fig. 1, “Our machine comprehension model is a hierarchical multi-stage process and consists of six layers… [The first layer is the] Character Embedding Layer [which] maps each word to a vector space using character-level CNNs. [The second layer is the] Word Embedding Layer [which] maps each word to a vector space using a pre-trained word embedding model. [And the third layer is the] Contextual Embedding Layer [which] utilizes contextual cues from surrounding words to refine the embedding of the words.” Note: It is interpreted that the first three layers form a low-level history representing one or more low-level meanings associated with a word at the position) and one or more high-level meanings associated with the word (Seo, pg. 2, sec. 2 Model, fig. 1, “Our machine comprehension model is a hierarchical multi-stage process and consists of six layers… [the fourth layer is the] Attention Flow Layer [which] couples the query and context vectors and produces a set of query aware feature vectors for each word in the context.” Note: It is interpreted that the Attention Flow Layer forms a high-level history that represents one or more high-level meanings associated with the word), wherein the first probability and the second probability are computed based on the history (Seo, pg. 4, fig. 1 (output layer), “The QA task requires the model to find a sub-phrase of the paragraph to answer the query… We obtain the probability distribution of the start index over the entire paragraph by $p^1 = \mathrm{softmax}(w_{(p^1)}^{\top}[G; M])$, where $w_{(p^1)}^{\top} \in \mathbb{R}^{10d}$
is a trainable weight vector. For the end index of the answer phrase, we pass M to another bidirectional LSTM layer and obtain $M^2 \in \mathbb{R}^{2d \times T}$. Then we use $M^2$
to obtain the probability distribution of the end index in a similar manner: $p^2 = \mathrm{softmax}(w_{(p^2)}^{\top}[G; M^2])$.” Note: It is being interpreted that the vector G contains the history of the first four layers).
Regarding claim 19, Seo teaches the method of claim 18, the operations further comprising: determining an additional-level meaning of the question and an additional-level meaning of the context, wherein the history represents one or more additional-level meanings associated with the word at the position (Seo, pg. 4, sec. 2 Model, fig. 1, “The input to the modeling layer is G, which encodes the query-aware representations of context words. The output of the modeling layer captures the interaction among the context words conditioned on the query. This is different from the contextual embedding layer, which captures the interaction among context words independent of the query. We use two layers of bi-directional LSTM, with the output size of d for each direction. Hence we obtain a matrix $M \in \mathbb{R}^{2d \times T}$, which is passed onto the output layer to predict the answer. Each column vector of M is expected to contain contextual information about the word with respect to the entire context paragraph and the query.” Note: It is interpreted that the modeling layer represents an additional-level meaning of the question and an additional-level meaning of the context and the matrix M contains the history that represents one or more additional-level meanings associated with the word at the position).
Regarding claim 20, Seo teaches the method of claim 15, the operations further comprising: determining an additional-level meaning of the question and an additional-level meaning of the context (Seo, pg. 4, sec. 2 Model, fig. 1, “The input to the modeling layer is G, which encodes the query-aware representations of context words. The output of the modeling layer captures the interaction among the context words conditioned on the query. This is different from the contextual embedding layer, which captures the interaction among context words independent of the query. We use two layers of bi-directional LSTM, with the output size of d for each direction. Hence we obtain a matrix $M \in \mathbb{R}^{2d \times T}$,
which is passed onto the output layer to predict the answer. Each column vector of M is expected to contain contextual information about the word with respect to the entire context paragraph and the query.” Note: It is interpreted that the modeling layer represents an additional-level meaning of the question and an additional-level meaning of the context), wherein the first probability and the second probability are computed based on the additional-level meaning (Seo, pg. 4, fig. 1 (output layer), “The QA task requires the model to find a sub-phrase of the paragraph to answer the query… We obtain the probability distribution of the start index over the entire paragraph by $p^1 = \mathrm{softmax}(w_{(p^1)}^{\top}[G; M])$, where $w_{(p^1)}^{\top} \in \mathbb{R}^{10d}$
is a trainable weight vector. For the end index of the answer phrase, we pass M to another bidirectional LSTM layer and obtain $M^2 \in \mathbb{R}^{2d \times T}$. Then we use $M^2$
             to obtain the probability distribution of the end index in a similar manner:                
                     
                    
                        
                            p
                        
                        
                            2
                        
                    
                    =
                    s
                    o
                    f
                    t
                    m
                    a
                    x
                    (
                    
                        
                            w
                        
                        
                            
                                
                                    
                                        
                                            p
                                        
                                        
                                            2
                                        
                                    
                                
                            
                        
                        
                            T
                        
                    
                    [
                    G
                    ;
                    
                        
                            M
                        
                        
                            2
                        
                    
                    ]
                
            ).” Note: It is being interpreted that the vector G contains the history of the first four layers).
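For illustration only, the end-index computation quoted above may be sketched as follows. This is a minimal sketch and not the applicant's implementation. It assumes that G has 8d rows, so that the stacked matrix [G; M2] has 10d rows and matches a 10d-dimensional weight vector; the function names `softmax` and `end_index_distribution` and the toy values are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def end_index_distribution(w, G, M2):
    """Return p2 = softmax(w^T [G; M2]) as a list of T probabilities.

    G and M2 are matrices stored as lists of rows with T columns each.
    Assumption: G has 8d rows and M2 has 2d rows, so w has 10d entries.
    """
    stacked = G + M2  # row-wise concatenation, i.e. [G; M2]
    if len(w) != len(stacked):
        raise ValueError("w needs one weight per row of [G; M2]")
    num_positions = len(stacked[0])
    # One score per context position t: the dot product of w with column t.
    scores = [
        sum(w[r] * stacked[r][t] for r in range(len(stacked)))
        for t in range(num_positions)
    ]
    return softmax(scores)


# Toy example with d = 1 and T = 3 (values chosen arbitrarily).
G = [[0.0, 0.0, 0.0] for _ in range(8)]
M2 = [[1.0, 2.0, 3.0], [0.0, 0.0, 0.0]]
w = [0.0] * 8 + [1.0, 0.0]
p2 = end_index_distribution(w, G, M2)
```

In this toy example only the first row of M2 carries weight, so the resulting distribution favors the last position as the end index.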
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ADAM CLARK STANDKE whose telephone number is (571)270-1806.  The examiner can normally be reached on 7:00-5:00 M-Th.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
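If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor can be reached at (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.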
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/ADAM C STANDKE/Examiner, Art Unit 2122
/ERIC NILSSON/Primary Examiner, Art Unit 2122


1 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
2 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
3 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
4 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
5 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.
6 According to the broadest reasonable interpretation (BRI), the use of alternative language amounts to the claim requiring one or more elements but not all.