DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claims 1-20 are pending and have been examined.
Information Disclosure Statement

Information disclosure statements (IDS) were submitted on 01/26/2018, 04/03/2019, 09/05/2019, and 03/19/2020. The submissions are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.
Priority
The present application was filed on 01/26/2018 and claims priority to Application No. 62/578,380 (filed on 10/27/2017).
Claim Objections
Claim 1 is objected to because of the following informalities:
The limitation “wherein the neural network model includes a plurality of model parameters learned according to a machine learning process” should terminate with a semicolon. Appropriate correction is required.
The limitation “generating an inference based on the codependent representation using a decoder of the neural network model” should terminate with a semicolon. Appropriate correction is required.
Claim Interpretation

The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
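(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.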

Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
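This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are: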
Claim 13
… a first input stage configured to generate…
… a second input stage configured to generate …
… an encoder stage configured to generate …
… an output layer configured to receive …
… a decoder stage configured to generate …
Upon a review of the disclosure, 
Specification [0029] provides the following: “In comparison to encoders that include a single coattention layer, deep coattention encoder 300 may be capable of generating a richer codependent representation 306 that contains more relevant information associated with the first and second input sequences. For example, deep coattention encoder 300 may include more trainable model parameters than single-layer coattention encoders. Moreover, whereas a single-layer coattention encoder may allow each sequence to attend to the other sequence, deep coattention encoder 300 may allow each sequence to attend to itself as well as to the other sequence. Consequently, deep coattention encoder 300 may be capable of achieving higher accuracy than single-layer coattention encoders in dual sequence inference problems, such as QA problems.”
Specification [0030] provides the following: “FIG. 3B depicts a coattention layer 310f, which may be used to implement one or more of coattention layers 310a-n depicted in FIG. 3A. As depicted in FIG. 3B, coattention layer 310f includes a pair of encoding sub-layers 322 and 324 that each receive a respective layer input representation 312e and 314e … E1 and E2. In some embodiments, encoding sub-layers 322 and/or 324 may include one or more recurrent neural network (RNN) layers. In general, an RNN layer injects sequence-related information (e.g., temporal information) into the transformed representation. For example, the RNN layer may include a sequence of simple RNN cells, long short-term memory (LSTM) cells, gated recurrent units (GRUs), and/or the like. In some examples, the RNN layer may be bi-directional, e.g., a bi-directional LSTM (Bi-LSTM) layer. Additionally or alternately, encoding sub-layers 322 and/or 324 may include a feed-forward neural network layer, and/or may perform any other suitable transformation or set of transformation on layer input representations 312e and 314e. In some embodiments, encoding sub-layers 322 and/or 324 may include one or more nonlinear activation functions (e.g., rectified linear units (ReLU), sigmoid, hypertangent (tanh), softmax, and/or the like).”
Specification [0035] provides the following: “Returning to FIG. 3A, deep coattention encoder 300 may additionally include an output layer 350 that receives summary representations 312n and/or 314n from the last layer among the plurality of coattention layers 310a-n and generates codependent representation 306. In some embodiments, output layer 350 may include a neural network layer, such as an RNN, feed-forward neural network, and/or the like.”
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
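For illustration only, the structure quoted above from Specification [0029], [0030], and [0035] may be sketched as follows. This is a minimal sketch under stated assumptions and is not applicant's implementation: the function names, the NumPy feed-forward encoding sub-layers with tanh (the specification also permits RNN layers), the dot-product affinity scores, the residual connection around every layer, and the single feed-forward output layer are all hypothetical choices made to keep the example short.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def coattention_layer(e1, e2, w1, w2):
    """One coattention layer (hypothetical form).

    e1: (m, d) layer input representation for the first sequence
    e2: (n, d) layer input representation for the second sequence
    """
    # Encoding sub-layers: feed-forward + tanh (an assumption).
    h1 = np.tanh(e1 @ w1)                      # (m, d)
    h2 = np.tanh(e2 @ w2)                      # (n, d)
    # Affinity scores for every pair of items across the two sequences.
    affinity = h1 @ h2.T                       # (m, n)
    # Summary representations: each sequence attends to the other.
    s1 = softmax(affinity, axis=1) @ h2        # (m, d)
    s2 = softmax(affinity, axis=0).T @ h1      # (n, d)
    # Context representation from the affinity scores and a summary.
    c1 = softmax(affinity, axis=1) @ s2        # (m, d)
    return s1, s2, c1, affinity

def deep_coattention_encoder(x1, x2, layer_weights, w_out):
    """Coattention layers arranged sequentially, then an output layer."""
    e1, e2 = x1, x2
    for w1, w2 in layer_weights:
        s1, s2, _, _ = coattention_layer(e1, e2, w1, w2)
        # Residual connection that bypasses the layer (an assumption
        # that every layer is bypassed; the claims require only one).
        e1, e2 = e1 + s1, e2 + s2
    # Output layer: one feed-forward layer producing the codependent
    # representation from the last layer's output for the first sequence.
    return np.tanh(e1 @ w_out)

# Usage with random inputs: a 5-item first sequence and a 3-item second sequence.
rng = np.random.default_rng(0)
d = 8
x1 = rng.normal(size=(5, d))
x2 = rng.normal(size=(3, d))
layers = [(0.1 * rng.normal(size=(d, d)), 0.1 * rng.normal(size=(d, d)))
          for _ in range(2)]
w_out = 0.1 * rng.normal(size=(d, d))
codependent = deep_coattention_encoder(x1, x2, layers, w_out)
print(codependent.shape)  # (5, 8)
```

In this sketch the summary representations produced by one layer become the layer input representations of the next, which is the sequential arrangement the quoted passages describe; the context representation is computed only to show where it would arise.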

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-6 and 12-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.
Regarding Claim 1, 
Step 1 Analysis:  Claim 1 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim is directed to a method of dual sequence inference using a neural network model.  The following limitations:
generating a codependent representation based on a first input representation of a first sequence and a second input representation of a second sequence using … the … model; and
generating an inference based on the codependent representation using … the … model
wherein the … model includes a plurality of model parameters learned according to a … process
wherein the … comprises: a plurality of coattention layers arranged sequentially, each coattention layer configured to …. generate one or more summary representations, the pair of layer input representations corresponding to the one or more summary representations generated by a preceding layer among the plurality of coattention layers or, when the coattention layer is first among the plurality of coattention layers, to the first input representation and the second input representation; and 
an output layer configured to … generate the codependent representation
as drafted, are processes that, under their broadest reasonable interpretation, cover mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) but for the recitation of generic computer components (“computer-implemented”, “an encoder of”, “a decoder of”, “neural network”, “machine learning”). For example, but for the generic computer components language, the above limitations in the context of this claim encompass generating a codependent representation based on a first input representation of a first sequence and a second input representation of a second sequence using … the … model (corresponds to evaluation), generating an inference based on the codependent representation using … the … model (corresponds to evaluation), learning a plurality of model parameters (corresponds to evaluation and judgement), and generating one or more summary representations (corresponds to evaluation), which are considered mental processes. Accordingly, the claim recites an abstract idea. 
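Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, performing a mental process in a computer environment or using a computer as a tool to perform a mental process. See MPEP 2106. The additional elements of “an encoder of”, “a decoder of”, “neural network”, and “machine learning”, as drafted, are reciting implementing a mental process in a computer environment and using a computer as a tool to perform a mental process. The following steps: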
… receive a pair of layer input representations and …
… receive the one or more summary representations generated by a last layer among the plurality of coattention layers and …
are steps related to gathering data for use in a claimed process and are thus considered insignificant extra-solution activity. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea. 
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to more than the judicial exception.  As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to implementing a mental process in a computer environment and using a computer as a tool to perform a mental process.   Implementing a mental process in a computer environment or using a computer as a tool to perform a mental process cannot provide an inventive concept. 
Regarding Claim 2, 
Step 1 Analysis:  Claim 2 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites the method of claim 1. The claim further recites wherein the first sequence corresponds to a document, the second sequence corresponds to a question, and the inference corresponds to a span of text in the document that answers the question. Nothing in the claim elements precludes these steps from practically being performed in the mind.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application.  In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, performing a mental process in a computer environment or using a computer as a tool to perform a mental process.  See MPEP 2106.  The additional elements of “an encoder of”, “a decoder of”, “neural network”, and “machine learning”, as drafted, are reciting implementing a mental process in a computer environment and using a computer as a tool to perform a mental process. The claimed invention is described as a concept that is performed in the human mind and applicant is merely claiming that concept performed 1) on a generic computer, or 2) in a computer environment, or 3) is merely using a computer as a tool to perform the concept. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.   The claim is directed to an abstract idea. 
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to more than the judicial exception.  As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to implementing a mental process in a computer environment and using a computer as a tool to perform a mental process.   Implementing a mental process in a computer environment or using a computer as a tool to perform a mental process cannot provide an inventive concept. 
Regarding Claim 3, 
Step 1 Analysis:  Claim 3 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites the method of claim 1. The claim further recites wherein the encoder includes at least one residual connection that bypasses one or more of the plurality of coattention layers. Nothing in the claim elements precludes these steps from practically being performed in the mind.  
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application.  In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, performing a mental process in a computer environment or using a computer as a tool to perform a mental process.  See MPEP 2106.  The additional elements of “an encoder of”, “a decoder of”, “neural network”, and “machine learning”, as drafted, are reciting implementing a mental process in a computer environment and using a computer as a tool to perform a mental process. The claimed invention is described as a concept that is performed in the human mind and applicant is merely claiming that concept performed 1) on a generic computer, or 2) in a computer environment, or 3) is merely using a computer as a tool to perform the concept. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.   The claim is directed to an abstract idea. 
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to more than the judicial exception.  As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to implementing a mental process in a computer environment and using a computer as a tool to perform a mental process.   Implementing a mental process in a computer environment or using a computer as a tool to perform a mental process cannot provide an inventive concept. 
Regarding Claim 4, 
Step 1 Analysis:  Claim 4 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites the method of claim 1. The claim further recites wherein each of the plurality of coattention layers determines a set of affinity scores corresponding to each pair of items in the pair of layer input representations. Nothing in the claim elements precludes these steps from practically being performed in the mind.  
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application.  In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, performing a mental process in a computer environment or using a computer as a tool to perform a mental process.  See MPEP 2106.  The additional elements of “an encoder of”, “a decoder of”, “neural network”, and “machine learning”, as drafted, are reciting implementing a mental process in a computer environment and using a computer as a tool to perform a mental process. The claimed invention is described as a concept that is performed in the human mind and applicant is merely claiming that concept performed 1) on a generic computer, or 2) in a computer environment, or 3) is merely using a computer as a tool to perform the concept. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.   The claim is directed to an abstract idea. 
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to implementing a mental process in a computer environment and using a computer as a tool to perform a mental process. Implementing a mental process in a computer environment or using a computer as a tool to perform a mental process cannot provide an inventive concept. 
Regarding Claim 5, 
Step 1 Analysis:  Claim 5 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites the method of claim 4. The claim further recites wherein each of the plurality of coattention layers determines the one or more summary representations based on the pair of layer input representations and the set of affinity scores. Nothing in the claim elements precludes these steps from practically being performed in the mind.  
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application.  In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, performing a mental process in a computer environment or using a computer as a tool to perform a mental process.  See MPEP 2106.  The additional elements of “an encoder of”, “a decoder of”, “neural network”, and “machine learning”, as drafted, are reciting implementing a mental process in a computer environment and using a computer as a tool to perform a mental process. The claimed invention is described as a concept that is performed in the human mind and applicant is merely claiming that concept performed 1) on a generic computer, or 2) in a computer environment, or 3) is merely using a computer as a tool to perform the concept. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.   The claim is directed to an abstract idea. 
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to more than the judicial exception.  As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to implementing a mental process in a computer environment and using a computer as a tool to perform a mental process.   Implementing a mental process in a computer environment or using a computer as a tool to perform a mental process cannot provide an inventive concept. 
Regarding Claim 6, 
Step 1 Analysis:  Claim 6 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites the method of claim 5. The claim further recites wherein each of the plurality of coattention layers determines one or more context representations based on the set of the affinity scores and the one or more summary representations. Nothing in the claim elements precludes these steps from practically being performed in the mind.  
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, performing a mental process in a computer environment or using a computer as a tool to perform a mental process. See MPEP 2106. The additional elements of “an encoder of”, “a decoder of”, “neural network”, and “machine learning”, as drafted, are reciting implementing a mental process in a computer environment and using a computer as a tool to perform a mental process. The claimed invention is described as a concept that is performed in the human mind and applicant is merely claiming that concept performed 1) on a generic computer, or 2) in a computer environment, or 3) is merely using a computer as a tool to perform the concept. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea. 
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to more than the judicial exception.  As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to implementing a mental process in a computer environment and using a computer as a tool to perform a mental process.   Implementing a mental process in a computer environment or using a computer as a tool to perform a mental process cannot provide an inventive concept. 
Regarding Claim 12, 
Step 1 Analysis:  Claim 12 is directed to a method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites the method of claim 1. The claim further recites wherein the output layer includes a bidirectional long short-term memory layer. Nothing in the claim elements precludes these steps from practically being performed in the mind.  
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, performing a mental process in a computer environment or using a computer as a tool to perform a mental process. See MPEP 2106. The additional elements of “an encoder of”, “a decoder of”, “neural network”, and “machine learning”, as drafted, are reciting implementing a mental process in a computer environment and using a computer as a tool to perform a mental process. The claimed invention is described as a concept that is performed in the human mind and applicant is merely claiming that concept performed 1) on a generic computer, or 2) in a computer environment, or 3) is merely using a computer as a tool to perform the concept. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea. 
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to more than the judicial exception.  As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to implementing a mental process in a computer environment and using a computer as a tool to perform a mental process.   Implementing a mental process in a computer environment or using a computer as a tool to perform a mental process cannot provide an inventive concept. 
Regarding Claim 13,
Step 1 Analysis:  Claim 13 is directed to a system, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The claim is directed to a system for dual sequence inference using a neural network model.  The following limitations:
a … configured to generate a first input representation based on a first sequence;
a … configured to generate a second input representation based on a second sequence;
an … configured to generate a codependent representation based on the first input representation and the second input representation, the … comprising: a plurality of coattention layers arranged sequentially, each of the plurality of coattention layers … generating a pair of summary representations, the pair of layer input representations corresponds to the pair of summary representations generated a preceding layer among the plurality of coattention layers or, when the coattention layer is first among the 
a … configured to generate an inference based on the codependent representation
as drafted, are processes that, under their broadest reasonable interpretation, cover mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) but for the recitation of generic computer components (“first input stage”, “second input stage”, “encoder stage”, “decoder stage”). For example, but for the generic computer components language, the above limitations in the context of this claim encompass generating a first input representation based on a first sequence (corresponds to evaluation), generating a codependent representation based on the first input representation and the second input representation (corresponds to evaluation), and generating an inference based on the codependent representation (corresponds to evaluation), which are considered mental processes. Accordingly, the claim recites an abstract idea. 
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application.  In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, performing a mental process in a computer environment or using a computer as a tool to perform a mental process.  See MPEP 2106.  The additional elements of “first input stage”, “second input stage”, “encoder stage”, “decoder stage”, as drafted, are reciting implementing a mental process in a computer environment and using a computer as a tool to perform a mental process. The following steps:
… receiving a pair of layer input representations and …
… to receive at least one of the pair of summary representations generated by a last layer among the plurality of coattention layers and …
are steps related to gathering data for use in a claimed process and are thus considered insignificant extra-solution activity. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea. 
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to more than the judicial exception.  As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to implementing a mental process in a computer environment and using a computer as a tool to perform a mental process.   Implementing a mental process in a computer environment or using a computer as a tool to perform a mental process cannot provide an inventive concept. 
Regarding Claim 14,
Step 1 Analysis:  Claim 14 is directed to a system, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites the system of claim 13. The claim further recites wherein the encoder stage includes at least one residual connection that bypasses one or more of the plurality of coattention layers. Nothing in the claim elements precludes these steps from practically being performed in the mind.  
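Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, performing a mental process in a computer environment or using a computer as a tool to perform a mental process. See MPEP 2106. The additional elements of “first input stage”, “second input stage”, “encoder stage”, “decoder stage”, as drafted, are reciting implementing a mental process in a computer environment and using a computer as a tool to perform a mental process. The claimed invention is described as a concept that is performed in the human mind and applicant is merely claiming that concept performed 1) on a generic computer, or 2) in a computer environment, or 3) is merely using a computer as a tool to perform the concept. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea. 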
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to more than the judicial exception.  As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to implementing a mental process in a computer environment and using a computer as a tool to perform a mental process.   Implementing a mental process in a computer environment or using a computer as a tool to perform a mental process cannot provide an inventive concept. 
Regarding Claim 15,
Step 1 Analysis:  Claim 15 is directed to a system, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites the system of claim 13. The claim further recites wherein each of the plurality of coattention layers includes an affinity node that determines a set of affinity scores based on the pair of layer input representations. Nothing in the claim elements precludes these steps from practically being performed in the mind.  
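Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, performing a mental process in a computer environment or using a computer as a tool to perform a mental process. See MPEP 2106. The additional elements of “first input stage”, “second input stage”, “encoder stage”, “decoder stage”, as drafted, are reciting implementing a mental process in a computer environment and using a computer as a tool to perform a mental process. The claimed invention is described as a concept that is performed in the human mind and applicant is merely claiming that concept performed 1) on a generic computer, or 2) in a computer environment, or 3) is merely using a computer as a tool to perform the concept. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea. 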
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to more than the judicial exception.  As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to implementing a mental process in a computer environment and using a computer as a tool to perform a mental process.   Implementing a mental process in a computer environment or using a computer as a tool to perform a mental process cannot provide an inventive concept. 
Regarding Claim 16,
Step 1 Analysis:  Claim 16 is directed to a system, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites the system of claim 15. The claim further recites wherein each of the plurality of coattention layers includes a pair of summary nodes that determines the pair of summary representations based on the pair of layer input representations … Nothing in the claim elements precludes these steps from practically being performed in the mind.
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application.  In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, performing a mental process in a computer environment or using a computer as a tool to perform a mental process.  See MPEP 2106.  The additional elements of “first input stage”, “second input stage”, “encoder stage”, “decoder stage”, as drafted, are reciting implementing a mental process in a computer environment and using a computer as a tool to perform a mental process. The claimed invention is described as a concept that is performed in the human mind and applicant is merely claiming that concept performed 1) on a generic computer, or 2) in a computer environment, or 3) is merely using a computer as a tool to perform the concept. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.   The claim is directed to an abstract idea. 
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to more than the judicial exception.  As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to implementing a mental process in a computer environment and using a computer as a tool to perform a mental process.   Implementing a mental process in a computer environment or using a computer as a tool to perform a mental process cannot provide an inventive concept. 
Regarding Claim 17,
Step 1 Analysis:  Claim 17 is directed to a system, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites the system of claim 16. The claim further recites wherein each of the plurality of coattention layers includes a context node that determines a context representation based on the set of the affinity scores and at least one of the pair of summary representations. Nothing in the claim elements precludes these steps from practically being performed in the mind.  
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application.  In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, performing a mental process in a computer environment or using a computer as a tool to perform a mental process.  See MPEP 2106.  The additional elements of “first input stage”, “second input stage”, “encoder stage”, “decoder stage”, as drafted, are reciting implementing a mental process in a computer environment and using a computer as a tool to perform a mental process. The claimed invention is described as a concept that is performed in the human mind and applicant is merely claiming that concept performed 1) on a generic computer, or 2) in a computer environment, or 3) is merely using a computer as a tool to perform the concept. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.   The claim is directed to an abstract idea. 
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to more than the judicial exception.  As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to implementing a mental process in a computer environment and using a computer as a tool to perform a mental process.   Implementing a mental process in a computer environment or using a computer as a tool to perform a mental process cannot provide an inventive concept. 
Regarding Claim 18, 
Step 1 Analysis:  Claim 18 is directed to a system, which is directed to a machine, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites the system of claim 13. The claim further recites wherein the output layer includes a bidirectional long short-term memory layer. Nothing in the claim elements precludes these steps from practically being performed in the mind.  
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application.  In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, performing a mental process in a computer environment or using a computer as a tool to perform a mental process.  See MPEP 2106.  The additional elements of “first input stage”, “second input stage”, “encoder stage”, “decoder stage”, as drafted, are reciting implementing a mental process in a computer environment and using a computer as a tool to perform a mental process. The claimed invention is described as a concept that is performed in the human mind and applicant is merely claiming that concept performed 1) on a generic computer, or 2) in a computer environment, or 3) is merely using a computer as a tool to perform the concept. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.   The claim is directed to an abstract idea. 
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to more than the judicial exception.  As discussed above with respect to integration of the abstract idea into a practical application, the additional element amounts to implementing a mental process in a computer environment and using a computer as a tool to perform a mental process.   Implementing a mental process in a computer environment or using a computer as a tool to perform a mental process cannot provide an inventive concept. 
Regarding Claim 19,
Step 1 Analysis:  Claim 19 is directed to a medium, which is directed to an article of manufacture, one of the statutory categories.
Step 2A Prong One Analysis: The claim is directed to a medium having stored thereon a question answering model.  The following limitations:
a …. comprising;
a first encoding sub-layer that … generates a first encoded representation of the document;
a second encoding sub-layer that …generates a first encoded representation of the question;
a first affinity node that … generates a first set of affinity scores;
a first summary node that … generates a first summary representation of the document;
a second summary node that … generates a first summary representation of the question;
a third encoding sub-layer that … generates a second encoded representation of the document;
a fourth encoding sub-layer that … generates a second encoded representation of the question;
a second affinity node that … generates a second set of affinity scores;
a third summary node … generates a second summary representation of the document; and
an output layer that … generates a codependent representation; and
a … that …  predicts an answer span within the document
as drafted, are processes that, under its broadest reasonable interpretation, cover mental processes (concepts performed in the human mind (including an observation, evaluation, judgement, opinion)) but for the recitation of generic computer components (“a deep coattention encoder of the question answering model”, “a dynamic decoder of the question answering model”).   For example, but for the generic computer components language, the above limitations in the context of this claim encompass generating a first encoded representation of the document (corresponds to evaluation), generating a first encoded representation of the question (corresponds to evaluation), generating a first set of affinity scores (corresponds to evaluation), generating a first summary representation of the document (corresponds to evaluation), generating a first summary representation of the question (corresponds to evaluation), generating a second encoded representation of the document (corresponds to evaluation), generating a second encoded representation of the question (corresponds to evaluation),  generating a second set of affinity scores (corresponds to evaluation), generating a second summary representation of the document (corresponds to evaluation), generating a codependent representation (corresponds to evaluation) and predicting an answer span (corresponds to evaluation and judgement), which are considered mental processes.  Accordingly, the claim recites an abstract idea. 
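For illustration, the affinity-score and summary-representation limitations listed above can be sketched in a few lines. This is a minimal sketch under stated assumptions, not the applicant's claimed encoder: the dot-product affinity, the softmax normalization, and all function and variable names (`affinity_scores`, `summarize`, `doc`, `question`) are hypothetical choices made only for this example.

```python
import math

def softmax(xs):
    """Normalize a list of scores into weights that sum to 1."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def affinity_scores(doc, question):
    """Assumed affinity: dot product between every document and question vector."""
    return [[sum(d * q for d, q in zip(dv, qv)) for qv in question] for dv in doc]

def transpose(matrix):
    return [list(col) for col in zip(*matrix)]

def summarize(scores, other):
    """For each row of scores, return the softmax-weighted average of `other`."""
    dim = len(other[0])
    summary = []
    for row in scores:
        weights = softmax(row)
        summary.append(
            [sum(weights[j] * other[j][k] for j in range(len(other))) for k in range(dim)]
        )
    return summary

# Toy encoded representations: 3 document positions, 2 question positions.
doc = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
question = [[1.0, 0.0], [0.0, 2.0]]

A = affinity_scores(doc, question)                # 3 x 2 set of affinity scores
doc_summary = summarize(A, question)              # from question encoding + affinity scores
question_summary = summarize(transpose(A), doc)   # from document encoding + affinity scores
```

Each summary vector is a weighted average of the other sequence's encoded vectors, with the weights derived from the affinity scores.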
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application.  In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as tool to perform an 
The following steps:
… receives an input representation of a document and… 
…receives an input representation of a question and … 
... receives the first encoded representation of the document and the first encoded representation of the question and …
… receives the first encoded representation of the question and the first set of affinity scores and …
… receives the first encoded representation of the document and the first set of affinity scores and …
… receives the first summary representation of the document and …
… receives the first summary representation of the question and …
… receives the second encoded representation of the document and the second encoded representation of the question and …
… receives the second encoded representation of the question and the first set of affinity scores and …
… receives the second summary representation of the document and … 
… receives the codependent representation and … 

Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to more than the judicial exception.  As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component.  Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.  
Regarding Claim 20, 
Step 1 Analysis:  Claim 20 is directed to a medium, which is directed to an article of manufacture, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites the medium of claim 19. The claim further recites wherein the output layer further receives one or more of the first encoded representation of the document, the first summary representation of the document, or the second encoded representation of the document via one or more residual connections of the deep coattention encoder. Nothing in the claim elements precludes these steps from practically being performed in the mind.  
Step 2A Prong Two Analysis: This judicial exception is not integrated into a practical application.  In particular, the claim only recites additional elements that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as tool to perform an abstract idea.  See MPEP 2106.05(f).  The additional elements of “a deep coattention encoder of 
The following steps:
… receives an input representation of a document and… 
…receives an input representation of a question and … 
... receives the first encoded representation of the document and the first encoded representation of the question and …
… receives the first encoded representation of the question and the first set of affinity scores and …
… receives the first encoded representation of the document and the first set of affinity scores and …
… receives the first summary representation of the document and …
… receives the first summary representation of the question and …
… receives the second encoded representation of the document and the second encoded representation of the question and …
… receives the second encoded representation of the question and the first set of affinity scores and …
… receives the second summary representation of the document and … 
… receives the codependent representation and … 

Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to more than the judicial exception.  As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component.  Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.  
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3, 7-9, 11, and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Buck et al. (“Ask the Right Questions: Active Question Reformulation with Reinforcement Learning”) in view of Wang et al. (“Gated Self-Matching Networks for Reading Comprehension and Question Answering”).
Regarding Claim 1, 
	Buck et al. teaches a computer implemented method for dual sequence inference using a neural network model (p. 4, section 2.2, paragraph 3 “ … we use a competitive neural answering model, BiDirectional Attention Flow (BiDAF).  BiDAF is an extractive QA system.  It takes as input a question and a document and returns as answer a continuous span from the passage…” teaches document [first sequence], question [second sequence], and answer span [inference]).
	Buck et al. does not appear to explicitly teach generating a codependent representation based on a first input representation of a first sequence and a second input representation of a second sequence using an encoder of the neural network model; and generating an inference based on the codependent representation using a decoder of the neural network model, wherein the neural network model includes a plurality of model parameters learned according to a machine learning process, and wherein the encoder comprises: a plurality of coattention layers arranged sequentially, each coattention layer being configured to receive a pair of layer input representations and generate one or more summary representations, the pair of layer input representations corresponding to the one or more summary representations generated by a preceding layer among the plurality of coattention layers or, when the coattention layer is first among the plurality of coattention layers, to the first input representation and the second input representation; and an output layer configured to receive the one or more summary representations from a last layer among the plurality of coattention layers and generate the codependent representation.
Wang et al. teaches generating a codependent representation based on a first input representation of a first sequence and a second input representation of a second sequence using an encoder of the neural network model (p. 199, Figure 1 and p. 192, section 3.4, paragraph 1 teaches generating $s_j^T$ [generating a codependent representation] based on passage representation $\{h_t^P\}_{t=1}^{n}$; p. 199, Figure 1 and p. 190, section 3.1, paragraph 1 teaches passage representation $\{h_t^P\}_{t=1}^{n}$ dependent upon word-level embeddings and character level embeddings of document [first input representation of first sequence] and word-level embeddings and character level embeddings of question [second input representation of second sequence], which in turn is used to generate $u_1^Q, \ldots, u_m^Q$ and $u_1^P, \ldots, u_n^P$, which are inputs to the Question and Passage Matching layer in Figure 1); and 
generating an inference based on the codependent representation using a decoder of the neural network model (p. 199, Figure 1 and p. 192, section 3.4, paragraph 1, “We … use pointer networks to predict the start and end position of the answer…” teaches predicting the start and end positions of the answer [generating an inference based on the codependent representation] using a pointer network [using a decoder of the neural network model]),
	wherein the neural network model includes a plurality of model parameters learned according to a machine learning process (p. 199, Figure 1 and p. 192, section 3.4, paragraph 1, “We … use pointer networks to predict the start and end position of the answer…” teaches the start and end positions of the passage as model parameters [a plurality of model parameters learned according to a machine learning process]), and 
wherein the encoder comprises: a plurality of coattention layers arranged sequentially, each coattention layer being configured to receive a pair of layer input representations and generate one or more summary representations, the pair of layer input representations corresponding to the one or more summary representations generated by a preceding layer among the plurality of coattention layers or, when the coattention layer is first among the plurality of coattention layers, to the first input representation and the second input representation (p. 199, Figure 1 
	teaches Question and Passage GRU layer, Question and Passage Matching Layer, and Passage Self-Matching Layer  arranged in order from bottom to top [wherein the encoder comprises: a plurality of coattention layers arranged sequentially],
teaches the Question and Passage Matching Layer [coattention layer] and Passage Self-Matching Layer [coattention layer] both receiving two inputs [a pair of layer input representations] and generating one output [one or more summary representations], and
teaches the pair of layer input representations to the Question and Passage Matching layer being representations of all words in the question $u_1^Q, \ldots, u_m^Q$ [summary representation generated by a preceding layer] and all words in the passage $u_1^P, \ldots, u_n^P$ [summary representation generated by a preceding layer];
p. 192, section 3.3, paragraph 1 teaches the pair of layer input representations to the Passage Self-Matching Layer being the passage representation $h_t^P$ [summary representation generated by a preceding layer] and attention pooling vector $c_t$ of the whole passage [summary representation generated by a preceding layer],
p. 199, Figure 1 and p. 190, section 3.1, paragraph 1 teaches the pair of layer input representations to the Question and Passage GRU Layer [first among the plurality of coattention layers] being the word-level embeddings/character level-embeddings of the words in the passage [first input representations] and word-level embeddings/character level-embeddings of the words in the question [second input representations]); 
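and an output layer configured to receive the one or more summary representations from a last layer among the plurality of coattention layers and generate the codependent representation (p. 192, section 3.4, paragraph 1 teaches receiving $h_{t-1}^a$ [receiving the one or more summary representations] and generating $s_j^t$ [generating the codependent representation]).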
Buck et al. and Wang et al. are considered analogous art because they are directed to using machine learning techniques to improve methods of answering questions.  
It would have been obvious for a person of ordinary skill in the art, at the time the application was filed, to apply the teachings of Wang et al. to Buck et al. in order to dynamically enrich each passage representation with information aggregated from both question and passage, which in turn leads to better answer predictions (cf. Wang et al., p. 190, section 1, paragraph 4, “By introducing a gating mechanism, our gated attention-based recurrent network assigns different levels of importance to passage parts depending on their relevance to the question, masking out irrelevant passage parts and emphasizing the important ones”). The Examiner notes that a person of ordinary skill in the art would find a suggestion to perform this type of analysis since Buck et al. discloses this as a necessary activity for the taught invention (cf. Buck et al., p. 2, Section 1, paragraph 2, “… Inspired by the human ability to produce and ask the right questions, we present an agent that learns to carry out this process for the user. The agent sits between the user and a backend QA system that we refer to as the ‘environment’. The agent aims to maximize the chance of getting the correct answer by reformulating and reissuing a user’s question to the environment. By asking many question variants, and aggregating the returned evidences, the agent comes up with the single best answer.”).
Regarding Claim 2, 
	Buck et al. in view of Wang et al. teaches the method of claim 1. 
	Buck et al. further teaches wherein the first sequence corresponds to a document, the second sequence corresponds to a question, and the inference corresponds to a span of text in the document that answers the question (p. 4, section 2.2, paragraph 3 “… we use a competitive neural answering model, BiDirectional Attention Flow (BiDAF).  BiDAF is an extractive QA system.  It takes as input a question and a document and returns as answer a continuous span from the passage…” teaches document [first sequence], question [second sequence], and answer span [inference]).
Regarding Claim 3, 
	Buck et al. in view of Wang et al. teaches the method of claim 1. 
	Buck et al. does not appear to explicitly teach wherein the encoder includes at least one residual connection that bypasses one or more of the plurality of coattention layers.
Wang et al. teaches wherein the encoder includes at least one residual connection that bypasses one or more of the plurality of coattention layers (p. 199, Figure 1 
teaches at least one connection from the question vector to the output layer [at least one residual connection] that bypasses the Question and Passage Matching Layer and the Passage Self-Matching Layer [one or more of the plurality of coattention layers]).
Buck et al. and Wang et al. are combinable for the same rationale as set forth above with respect to claim 1. 
Regarding Claim 7,
	Buck et al. in view of Wang et al. teaches the method of claim 1. 
	Buck et al. further teaches wherein the machine learning process includes: generating a series of training inferences using the neural network model (p. 4, section 2.2, paragraph 3 “… we use a competitive neural question answering model, BiDirectional Attention Flow (BiDAF) … BiDAF is an extractive QA system. It takes as input a question and a document and returns as an answer a continuous span from the passage…” teach generating an answer span using BiDAF [generating a series of training inferences using the neural network model]);
	evaluating a mixed learning objective based on the series of training inferences (p. 3, Figure 1
and p. 4, section 2.2, paragraph 2 “In the downward pass the reformulator, the box on the left in Figure 1, maps the original question into one or many alternative questions that will probe the environment for candidate answers. The reformulator is trained end-to-end, using an answer quality objective, for which we use the standard F1 metric. Due to the non-differentiable nature of this sequence-level loss, the model is trained using reinforcement learning … The aggregator, the box on the right of Figure 1, is a Convolutional Neural Network (CNN). This model’s task is to evaluate the candidate answers returned by the environment and select the one to return. Here, we assume that there is a single best answer, as is the case in our evaluation setting; returning multiple answers is a straightforward extension of the model. The CNN is trained with supervised learning” teaches an AQA agent that utilizes both reinforcement learning and supervised learning [evaluating a mixed learning objective] based on set of answers received from QA system [based on the series of training inferences]); and
	updating the plurality of model parameters based on the mixed learning objective (p. 3, Figure 1 and p. 6, section 3.3, paragraphs 1-2 teaches model parameters are used to define reformulation process and model parameters are optimized to maximize reward during aggregation process [updating the plurality of model parameters based on the mixed learning objective]).
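For illustration, the “standard F1 metric” quoted above from Buck et al. can be sketched as a token-overlap score. This is a minimal sketch, assuming whitespace tokenization and lowercase matching in the style commonly used for extractive question answering; the function names `token_f1` and `exact_match` are hypothetical and are not taken from Buck et al. or from the application. The comparison with exact match shows why such a score is graded rather than all-or-nothing.

```python
from collections import Counter

def token_f1(predicted, reference):
    """Token-overlap F1 between a predicted answer and a reference answer."""
    pred_tokens = predicted.lower().split()
    ref_tokens = reference.lower().split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match; one empty counts as a miss.
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it appears in both.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)

def exact_match(predicted, reference):
    """Binary metric for comparison: 1.0 only if the token sequences are identical."""
    return float(predicted.lower().split() == reference.lower().split())

partial_f1 = token_f1("the Eiffel Tower", "Eiffel Tower")     # partial credit
partial_em = exact_match("the Eiffel Tower", "Eiffel Tower")  # no credit
```

Under these assumptions, a prediction that overlaps the reference but is not identical receives a fractional F1 score while the exact-match score is zero.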
Regarding Claim 8, 
	Buck et al. in view of Wang et al. teaches the method of claim 7. 
	Buck et al. further teaches wherein the mixed learning objective includes a combination of a supervised learning objective and a reinforcement learning objective (p. 3, Figure 1

and p. 4, section 2.2, paragraph 2 “In the downward pass the reformulator, the box on the left in Figure 1, maps the original question into one or many alternative questions that will probe the environment for candidate answers. The reformulator is trained end-to-end, using an answer quality objective, for which we use the standard F1 metric. Due to the non-differentiable nature of this sequence-level loss, the model is trained using reinforcement learning … The aggregator, the box on the right of Figure 1, is a Convolutional Neural Network (CNN). This model’s task is to evaluate the candidate answers returned by the environment and select the one to return. Here, we assume that there is a single best answer, as is the case in our evaluation setting; returning multiple answers is a straightforward extension of the model. The CNN is trained with supervised learning” teaches an AQA agent that utilizes both reinforcement learning and supervised learning based on set of answers received from QA system [wherein the mixed learning objective includes a combination of a supervised learning objective and a reinforcement learning objective]),
 	where the reinforcement learning objective is based on a non-binary evaluation metric (
p. 4, section 2.2, paragraph 2 “In the downward pass the reformulator, the box on the left in Figure 1, maps the original question into one or many alternative questions that will probe the environment for candidate answers. The reformulator is trained end-to-end, using an answer quality objective, for which we use the standard F1 metric. Due to the non-differentiable nature of this sequence-level loss, the model is trained using reinforcement learning …” and p.8, Table 1 

teaches the use of non-binary F1 metric for reformulator [where the reinforcement learning objective is based on a non-binary evaluation metric]). 
Regarding Claim 9, 
	Buck et al. in view of Wang et al. teaches the method of claim 8.
	Buck et al. further teaches wherein the non-binary evaluation metric corresponds to an F1 score (p. 4, section 2.2, paragraph 2 “In the downward pass the reformulator, the box on the left in Figure 1, maps the original question into one or many alternative questions that will probe the environment for candidate answers. The reformulator is trained end-to-end, using an answer quality objective, for which we use the standard F1 metric. Due to the non-differentiable nature of this sequence-level loss, the model is trained using reinforcement learning …” and p.8, Table 1 

teaches the use of non-binary F1 metric for reformulator [wherein the non-binary evaluation metric corresponds to an F1 score]). 
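Illustrative note (not part of the cited reference): the sketch below shows why an F1 score is a non-binary metric, since it can take any value between 0 and 1 depending on partial overlap between a predicted answer and a reference answer. The function name `token_f1`, the whitespace tokenization, and the lowercasing are assumptions made for this example only; Buck et al. is cited above only for using "the standard F1 metric".

```python
from collections import Counter


def token_f1(prediction: str, reference: str) -> float:
    """Token-overlap F1 between a predicted and a reference answer string."""
    pred_tokens = prediction.lower().split()
    ref_tokens = reference.lower().split()
    # If either side is empty, score 1.0 only when both are empty.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often
    # as it appears in both strings.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)
```

A partially correct answer such as `token_f1("the cat sat", "the cat")` receives a fractional score rather than 0 or 1, which is the property relied on in the non-binary mapping above.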
Regarding Claim 11,
	Buck et al. in view of Wang et al. teaches the method of claim 8.
	Buck et al. further teaches wherein updating the plurality of model parameters includes approximating a gradient associated with the reinforcement learning objective using a Monte Carlo sample (p. 6, section 3.3, paragraphs 1-2  
teaches model parameters being used to define reformulation process under a policy, wherein the policy yields an unbiased estimator of the reward using Monte Carlo sampling techniques [wherein updating the plurality of model parameters includes approximating a gradient associated with the reinforcement learning objective using a Monte Carlo sample]).
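Illustrative note (not part of the cited reference): a minimal sketch of approximating a reinforcement learning gradient with Monte Carlo samples, assuming a softmax policy over a small discrete set of actions. The names `softmax`, `reinforce_gradient`, and `reward_fn` are hypothetical, and the score-function (REINFORCE-style) estimator shown here is a generic textbook form rather than the specific procedure of Buck et al.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def reinforce_gradient(logits, reward_fn, num_samples, rng):
    """Estimate d E[reward] / d logits by averaging over sampled actions.

    Each sample contributes reward * grad(log pi(action)); for a softmax
    policy, grad(log pi(action)) w.r.t. logit k is 1[k == action] - pi_k.
    """
    probs = softmax(logits)
    grad = [0.0] * len(logits)
    for _ in range(num_samples):
        action = rng.choices(range(len(probs)), weights=probs, k=1)[0]
        reward = reward_fn(action)
        for k, p in enumerate(probs):
            indicator = 1.0 if k == action else 0.0
            grad[k] += reward * (indicator - p)
    return [g / num_samples for g in grad]
```

With a uniform policy over three actions and rewards 0, 0.5, and 1, the exact gradient is (-1/6, 0, 1/6); the sampled estimate approaches that value as the number of samples grows.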
Regarding Claim 13,
	Buck et al. teaches a system for dual sequence inference (p. 4, section 2.2, paragraph 3 “… we use a competitive neural question answering model, BiDirectional Attention Flow (BiDAF). BiDAF is an extractive QA system. It takes as input a question and a document and returns as an answer a continuous span from the passage…” teaches document [first sequence], question [second sequence], and answer span [inference] within a QA system).
Buck et al. does not appear to explicitly teach … a first input stage configured to generate a first input representation based on a first sequence; a second input stage configured to generate a second input representation based on a second sequence; an encoder stage configured to generate a codependent representation based on the first input representation and the second input representation, the encoder stage comprising: a plurality of coattention layers arranged sequentially, each of the plurality of coattention layers receiving a pair of layer input representations and generating a pair of summary representations, the pair of layer input representations corresponds to the pair of summary representations generated by a preceding layer among the plurality of coattention layers or, when the coattention layer is first among the plurality of coattention layers, to the first input representation and the second input representation; and an output layer configured to receive at least one of the pair of summary representations generated by a last layer among the plurality of coattention layers and generate the codependent representation of the first sequence; and a decoder stage configured to generate an inference based on the codependent representation.
	Wang et al. teaches … a first input stage configured to generate a first input representation based on a first sequence; a second input stage configured to generate a second input representation based on a second sequence; an encoder stage configured to generate a codependent representation based on the first input representation and the second input representation (p. 190, section 3.1, paragraph 1 teaches passage representation dependent upon word-level embeddings and character-level embeddings of document [first input representation of first sequence] along with word-level embeddings and character-level embeddings of question [second input representation of second sequence];
p. 199, Figure 1 and p. 192, section 3.4, paragraph 1 teaches generating $s_j^t$ [generating a codependent representation] based on passage representation $\{h_t^P\}_{t=1}^{n}$;
p. 199, Figure 1 and p. 190, section 3.1, paragraph 1 teaches passage representation $\{h_t^P\}_{t=1}^{n}$ dependent upon word-level embeddings and character-level embeddings of document [first input representation of first sequence] and word-level embeddings and character-level embeddings of question [second input representation of second sequence], which in turn is used to generate $u_1^Q, \ldots, u_m^Q$ and $u_1^P, \ldots, u_n^P$, which are inputs to the Question and Passage Matching layer in Figure 1,
p. 199, Figure 1 

	teaches Question and Passage GRU layer, Question and Passage Matching Layer, and Passage Self-Matching Layer  arranged in order from bottom to top [wherein the encoder comprises: a plurality of coattention layers arranged sequentially],
teaches the Question and Passage Matching Layer [coattention layer] and Passage Self-Matching Layer [coattention layer] both receiving two inputs [a pair of layer input representations] and generating one output [one or more summary representations], and
$u_1^Q, \ldots, u_m^Q$ [summary representation generated by a preceding layer] and all words in the passage $u_1^P, \ldots, u_n^P$ [summary representation generated by a preceding layer];
p. 192, section 3.3, paragraph 1 teaches the pair of layer input representations to the Passage Self-Matching Layer being the passage representation $h_t^P$ [summary representation generated by a preceding layer] and attention pooling vector $c_t$ of the whole passage [summary representation generated by a preceding layer],
p. 199, Figure 1 and p. 190, section 3.1, paragraph 1 teaches the pair of layer input representations to the Question and Passage GRU Layer [first among the plurality of coattention layers] being the word-level embeddings/character-level embeddings of the words in the passage [first input representation] and word-level embeddings/character-level embeddings of the words in the question [second input representation]; and
p. 192, section 3.4, paragraph 1 teaches receiving $h_{t-1}^a$ [receiving the one or more summary representations] and generating $s_j^t$ [generating the codependent representation]), and
a decoder stage configured to generate an inference based on the codependent representation (p. 199, Figure 1 and p. 192, section 3.4, paragraph 1, “We … use pointer networks to predict the start and end position of the answer…” teaches predicting the start and end positions of the answer [generating an inference based on the codependent representation] using a pointing network [a decoder stage]).
Buck et al. and Wang et al. are combinable for the same rationale as set forth above with respect to claim 1. 
Regarding Claim 14,
	Buck et al. in view of Wang et al. teaches the system of claim 13. 
	Buck et al. does not appear to explicitly teach wherein the encoder includes at least one residual connection that bypasses one or more of the plurality of coattention layers.
Wang et al. teaches wherein the encoder includes at least one residual connection that bypasses one or more of the plurality of coattention layers (p. 199, Figure 1).
Buck et al. and Wang et al. are combinable for the same rationale as set forth above with respect to claim 1. 
Claims 4-6, 12, and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Buck et al. (“Ask the Right Questions: Active Question Reformulation with Reinforcement Learning”) in view of Wang et al. (“Gated Self-Matching Networks for Reading Comprehension and Question Answering”) and in further view of Lu et al. (“Hierarchical Co-Attention for Visual Question Answering”).
Regarding Claim 4,
	Buck et al. in view of Wang et al. teaches the method of claim 1. 
Buck et al. in view of Wang et al. does not appear to explicitly teach wherein each of the plurality of coattention layers determines a set of affinity scores corresponding to each pair of items in the pair of layer input representations.
Lu et al. teaches wherein each of the plurality of coattention layers determines a set of affinity scores corresponding to each pair of items in the pair of layer input representations (p. 4, section 3.3.1, paragraph 1 
teaches computing of affinity matrix [determining the set of affinity scores] at each level in the hierarchy [each of the plurality of coattention layers] based on similarity between image and question features at all pairs of image-locations and question locations [each pair of items in the pair of layer input representations]).
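Illustrative note (not part of the cited reference): a minimal sketch of an affinity matrix holding one score for every pair of items drawn from two input representations. The bilinear-then-tanh form, the function name `affinity_matrix`, and the list-of-vectors layout are assumptions made for this example; the exact formulation in Lu et al. should be taken from the cited section itself.

```python
import math


def affinity_matrix(question, image, weights):
    """Return C where C[t][n] = tanh(q_t^T W v_n).

    question: list of T feature vectors (each of length d)
    image:    list of N feature vectors (each of length d)
    weights:  d x d matrix W given as a list of rows
    """
    def bilinear(q, v):
        # q^T W v written out as an explicit double sum.
        return sum(
            q[i] * weights[i][j] * v[j]
            for i in range(len(q))
            for j in range(len(v))
        )

    return [[math.tanh(bilinear(q, v)) for v in image] for q in question]
```

The result has one row per question item and one column per image location, so each entry is an affinity score for one pair of items in the pair of layer input representations.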
Buck et al., Wang et al., and Lu et al. are considered analogous art because they are directed to using machine learning techniques to improve methods of answering questions.  
In view of the teachings of Buck et al. in view of Wang et al., it would have been obvious for a person of ordinary skill in the art to apply the teachings of Lu et al. at the time the application was filed in order to use a coattention mechanism to improve a machine’s ability to answer questions about an image (cf. Lu et al., p. 1, section 1, paragraph 3, “We propose a novel mechanism that jointly reasons about visual attention …”). The Examiner notes that a person of ordinary skill in the art would find a suggestion to perform this type of analysis since Buck et al. discloses this as a necessary activity for the taught invention (cf. Buck et al., p. 2, Section 1, paragraph 2, “… Inspired by the human ability to produce and ask the right questions, we present an agent that learns to carry out this process for the user. The agent sits between the user and a backend QA system that we refer to as the ‘environment’. The agent aims to maximize the chance of getting the correct answer by reformulating and reissuing a user’s question to the environment. By asking many question variants, and aggregating the returned evidences, the agent comes up with the single best answer.”).
Regarding Claim 5,
Buck et al. in view of Wang et al. and in further view of Lu et al. teaches the method of claim 4.
Buck et al. in view of Wang et al. does not appear to explicitly teach wherein each of the plurality of coattention layers determines the one or more summary representations based on the pair of layer input representations and the set of affinity scores.
Lu et al. teaches wherein each of the plurality of coattention layers determines the one or more summary representations based on the pair of layer input representations and the set of affinity scores (p. 4, section 3.3.1, paragraph 1 
teaches development of image maps $H^v$ and question maps $H^q$ [one or more summary representations] at each level in the hierarchy [each of the plurality of coattention layers] based on image feature map V and question representation Q [the pair of layer input representations] and the matrix C [the set of affinity scores]).
Buck et al., Wang et al., and Lu et al. are combinable for the same rationale as set forth above with respect to claim 4. 
Regarding Claim 6,
Buck et al. in view of Wang et al. and in further view of Lu et al. teaches the method of claim 5.
Buck et al. in view of Wang et al. does not appear to explicitly teach wherein each of the plurality of coattention layers determines one or more context representations based on the set of the affinity scores and the one or more summary representations.
Lu et al. teaches wherein each of the plurality of coattention layers determines one or more context representations based on the set of the affinity scores and the one or more summary representations (p. 4, section 3.3.1, paragraph 1 
teaches development of image and question attention vectors [one or more context representations] based on attention probabilities $a^v$ and $a^q$, which are determined using the weight parameters, image maps $H^v$ and question maps $H^q$ [one or more summary representations] and the matrix C [the set of affinity scores]).
Buck et al., Wang et al., and Lu et al. are combinable for the same rationale as set forth above with respect to claim 4. 
Regarding Claim 12,
	Buck et al. in view of Wang et al. teaches the method of claim 1.
	Buck et al. in view of Wang et al. does not appear to explicitly teach wherein the output layer includes a bidirectional long short-term memory layer.
	Lu et al. teaches wherein the output layer includes a bidirectional long short-term memory layer (p. 5, Figure 3 
and p. 3, section 2, paragraph 3, “… The model first encodes the document and the query using separate bidirectional single layer LSTMs, and then uses the outputs as cues for attention …” teaches output layer of encoder using bidirectional single layer LSTMs [wherein the output layer includes a bidirectional long short-term memory layer]).
Buck et al., Wang et al., and Lu et al. are combinable for the same rationale as set forth above with respect to claim 4. 
Regarding Claim 15,
Buck et al. in view of Wang et al. teaches the system of claim 13. 
Buck et al. in view of Wang et al. does not appear to explicitly teach wherein each of the plurality of coattention layers determines a set of affinity scores corresponding to each pair of items in the pair of layer input representations.
Lu et al. teaches wherein each of the plurality of coattention layers determines a set of affinity scores corresponding to each pair of items in the pair of layer input representations (p. 4, section 3.3.1, paragraph 1 
teaches computing of affinity matrix [determining the set of affinity scores] at each level in the hierarchy [each of the plurality of coattention layers] based on similarity between image and question features at all pairs of image-locations and question locations [each pair of items in the pair of layer input representations]).
Buck et al., Wang et al., and Lu et al. are combinable for the same rationale as set forth above with respect to claim 4. 
Regarding Claim 16,
Buck et al. in view of Wang et al. and in further view of Lu et al. teaches the system of claim 15.
Buck et al. in view of Wang et al. does not appear to explicitly teach wherein each of the plurality of coattention layers determines the one or more summary representations based on the pair of layer input representations and the set of affinity scores.
Lu et al. teaches wherein each of the plurality of coattention layers determines the one or more summary representations based on the pair of layer input representations and the set of affinity scores (p. 4, section 3.3.1, paragraph 1 
teaches development of image maps $H^v$ and question maps $H^q$ [one or more summary representations] at each level in the hierarchy [each of the plurality of coattention layers] based on image feature map V and question representation Q [the pair of layer input representations] and the matrix C [the set of affinity scores]).
Buck et al., Wang et al., and Lu et al. are combinable for the same rationale as set forth above with respect to claim 4. 
Regarding Claim 17,
Buck et al. in view of Wang et al. and in further view of Lu et al. teaches the system of claim 16.
Buck et al. in view of Wang et al. does not appear to explicitly teach wherein each of the plurality of coattention layers determines one or more context representations based on the set of the affinity scores and the one or more summary representations.
Lu et al. teaches wherein each of the plurality of coattention layers determines one or more context representations based on the set of the affinity scores and the one or more summary representations (p. 4, section 3.3.1, paragraph 1 
teaches development of image and question attention vectors [one or more context representations] based on attention probabilities $a^v$ and $a^q$, which are determined using the weight parameters, image maps $H^v$ and question maps $H^q$ [one or more summary representations] and the matrix C [the set of affinity scores]).
Buck et al., Wang et al., and Lu et al. are combinable for the same rationale as set forth above with respect to claim 4. 
Regarding Claim 18,
	Buck et al. in view of Wang et al. teaches the system of claim 13.
	Buck et al. in view of Wang et al. does not appear to explicitly teach wherein the output layer includes a bidirectional long short-term memory layer.
Lu et al. teaches wherein the output layer includes a bidirectional long short-term memory layer (p. 5, Figure 3 
    [image: media_image10.png]
and p. 3, section 2, paragraph 3, “ … The model first encodes the document and the query using separate bidirectional single layer LSTMs, and then uses the outputs as cues for attention … “ teaches output layer of encoder using bidirectional single layer LSTMs [wherein the output layer includes a bidirectional long short-term memory layer]).
Buck et al., Wang et al., and Lu et al. are combinable for the same rationale as set forth above with respect to claim 4. 
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Buck et al. (“Ask the Right Questions: Active Question Reformulation with Reinforcement Learning”) in view of Wang et al. (“Gated Self-Matching Networks for Reading Comprehension and Question Answering”) and in further view of Weston (“Dialog-based Language Learning”).
Regarding Claim 10,
	Buck et al. in view of Wang et al. teaches the method of claim 8.
Buck et al. in view of Wang et al. does not appear to explicitly teach wherein the supervised learning objective is based on a cross-entropy loss function.
	Weston teaches wherein the supervised learning objective is based on a cross-entropy loss function (p. 6, section 4, paragraph 4  
    [image: media_image11.png]
  teaches supervised learning objective involving minimization of a cross-entropy loss function [wherein the supervised learning objective is based on a cross-entropy loss function]).
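For illustration only, the kind of cross-entropy objective referred to above can be sketched as follows. This is a minimal sketch under stated assumptions, not the formulation reproduced from Weston: the function names (`softmax`, `cross_entropy`, `mean_cross_entropy`) and the toy scores are hypothetical, and the model is assumed to emit one unnormalized score per candidate answer.

```python
import math

def softmax(scores):
    """Convert unnormalized scores into probabilities (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, target_index):
    """Negative log-probability assigned to the labeled (supervised) answer."""
    probs = softmax(scores)
    return -math.log(probs[target_index])

def mean_cross_entropy(batch_scores, batch_targets):
    """Average loss over a batch; minimizing it is the supervised objective."""
    losses = [cross_entropy(s, t) for s, t in zip(batch_scores, batch_targets)]
    return sum(losses) / len(losses)

# Hypothetical toy batch: two examples, three candidate answers each.
toy_scores = [[2.0, 0.5, -1.0], [0.1, 0.2, 3.0]]
toy_targets = [0, 2]
toy_loss = mean_cross_entropy(toy_scores, toy_targets)
```

The loss falls toward zero as the score of the labeled answer grows relative to the others, which is why minimizing it drives the model toward the supervised labels.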
Buck et al., Wang et al. and Weston are considered analogous art because they are directed to using machine learning techniques to improve responses to questions.
In view of the teachings of Buck et al. in view of Wang et al. it would have been obvious for a person of ordinary skill in the art to apply the teachings of Weston at the time the application was filed in order to develop an intelligent dialog agent that can learn to communicate with people as a result of conducting conversations with people (cf. Weston, p. 8, section 6, paragraph 1, “We have presented a set of evaluation datasets and models for dialog-based language learning. The ultimate goal of this line of research is to move towards a learner capable of talking to humans, such that humans are able to effectively teach it during dialog. We believe the dialog-based language learning approach we described is a small step towards that goal.”). The Examiner notes that a person of ordinary skill in the art would find a suggestion to perform this type of analysis since Buck et al. discloses this as a necessary activity for the taught invention (cf. Buck et al., p. 2, Section 1, paragraph 2,   “ … Inspired by the human ability to produce and .
Claims 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Eggebraaten et al. (US 2016/0125291 A1) in view of Lu et al. (“Hierarchical Co-Attention for Visual Question Answering”) and in further view of Wang et al. (“Gated Self-Matching Networks for Reading Comprehension and Question Answering”).
Regarding Claim 19,
	Eggebraaten et al. teaches a non-transitory machine-readable medium having stored thereon a question answering model (paragraph 0067,  “ … host device 222 can include a QA system 230 having a search application 234 and an answer module 232 …” and paragraph 0157, “… the answer sequence model generation module 1134 may be configured to generate an answer sequence model for managing answer sequences … the answer sequence model may be a database or other repository of answer sequences and answer sequence rules.” teaches a QA system capable of storing an answer sequence model;  paragraph 0241, “The present invention may be a system, a method, and/or a computer program product. The computer program product may include a computer readable storage medium (or media) having computer readable program instructions thereon for causing a processor to carry out aspects of the present invention”;  paragraph 0242, “A computer readable storage medium, as used herein, is not to be construed as being transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission media (e.g., light pulses passing through a fiber optic cable), or electrical signals transmitted through a wire” teaches a non-transitory machine-readable medium storing an answer sequence model within a QA system [a non-transitory machine-readable medium having stored thereon a question answering model]).
Eggebraaten et al. does not appear to explicitly teach a deep coattention encoder of the question answering model comprising: a first encoding sub-layer that receives an input representation of a document and generates a first encoded representation of the document; a second encoding sub-layer that receives an input representation of a question and generates a first encoded representation of the question; a first affinity node that receives the first encoded representation of the document and the first encoded representation of the question and generates a first set of affinity scores; a first summary node that receives the first encoded representation of the question and the first set of affinity scores and generates a first summary representation of the document; a second summary node that receives the first encoded representation of the document and the first set of affinity scores and generates a first summary representation of the question; a third encoding sub-layer that receives the first summary representation of the document and generates a second encoded representation of the document; a fourth encoding sub-layer that receives the first summary representation of the question and generates a second encoded representation of the question; a second affinity node that receives the second encoded representation of the document and the second encoded representation of the question and generates a second set of affinity scores; a third summary node that receives the second encoded representation of the question and the first set of affinity scores and generates a second summary representation of the document; and an output layer that receives the second summary 
Lu et al. teaches a deep coattention encoder of the question answering model (p.2, section 1, paragraph 5, “At the question level, we use recurrent neural networks (RNN) to encode the entire question. For each level of the question representation in this hierarchy, we construct joint question and image co-attention maps, which are then combined recursively to ultimately predict a distribution over the answers” teaches deep coattention encoder of the question answering model) comprising:
… a first affinity node that receives the first encoded representation of the document and the first encoded representation of the question and generates a first set of affinity scores (p. 4, section 3.3.1, paragraph 1 
    [image: media_image9.png]
      teaches receiving the image feature map V [first encoded representation of the document] and question representation Q [first encoded representation of the question] and generating the affinity matrix C [first set of affinity scores]); 
a first summary node that receives the first encoded representation of the question and the first set of affinity scores and generates a first summary representation of the document (p. 4, section 3.3.1, paragraph 1 
    [image: media_image9.png]
      teaches receiving question representation Q [first encoded representation of the question] and the affinity matrix C [first set of affinity scores] and generating the image attention map $H^v$ [a first summary representation of the document]);
a second summary node that receives the first encoded representation of the document and the first set of affinity scores and generates a first summary representation of the question (p. 4, section 3.3.1, paragraph 1 
    [image: media_image9.png]
      teaches receiving image feature map V [first encoded representation of the document] and the affinity matrix C [first set of affinity scores] and generating the question attention map $H^q$ [a first summary representation of the question]);
… a second affinity node that receives the second encoded representation of the document and the second encoded representation of the question and generates a second set of affinity scores (p. 4, section 3.3.1, paragraph 1 
    [image: media_image9.png]
        teaches receiving $W_v V$ in equation 4 for $H^v$ [second encoded representation of the document] and $W_q Q$ in equation 4 for $H^q$ [second encoded representation of the question] and generating the attention probability $a^v$ of each image region and the attention probability $a^q$ of each word in the question [generating a second set of affinity scores]); and
a third summary node that receives the second encoded representation of the question and the first set of affinity scores and generates a second summary representation of the document (teaches receiving $W_q Q$ in equation 4 for $H^q$ [second encoded representation of the question] and the affinity matrix C [first set of affinity scores] and generating an image attention vector $\hat{v}$ in equation 5 [a second summary representation of the document]).
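For illustration only, the data flow mapped above (affinity matrix C, attention maps $H^v$ and $H^q$, attention probabilities $a^v$ and $a^q$, and attended vectors) can be sketched in plain Python. This is a hedged sketch, not a reproduction of Lu et al.: the equation forms are written from the general parallel co-attention formulation and should be checked against equations 3 to 5 of the reference; all helper names, dimensions, and the randomly initialized weights (`W_b`, `W_v`, `W_q`, `w_hv`, `w_hq`) are assumptions made for the example.

```python
import math
import random

def matmul(A, B):
    """Matrix product of two lists-of-rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def add(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def tanh(A):
    return [[math.tanh(x) for x in row] for row in A]

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def rand_matrix(rows, cols, rng):
    """Placeholder weights; a trained model would supply learned values."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def coattention(V, Q, W_b, W_v, W_q, w_hv, w_hq):
    """V is d x N (image or document features), Q is d x T (question features)."""
    C = tanh(matmul(matmul(transpose(Q), W_b), V))     # T x N affinity scores
    WvV = matmul(W_v, V)                               # k x N
    WqQ = matmul(W_q, Q)                               # k x T
    H_v = tanh(add(WvV, matmul(WqQ, C)))               # k x N attention map
    H_q = tanh(add(WqQ, matmul(WvV, transpose(C))))    # k x T attention map
    a_v = softmax(matmul([w_hv], H_v)[0])              # N attention probabilities
    a_q = softmax(matmul([w_hq], H_q)[0])              # T attention probabilities
    v_hat = [sum(a * x for a, x in zip(a_v, row)) for row in V]  # attended vector, length d
    q_hat = [sum(a * x for a, x in zip(a_q, row)) for row in Q]  # attended vector, length d
    return C, a_v, a_q, v_hat, q_hat

# Hypothetical sizes: feature dim d, N regions, T words, hidden dim k.
d, N, T, k = 4, 3, 2, 5
rng = random.Random(0)
V = rand_matrix(d, N, rng)
Q = rand_matrix(d, T, rng)
C, a_v, a_q, v_hat, q_hat = coattention(
    V, Q,
    rand_matrix(d, d, rng), rand_matrix(k, d, rng), rand_matrix(k, d, rng),
    [rng.uniform(-0.5, 0.5) for _ in range(k)],
    [rng.uniform(-0.5, 0.5) for _ in range(k)],
)
```

In this sketch the single matrix C feeds both attention maps, which mirrors the mapping above in which the same "first set of affinity scores" is received by more than one summary node.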
Eggebraaten et al. and Lu et al. are considered analogous art because they are directed to using machine learning techniques to improve methods of answering questions.  
In view of the teachings of Eggebraaten et al., it would have been obvious for a person of ordinary skill in the art to apply the teachings of Lu et al. at the time the application was filed in order to use a coattention mechanism to improve a machine’s ability to answer questions about an image (cf. Lu et al., p. 1, section 1, paragraph 3, “We propose a novel mechanism that jointly reasons about visual attention and question attention, which we refer to as co-attention. Unlike previous works, which only focus on visual attention, our model has a natural symmetry between the image and question, in the sense that the image representation is used to guide the question attention and the question representation(s) are used to guide image attention.”). The Examiner notes that a person of ordinary skill in the art would find a suggestion to perform this type of analysis since Eggebraaten et al. discloses this as a necessary activity for the taught invention (cf. Eggebraaten et al., paragraph 0027, “Aspects of the present disclosure relate to answer management in a question-answering (QA) environment and, more specifically, to evaluating an answer sequence based on interactions between answers of the answer sequence.”).
Eggebraaten et al. in view of Lu et al. does not appear to explicitly teach a first encoding sub-layer that receives an input representation of a document and generates a first encoded representation of the document; a second encoding sub-layer that receives an input representation of a question and generates a first encoded representation of the question; … a third encoding sub-layer that receives the first summary representation of the document and generates a second encoded representation of the document; a fourth encoding sub-layer that receives the first summary representation of the question and generates a second encoded representation of the question; an output layer that receives the second summary representation of the document and generates a codependent representation; and a dynamic decoder of the 
Wang et al. teaches a first encoding sub-layer that receives an input representation of a document and generates a first encoded representation of the document (p. 190, section 3.1, paragraph 1, “Consider a question $Q = \{w_t^Q\}_{t=1}^m$ and a passage $P = \{w_t^P\}_{t=1}^n$. We first convert the words to their respective word-level embeddings ($\{e_t^Q\}_{t=1}^m$ and $\{e_t^P\}_{t=1}^n$) and character-level embeddings ($\{c_t^Q\}_{t=1}^m$ and $\{c_t^P\}_{t=1}^n$)” teaches receiving a passage P [input representation of a document] and generating word-level embedding and character-level embedding of passage [a first encoded representation of a document]);
 a second encoding sub-layer that receives an input representation of a question and generates a first encoded representation of the question (p. 190, section 3.1, paragraph 1, “Consider a question $Q = \{w_t^Q\}_{t=1}^m$ and a passage $P = \{w_t^P\}_{t=1}^n$. We first convert the words to their respective word-level embeddings ($\{e_t^Q\}_{t=1}^m$ and $\{e_t^P\}_{t=1}^n$) and character-level embeddings ($\{c_t^Q\}_{t=1}^m$ and $\{c_t^P\}_{t=1}^n$)” teaches receiving a question Q [input representation of a question] and generating word-level embedding and character-level embedding of question [a first encoded representation of a question]);
… a third encoding sub-layer that receives the first summary representation of the document and generates a second encoded representation of the document (p. 190, section 3.1, paragraph 1, “Consider a question $Q = \{w_t^Q\}_{t=1}^m$ and a passage $P = \{w_t^P\}_{t=1}^n$. We first convert the words to their respective word-level embeddings ($\{e_t^Q\}_{t=1}^m$ and $\{e_t^P\}_{t=1}^n$) and character-level embeddings ($\{c_t^Q\}_{t=1}^m$
                     and                         
                            
                                
                                    {
                                    
                                        
                                            c
                                        
                                        
                                            t
                                        
                                        
                                            P
                                        
                                    
                                    }
                                
                                
                                    t
                                    =
                                    1
                                
                                
                                    n
                                
                            
                        
                     ) … We then use a bi-directional RNN to produce new representation                         
                            
                                
                                    u
                                
                                
                                    1
                                
                                
                                    Q
                                
                            
                            ,
                             
                            …
                            ,
                            
                                
                                    u
                                
                                
                                    m
                                
                                
                                    Q
                                
                            
                        
                     and                         
                            
                                
                                    u
                                
                                
                                    1
                                
                                
                                    P
                                
                            
                            ,
                             
                            …
                            ,
                            
                                
                                    u
                                
                                
                                    n
                                
                                
                                    P
                                
                            
                        
                     of all words in the question and passage respectively:                         
                            
                                
                                    u
                                
                                
                                    t
                                
                                
                                    Q
                                
                            
                            =
                            
                                
                                    B
                                    i
                                    R
                                    N
                                    N
                                
                                
                                    Q
                                
                            
                            (
                            
                                
                                    u
                                
                                
                                    t
                                    -
                                    1
                                
                                
                                    Q
                                
                            
                            ,
                            
                                
                                    
                                        
                                            e
                                        
                                        
                                            t
                                        
                                        
                                            Q
                                        
                                    
                                    ,
                                    
                                        
                                            c
                                        
                                        
                                            t
                                        
                                        
                                            Q
                                        
                                    
                                
                            
                            )
                        
                     …                         
                            
                                
                                    u
                                
                                
                                    t
                                
                                
                                    P
                                
                            
                            =
                            
                                
                                    B
                                    i
                                    R
                                    N
                                    N
                                
                                
                                    P
                                
                            
                            (
                            
                                
                                    u
                                
                                
                                    t
                                    -
                                    1
                                
                                
                                    P
                                
                            
                            ,
                            
                                
                                    
                                        
                                            e
                                        
                                        
                                            t
                                        
                                        
                                            P
                                        
                                    
                                    ,
                                    
                                        
                                            c
                                        
                                        
                                            t
                                        
                                        
                                            P
                                        
                                    
                                
                            
                            )
                        
                    …” teaches receiving word-level embeddings/character-level embeddings of the passage [first summary representation of the document] and generating a new representation of all words in the passage                         
                            
                                
                                    u
                                
                                
                                    t
                                
                                
                                    P
                                
                            
                        
                     [a second encoded representation of the document]);
… a fourth encoding sub-layer that receives the first summary representation of the question and generates a second encoded representation of the question (p. 190, section 3.1, paragraph 1, “Consider a question $Q = \{w_t^Q\}_{t=1}^{m}$ and a passage $P = \{w_t^P\}_{t=1}^{n}$. We first convert the words to their respective word-level embeddings ($\{e_t^Q\}_{t=1}^{m}$ and $\{e_t^P\}_{t=1}^{n}$) and character-level embeddings ($\{c_t^Q\}_{t=1}^{m}$ and $\{c_t^P\}_{t=1}^{n}$) … We then use a bi-directional RNN to produce new representation $u_1^Q, \dots, u_m^Q$ and $u_1^P, \dots, u_n^P$ of all words in the question and passage respectively: $u_t^Q = \mathrm{BiRNN}_Q(u_{t-1}^Q, [e_t^Q, c_t^Q])$ … $u_t^P = \mathrm{BiRNN}_P(u_{t-1}^P, [e_t^P, c_t^P])$ …” teaches receiving word-level embeddings/character-level embeddings of the question [first summary representation of the question] and generating a new representation of all words in the question $u_t^Q$ [a second encoded representation of the question]);
	an output layer that receives the second summary representation of the document and generates a codependent representation (p. 199, Figure 1 and p. 192, section 3.4, paragraph 1 teaches generating $s_j^t$ [generating a codependent representation] based on passage representation $\{h_t^P\}_{t=1}^{n}$ [second summary representation of the document]); and
a dynamic decoder of the question answering model that receives the codependent representation and predicts an answer span within the document (p. 199, Figure 1 and p. 192, section 3.4, paragraph 1 teaches predicting start and end position of the answer [answer span within the document] using a pointing network [a dynamic decoder] based upon receiving $s_j^t$ [a codependent representation]).
Eggebraaten et al., Lu et al. and Wang et al. are considered analogous art because they are directed to using machine learning techniques to improve methods of answering questions.  
In view of the teachings of Eggebraaten et al. in view of Lu et al., it would have been obvious for a person of ordinary skill in the art, at the time the application was filed, to apply the teachings of Wang et al. in order to dynamically enrich each passage representation with information aggregated from both the question and the passage, which in turn leads to better answer predictions (cf. Wang et al., p. 190, section 1, paragraph 4, “By introducing a gating mechanism, our gated attention-based recurrent network assigns different levels of importance to passage parts depending on their relevance to the question, masking out irrelevant passage parts and emphasizing the important ones”). The Examiner notes that Eggebraaten et al. discloses this as a necessary activity for the taught invention (cf. Eggebraaten et al., paragraph 0027, “Aspects of the present disclosure relate to answer management in a question-answering (QA) environment and, more specifically, to evaluating an answer sequence based on interactions between answers of the answer sequence.”).
Regarding Claim 20,
	Eggebraaten et al. in view of Lu et al. and in further view of Wang et al. teaches the non-transitory machine readable medium of claim 19.
	Eggebraaten et al. in view of Lu et al. does not appear to explicitly teach wherein the output layer further receives one or more of the first encoded representation of the document, the first summary representation of the document, or the second encoded representation of the document via one or more residual connections of the deep coattention encoder.
Wang et al. teaches wherein the output layer further receives one or more of the first encoded representation of the document, the first summary representation of the document, or the second encoded representation of the document via one or more residual connections of the deep coattention encoder (p. 199, Figure 1 teaches at least one connection from the question vector to the output layer [at least one residual connection] that bypasses the Question and Passage Matching Layer and the Passage Self-Matching Layer [one or more of the plurality of coattention layers]).
Any limitation that recites “or” has been interpreted as requiring one of the alternatives and not all of the alternatives.
Eggebraaten et al., Lu et al., and Wang et al. are combinable for the same rationale as set forth above with respect to claim 19.
Prior Art
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure:  Ferrucci et al. (US 8,880,388 B2) teaches an unsupervised approach to question lexical answer type prediction for use in an open domain QA system.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHIAKA CHUKWUMA OKOROH whose telephone number is (571)272-3710.  The examiner can normally be reached on M - F 7:30 AM - 4:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar can be reached on 571-272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CHIAKA CHUKWUMA OKOROH/Examiner, Art Unit 2125                                            
/MICHAEL J HUNTLEY/Primary Examiner, Art Unit 2116