DETAILED ACTION
This is a response to the Amendment to Application # 16/006,691 filed on January 25, 2022 in which claims 1, 7, 10, 16, and 19 were amended.  

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Claims 1-20 are pending, of which claims 1-9 are rejected under 35 U.S.C. § 112(b); claims 1, 2, 10, 11, 19, and 20 are rejected under 35 U.S.C. § 102(a)(2); and claims 3-8 and 12-17 are rejected under 35 U.S.C. § 103.

Information Disclosure Statement
The information disclosure statement filed January 25, 2022 fails to comply with the provisions of 37 C.F.R. §§ 1.97, 1.98 and MPEP § 609 because the non-patent literature submitted does not contain the proper bibliographic information as required by 37 C.F.R. § 1.98(b)(5). Specifically, 37 C.F.R. § 1.98(b)(5) states “[e]ach publication listed in an information disclosure statement must be identified by … [and] relevant pages of the publication.” (Emphasis added). The citation for NPL item 4 includes a page count that does not match the provided document: the IDS provides a citation for “10 Pages,” but the included document contains pages 1-17. Thus, the examiner cannot determine which “10 pages” are the intended pages. It has been placed in the application file, but the information referred to therein has not been considered as to the merits. The remainder of the information disclosure statement complies with the provisions of 37 C.F.R. §§ 1.97, 1.98 and MPEP § 609.

Claim Interpretation
The following is a quotation of 35 U.S.C. § 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. § 112(f) because the claim limitations use a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitations are: the “multi-layer encoder,” the “multi-layer decoder,” the “pointer generator,” the “switch,” the “parallel self-attention encoders,” and the “self-attention decoder” in claims 1-9.
Because these claim limitations are being interpreted under 35 U.S.C. § 112(f), they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If Applicant does not intend to have these limitations interpreted under 35 U.S.C. § 112(f), Applicant may:  (1) amend the claim limitation(s) to avoid them being interpreted under 35 U.S.C. § 112(f) (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitations recite sufficient structure to perform the claimed function so as to avoid them being interpreted under 35 U.S.C. § 112(f).

This application also includes one or more claim limitations that are not being interpreted under 35 U.S.C. § 112(f) because the claim limitations recite sufficient structure, materials, or acts to entirely perform the recited function. Such claim limitations are: the “parallel bi-directional long short term memories” and the “bi-directional long short term memory” in claims 4 and 6.
Because these claim limitations are not being interpreted under 35 U.S.C. § 112(f), they are not being interpreted to cover only the corresponding structure, material, or acts described in the specification as performing the claimed function, and equivalents thereof.
If Applicant intends to have these limitations interpreted under 35 U.S.C. § 112(f), Applicant may: (1) amend the claim limitations to remove the structure, materials, or acts that perform the claimed function; or (2) present a sufficient showing that the claim limitations do not recite sufficient structure, materials, or acts to perform the claimed function.

Additionally, the term “anti-curriculum” learning does not appear to be a known term of art. However, this term is defined by the present specification as situations where “the training sample is selected from those task types which are characterized as being more difficult to learn, have longer answer sequences, and/or involve different types of decoding.” (Spec ¶ 87). Should Applicant intend for a different meaning to apply to this term, the examiner recommends amending the claims to better define the intended meaning. 
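For illustration only, the definition quoted from Spec ¶ 87 can be read as a filter over task types. The following Python sketch is an assumption offered to aid understanding and is not taken from the specification: the `TaskType` fields, the `anti_curriculum_subset` helper, the length threshold, and the difficulty labels assigned to each example task are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskType:
    """Hypothetical description of a task type; the fields are illustrative only."""
    name: str
    hard_to_learn: bool       # assumed label: task is "more difficult to learn"
    mean_answer_length: int   # assumed proxy for "longer answer sequences"


def anti_curriculum_subset(task_types, min_answer_length):
    """Keep task types that are hard to learn and/or have long answer sequences."""
    return [
        t.name
        for t in task_types
        if t.hard_to_learn or t.mean_answer_length >= min_answer_length
    ]


# Example values are invented for illustration, not measured.
tasks = [
    TaskType("sentiment analysis", hard_to_learn=False, mean_answer_length=1),
    TaskType("machine translation", hard_to_learn=True, mean_answer_length=25),
    TaskType("document summarization", hard_to_learn=False, mean_answer_length=50),
]
initial_subset = anti_curriculum_subset(tasks, min_answer_length=20)
```

Under these assumed labels, the initial training subset would contain the two task types with long or difficult answers and would exclude the single-word classification task.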

Claim Objections
Claims 1-18 are objected to for failing to comply with 37 C.F.R. § 1.75(g), which requires “[t]he least restrictive claim should be presented as claim number 1” (emphasis added). See also, MPEP least restrictive claim of the independent claims. This objection will be held in abeyance upon Applicant’s request.

Claims 1, 10, and 19 are objected to because of the following informalities:  the limitation “a multi-layer encoder for encoding first words from a context and second words from a question that is separate from but related to the context in parallel” is awkward and unclear as to which of the previous elements are “separate from but related to the context” and which of the previous elements are “in parallel.”  Appropriate correction is required.

Claims 7 and 16 are objected to because of the following informalities:  these claims include extraneous commas that are grammatically incorrect and make the limitation difficult to read. The examiner recommends amending this limitation to “wherein the system is further trained against a full set of task types[[,]] that the system is designed to process[[,]] after the system is trained against the subset of task types.”  Appropriate correction is required.

Claim Rejections - 35 U.S.C. § 112
The following is a quotation of 35 U.S.C. § 112(b):

(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


Claims 1-9 are rejected under 35 U.S.C. § 112(b) as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor regards as the invention.

Regarding claims 1-9, the claim limitations “multi-layer encoder,” “multi-layer decoder,” “pointer generator,” “switch,” “parallel self-attention encoders,” and “self-attention decoder” recited or inherited in these claims invoke 35 U.S.C. § 112(f). However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. When a claim invokes 35 U.S.C. § 112(f) for a computer implemented means-plus-function claim, the specification must disclose the specific algorithm required to transform the general-purpose computing equipment into the required special purpose computer. See MPEP § 2181(II)(B).
While the present specification does recite some “specific algorithm[s] required to transform the general-purpose computing equipment into the required special purpose computer,” Applicant explicitly states that these are “non-limiting example[s].” (Remarks dated January 25, 2022 at 8). Thus, because these algorithms are “non-limiting,” Applicant is reciting a “class” of algorithms and not one specific algorithm.  Encyclopaedia Britannica, Inc. v. Alpine Elecs., Inc., 355 Fed. App'x 389, 394-95 (Fed. Cir. 2009) (holding that disclosure of a class of algorithms for performing the claimed functions is not sufficient).
Therefore, the claim is indefinite and is rejected under 35 U.S.C. § 112(b).
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. § 112(f); 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. § 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. § 132(a)).

If Applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. § 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 C.F.R. § 1.75(d) and MPEP §§ 608.01(o) and 2181.

Claim Rejections - 35 U.S.C. § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. §§ 102 and 103 (or as subject to pre-AIA  35 U.S.C. §§ 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  

The following is a quotation of the appropriate paragraphs of 35 U.S.C. § 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –
(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.



Claims 1, 2, 10, 11, 19, and 20 are rejected under 35 U.S.C. § 102(a)(2) as being anticipated by Lao et al., US Publication 2018/0114108 (hereinafter Lao).

Regarding claim 1, Lao discloses a system for natural language processing, the system comprising “a multi-layer encoder for encoding first words from a context and second words from a question that is separate from but related to the context in parallel” (Lao ¶¶ 3, 22, 25, 48, and 75). Specifically, Lao discloses that input passages are encoded (Lao ¶ 25) where the input passages may be encoded in parallel (Lao ¶ 75) and the encoder may contain multiple layers (Lao ¶ 3). Lao goes on to disclose that multiple input passages (i.e., a first word and a second word) may be received (Lao ¶ 48) and that they may be received from a source document (i.e., separate but related, Lao ¶ 22). Because the passages are from the same document, one is the context for the other. Additionally, the terms “question” and “answer” appear to refer to the query and the query result, respectively, and, therefore, because the answer of Lao is a query, it is a “question” within the scope of the present invention.
Further, Lao discloses “a multi-layer decoder for decoding the encoded context and the encoded question” (Lao ¶¶ 3, 31) by decoding the encoded input sequence using a recurrent neural network (Lao ¶ 31) and indicating that recurrent neural networks may contain multiple layers (Lao ¶ 33). Moreover, Lao discloses “a pointer generator for generating distributions over the first words from the context, the second words from the question, and third words in a vocabulary based on an output from the decoder” (Lao ¶ 29) by generating a distribution for the tokens of the decoded input passage.
Likewise, Lao discloses “a switch for: generating a weighting of the distribution over the first words from the context, the distribution over the second words from the question, and the distribution by generating a weighted average (i.e., a composite distribution) that comprises each of the entries weighted by time step (i.e., a weighting). Finally, Lao discloses “selecting words for inclusion in an answer using the composite distribution” (Lao ¶ 40) by selecting the highest scoring token as the answer to the query.
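For illustration only, the composite-distribution language of claim 1 can be read as a weighted mixture of three word distributions followed by selection of the highest-scoring word. The following Python sketch is an assumption offered to aid understanding; it is not taken from Lao or from the present specification, and the function names `composite_distribution` and `select_word`, the example words, and the example weights are hypothetical.

```python
def composite_distribution(p_context, p_question, p_vocab, weights):
    """Mix three word distributions using switch weights that sum to 1."""
    w_c, w_q, w_v = weights
    if abs(w_c + w_q + w_v - 1.0) > 1e-9:
        raise ValueError("switch weights must sum to 1")
    # A word missing from one distribution contributes zero mass from it.
    words = set(p_context) | set(p_question) | set(p_vocab)
    return {
        w: w_c * p_context.get(w, 0.0)
        + w_q * p_question.get(w, 0.0)
        + w_v * p_vocab.get(w, 0.0)
        for w in words
    }


def select_word(dist):
    """Pick the word with the highest composite probability."""
    return max(dist, key=dist.get)


# Invented example distributions for a single decoding step.
p_context = {"paris": 0.7, "france": 0.3}
p_question = {"capital": 0.6, "france": 0.4}
p_vocab = {"the": 0.5, "paris": 0.2, "yes": 0.3}

mixed = composite_distribution(p_context, p_question, p_vocab, (0.5, 0.2, 0.3))
chosen = select_word(mixed)
```

Because the three inputs each sum to one and the weights sum to one, the mixture is itself a probability distribution, so a single selection step can draw a word from the context, the question, or the vocabulary.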

Regarding claim 10, it merely recites the method performed by the system of claim 1. The method comprises executing computer software modules for performing the various functions. Lao comprises computer software modules for performing the same functions. Thus, claim 10 is rejected using the same rationale set forth in the above rejection for claim 1.

Regarding claim 19, it merely recites a non-transitory machine-readable medium for embodying the system of claim 1. The medium comprises computer software modules for performing the various functions. Lao comprises computer software modules for performing the same functions. Thus, claim 19 is rejected using the same rationale set forth in the above rejection for claim 1.

Regarding claims 2, 11, and 20, Lao discloses the limitations contained in parent claims 1, 10, and 19 for the reasons discussed above. In addition, Lao discloses “wherein the context and the question correspond to a natural language processing task type selected from question answering, machine translation, document summarization, database query generation, sentiment analysis, natural language inference, semantic role labeling, relation extraction, goal oriented dialogue, and pronoun resolution” (Lao Abstract) where the task type is question answering.

Claim Rejections - 35 U.S.C. § 103
The following is a quotation of 35 U.S.C. § 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


This application currently names joint inventors. In considering patentability of the claims, the Examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicants are advised of the obligation under 37 C.F.R. § 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. § 102(b)(2)(C) for any potential 35 U.S.C. § 102(a)(2) prior art against the later invention.

Claims 3, 5, 12, and 14 are rejected under 35 U.S.C. § 103 as being unpatentable over Lao in view of Xiong et al., Dynamic Coattention Networks for Question Answering, Published as a Conference Paper at the International Conference on Learning Representations. Toulon, France. April 24-26, 2017. pp. 1-14, as cited on the Information Disclosure Statement dated February 19, 2019 (hereinafter Xiong).

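Regarding claim 3, Lao discloses the limitations contained in parent claim 1 for the reasons discussed above. In addition, Lao does not appear to explicitly disclose “wherein the multi-layer encoder comprises: a coattention network for determining a coattention between the first words in the context and the second words in the question; and parallel bi-directional long short term memories to compress outputs from the coattention layer.”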
However, Xiong discloses a question and answering system “wherein the multi-layer encoder comprises: a coattention network for determining a coattention between the first words in the context and the second words in the question” (Xiong 1) by using a coattention network for determining coattention between the question and the interactions (i.e., context). Additionally, Xiong discloses “parallel bi-directional long short term memories to compress outputs from the coattention layer” (Xiong 3, Fig. 2) by providing a diagram with a series of parallel bi-directional LSTMs.
Lao and Xiong are analogous art because they are from the “same field of endeavor,” namely that of question and answering systems. 
Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Lao and Xiong before him or her to modify the attention of Lao to include the coattention layers of Xiong.
The motivation for doing so would have been that the use of a coattention system in question and answer systems has been shown to provide a more accurate result than previous methods. (Xiong 1).

Regarding claims 5 and 14, Lao discloses the limitations contained in parent claims 1 and 10 for the reasons discussed above. In addition, Lao discloses “wherein the multi-layer encoder comprises: parallel encoding layers for encoding the words in the context and words in the question in parallel” (Lao ¶¶ 3, 21, 22, and 75) by encoding an input passage (Lao ¶ 21) that contains a sequence of words (Lao ¶ 22, i.e., a first and second word) that may be encoded in parallel (Lao ¶ 75) and may contain multiple layers (Lao ¶ 3). The present specification uses the term “context” to refer to the words that are around the selected words (Spec. ¶ 20 and Fig. 1), and thus, the sequence of words of Lao includes both context and words from the question. (See also Lao ¶ 34).
Lao does not appear to explicitly disclose “parallel linear networks for projecting the encodings of the words in the context and the words in the question in parallel; and a bidirectional long short term memory for further encoding the projections of the encodings.”
However, Xiong discloses a question and answering system including “parallel linear networks for projecting the encodings of the words in the context and the words in the question in parallel” (Xiong 3, 6) by disclosing that the implementation system includes linear networks and that the operations may be performed in parallel. Additionally, Xiong discloses “a bidirectional long short term memory for further encoding the projections of the encodings” (Xiong 3, Fig. 2) by providing a diagram with a series of parallel bi-directional LSTMs.
Lao and Xiong are analogous art because they are from the “same field of endeavor,” namely that of question and answering systems. 
Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Lao and Xiong before him or her to modify the network of Lao to include the linear networks of Xiong.
The motivation for doing so would have been that the use of the system described by Xiong in question and answer systems has been shown to provide a more accurate result than previous methods. (Xiong 1).

Regarding claim 12, Lao discloses the limitations contained in parent claim 10 for the reasons discussed above. In addition, Lao does not appear to explicitly disclose “determining a coattention between the first words in the context and the second words in the question.”
However, Xiong discloses a question and answering system “determining a coattention between the first words in the context and the second words in the question” (Xiong 1) by using a coattention network for determining coattention between the question and the interactions (i.e., context). 
Lao and Xiong are analogous art because they are from the “same field of endeavor,” namely that of question and answering systems. 
Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Lao and Xiong before him or her to modify the attention of Lao to include the coattention layers of Xiong.
The motivation for doing so would have been that the use of a coattention system in question and answer systems has been shown to provide a more accurate result than previous methods. (Xiong 1).

Claims 4, 6, 13, and 15 are rejected under 35 U.S.C. § 103 as being unpatentable over Lao in view of Sebastian Ruder; Deep Learning for NLP Best Practices; July 25, 2017; ruder.io; Pages 1-25 (hereinafter Ruder).

Regarding claims 4 and 13, Lao discloses the limitations contained in parent claims 1 and 10 for the reasons discussed above. In addition, Lao does not appear to explicitly disclose “wherein the multi-layer encoder comprises: parallel self-attention encoders for generating an attention across the context and an attention across the question in parallel; and parallel bi-directional long short term memories for generating final encodings of the context and the question in parallel based on the generated attention.”
However, Ruder discloses a neural network “wherein the multi-layer encoder comprises: parallel self-attention encoders for generating an attention across the data …” (Ruder 10-11) by using a self-attention layer. Further, Ruder discloses “parallel bi-directional long short term memories for generating final encodings of the data … based on the generated attention” (Ruder 4-5, 12-13) by indicating that it is well-known to use LSTM on the encoded data and indicating that the LSTM may be bi-directional. 
Further, a person of ordinary skill in the art prior to the effective filing date would have recognized that when Ruder was combined with Lao, the specific data, e.g., the context and the questions, of Lao would be operated on according to the neural network components of Ruder and that the neural network of Ruder would be operated in parallel as taught by Lao. Therefore, the combination of Lao and Ruder at least teaches and/or suggests the claimed limitations “wherein the multi-layer encoder comprises: parallel self-attention encoders for generating an attention across the context and an attention across the question in parallel; and parallel bi-directional long short term memories for generating final encodings of the context and the question in parallel based on the generated attention,” rendering them obvious.
Lao and Ruder are analogous art because they are from the “same field of endeavor,” namely that of neural networks. 
Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Lao and Ruder before him or her to modify the neural networks of Lao to include the self-attention layers and bi-directional LSTM memory of Ruder.
The motivation for doing so would have been that these practices are known to be “best practices” within the art. (Ruder 1). 

Regarding claims 6 and 15, Lao discloses the limitations contained in parent claims 1 and 10 for the reasons discussed above. In addition, Lao does not appear to explicitly disclose “wherein the multi-layer decoder comprises: an encoding and embedding layer for encoding and embedding an intermediate version of the answer; a self-attention decoder for generating an attention between the encoded and embedded intermediate version of the answer and a final encoding of the context; a long short term memory for generating an intermediate decoder state from outputs of the self-attention decoder; and a context and question attention network for generating context and question decoder states based on a final encoding of the context, a final encoding of the question, and the intermediate decoder state.”
However, Ruder discloses a neural network “wherein the multi-layer decoder comprises: an encoding and embedding layer for encoding and embedding an intermediate version of the data” (Ruder 4-5) by indicating that there are eight layers, which would mean that the result of each layer, except the final layer, would be an “intermediate” version. Additionally, Ruder discloses “a self-attention decoder for generating an attention between the encoded and embedded intermediate version of the data and a final encoding of the data.” (Ruder 10-11). Further, Ruder discloses “a long short term memory for generating an intermediate decoder state from outputs of the self-attention decoder” (Ruder 4-5, 12-13) by indicating that it is well-known to use LSTM on the encoded data. Finally, Ruder discloses “a data attention network for generating data decoder states based on a final encoding of the data, a final encoding of the data, and the intermediate decoder state” (Ruder 8) by disclosing that the decoder states are generated based on the current position and previous states (i.e., intermediate states).
Further, a person of ordinary skill in the art prior to the effective filing date would have recognized that when Ruder was combined with Lao, the specific data, e.g., the context and the questions, of Lao would be operated on according to the neural network components of Ruder. 
Lao and Ruder are analogous art because they are from the “same field of endeavor,” namely that of neural networks. 
Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Lao and Ruder before him or her to modify the neural networks of Lao to include the self-attention layers and bi-directional LSTM memory of Ruder.
The motivation for doing so would have been that these practices are known to be “best practices” within the art. (Ruder 1). 

Claims 7, 8, 16, and 17 are rejected under 35 U.S.C. § 103 as being unpatentable over Lao in view of Bengio, et al., Curriculum learning, In Proceedings of the 26th International Conference on Machine Learning, Montreal, Canada, 2009, 8 pages, (hereinafter Bengio), as cited on the Information Disclosure Statement dated February 19, 2019.

Regarding claims 7 and 16, Lao discloses the limitations contained in parent claims 1 and 10 for the reasons discussed above. In addition, Lao does not appear to explicitly disclose “wherein the system is trained against a subset of task types, wherein the system is further trained against a full set of task types, that the system is designed to process, after the system is trained against the subset of task types.”
However, Bengio discloses a machine learning system including the requirement “wherein the system is further trained against a full set of task types, that the system is designed to process, after the system is trained against the subset of task types” (Bengio 1) by using a curriculum learning strategy. A person of ordinary skill in the art would understand a “curriculum” strategy to be a strategy that begins training with a small training set and then increases the size and complexity of the training set until the system has been trained on everything. (Bengio, § 1 Introduction).
Lao and Bengio are analogous art because they are from the “same field of endeavor,” namely that of machine learning systems. 
Prior to the effective filing date of the claimed invention, it would have been obvious to one of ordinary skill in the art, having the teachings of Lao and Bengio before him or her to modify the training of Lao to include the curriculum learning of Bengio.
The motivation for doing so would have been that curriculum learning improves the speed and quality of the training process.  
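For illustration only, the two-phase training recited in claims 7 and 16 (a subset of task types first, then the full set) can be sketched as a simple schedule. The following Python sketch is an assumption offered to aid understanding; it is not taken from Lao, Bengio, or the present specification. The function name `training_schedule`, the step counts, and the round-robin sampling within each phase are hypothetical choices.

```python
def training_schedule(all_tasks, first_phase_tasks, phase1_steps, phase2_steps):
    """Yield (step, task) pairs: a subset of task types first, then the full set."""
    subset = [t for t in all_tasks if t in first_phase_tasks]
    if not subset:
        raise ValueError("first phase must include at least one known task type")
    step = 0
    # Phase 1: cycle only through the chosen subset of task types.
    for i in range(phase1_steps):
        yield step, subset[i % len(subset)]
        step += 1
    # Phase 2: cycle through every task type the system is designed to process.
    for i in range(phase2_steps):
        yield step, all_tasks[i % len(all_tasks)]
        step += 1


task_types = ["question answering", "machine translation", "document summarization"]
schedule = list(
    training_schedule(task_types, {"question answering"}, phase1_steps=2, phase2_steps=3)
)
```

Whether the first-phase subset holds the easier task types (a curriculum strategy) or the harder ones (an anti-curriculum strategy) is decided by the caller through `first_phase_tasks`; the schedule itself is the same in both cases.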

Regarding claims 8 and 17, the combination of Lao and Bengio discloses the limitations contained in parent claims 7 and 16 for the reasons discussed above. In addition, the combination of Lao and Bengio discloses “wherein the subset of task types are selected according to a curriculum strategy.” (Bengio 1).

Allowable Subject Matter
Claim 18 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Response to Arguments
Applicant’s arguments filed January 25, 2022, with respect to the objection to the drawings; the objection to claim 16; and the rejection of claims 7 and 16 under 35 U.S.C. § 112(b) (Remarks 7-8) have been fully considered and are persuasive. The objection to the drawings; the objection to claim 16; and the rejection of claims 7 and 16 under 35 U.S.C. § 112(b) have been withdrawn.

Applicant's remaining arguments filed January 25, 2022, have been fully considered but they are not persuasive.

Regarding the rejection of claims 1-9 under 35 U.S.C. § 112(b) for failing to provide the requisite support for limitations invoking 35 U.S.C. § 112(f), Applicant provides citations for algorithms within the present specification as support for these elements and requests the rejection be withdrawn. (Remarks 8).

Applicant’s provided support does not amount to the requisite algorithms for several reasons. Many of these citations are to elements of a drawing that simply amount to a “black box.” Blackboard, Inc. v. Desire2Learn, Inc., 574 F.3d 1371, 1383-85, 91 USPQ2d 1481, 1491-93 (Fed. Cir. 2009); Net MoneyIN, Inc. v. VeriSign, Inc., 545 F.3d 1359, 1366-67, 88 USPQ2d 1751, 1756-57 (Fed. Cir. 2008); Ex parte Rodriguez, 92 USPQ2d 1395, 1405-06 (Bd. Pat. App. & Int. 2009) (precedential). 

Finally, Applicant states that each of these citations is merely a “non-limiting example[].” (Remarks 8). When a claim invokes 35 U.S.C. § 112(f), the broadest reasonable interpretation of that limitation is the structure recited in the specification. See MPEP § 2181. Thus, the entire purpose of 35 U.S.C. § 112(f) is to limit the claims to the structure disclosed in the specification. Therefore, if the structure is “non-limiting,” the claim limitation cannot be read to be limited to that structure, meaning that the cited structure cannot provide the requisite support, and it is the same as if Applicant had disclosed no structure at all. Encyclopaedia Britannica, Inc. v. Alpine Elecs., Inc., 355 Fed. App'x 389, 394-95 (Fed. Cir. 2009) (holding that implicit or inherent disclosure of a class of algorithms for performing the claimed functions is not sufficient, and the purported "one-step" algorithm is not an algorithm at all).
Therefore, the rejection of claims 1-9 under 35 U.S.C. § 112(b) is maintained. 

Regarding the rejection of claim 1 under 35 U.S.C. § 102(a)(2), Applicant first argues that Lao fails to disclose the newly amended limitation “a multi-layer encoder for encoding first words from a context and second words from a question that is separate from but related to the context in parallel” because “Lao discusses encoding words from a passage, but not a question separate from but related to the passage.” (Remarks 9). The examiner disagrees for the reasons discussed in the updated rejection of claim 1 above.



Lao discloses that a token score is generated for the string “token by token.” (Lao ¶ 29). Lao further discloses that these tokens are generated from the passages (Lao ¶ 22), which are the first words from the context and the second words from the question for the reasons discussed in the rejection of claim 1 above, and thus generates multiple tokens. Likewise, Lao discloses that each of these scores is a “distribution over tokens in a vocabulary.” (Lao ¶ 29). Therefore, for each of these tokens, be it a first word token or a second word token, there is a distribution of that token over tokens in a vocabulary that is generated in the form of a token score. Because there are multiple tokens, some of which are first word tokens and some of which are second word tokens, multiple scores are being generated, with each score being over both the words in the vocabulary and over one of the first or second words.
The examiner notes that this limitation does not require three separate distributions with a first distribution being over only first words from the context; a second distribution being over only second words from the question; and a third distribution over only third words in a vocabulary. Instead, it merely requires at least two distributions where the at least two distributions include all three word groups in any shape or fashion.
Therefore, Applicant’s argument is unpersuasive. 



Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 C.F.R. § 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 C.F.R. § 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ANDREW R DYER whose telephone number is (571)270-3790.  The examiner can normally be reached on Monday-Friday 7:30-3:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kavita Stanley can be reached on 571-272-8352.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


/ANDREW R DYER/Primary Examiner, Art Unit 2176