DETAILED ACTION

Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
The present application was filed on 06/30/2017.
This action is in response to amendments and/or arguments filed on 10/18/2021. In the current amendments, claims 1, 11 and 18 have been amended and claims 2, 5, 15 and 20 have been cancelled. Claims 1, 3-4, 6-14 and 16-19 are pending and have been examined. 

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 10/18/2021 has been entered.
 
Response to Arguments
Applicant’s arguments with respect to claims 1, 3-4, 6-14 and 16-19 have been considered but are moot because the new ground of rejection does not rely 

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1, 3-4, 6-14 and 16-19 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Regarding claim 1, the specification teaches iteratively analyzing a data representation, with a termination state occurring based on a current state of the data representation: “For instance, when the variable is 0, iterative analysis of the content continues. However, when the variable is 1, iterative analysis of the content stops and a current state of the internal state 122 is output as an answer to the query” (para. [0063]). The following limitation lacks written description support: “such that portions of the text-based content are not analyzed responsive to the state of the variable [Page 2 of 13, Application No. 15/639,705, Docket No. 400462-US-NP] changing prior to analysis of all of the text-based content;”
Dependent claims 3-4 and 6-10 are rejected for their dependency on independent claim 1. 
Regarding claim 11, the specification teaches iteratively analyzing a data representation, with a termination state occurring based on a current state of the data representation: “For instance, when the variable is 0, iterative analysis of the content continues. However, when the variable is 1, iterative analysis of the content stops and a current state of the internal state 122 is output as an answer to the query” (para. [0063]). The following limitation lacks written description support: “such that portions of the text-based content are not analyzed responsive to the state of the variable changing prior to analysis of all of the text-based content;”
Dependent claims 12-14 and 16-17 are rejected for their dependency on independent claim 11. 
Regarding claim 18, the specification teaches iteratively analyzing a data representation, with a termination state occurring based on a current state of the data representation: “For instance, when the variable is 0, iterative analysis of the content continues. However, when the variable is 1, iterative analysis of the content stops and a current state of the internal state 122 is output as an answer to the query” (para. [0063]). The following limitation lacks written description support: “in response to ascertaining that a termination state has occurred, terminating analysis of the text-based content such that portions of the text-based content are not analyzed responsive to the state of the variable changing prior to analysis of all text-based questions”
Dependent claim 19 is rejected for its dependency on independent claim 18. 
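
For illustration only, the behavior described in the quoted passage of para. [0063] (a variable of 0 continues the iterative analysis, a variable of 1 stops it and outputs the current internal state) can be sketched as below. This is a minimal sketch under stated assumptions and is not the applicant's implementation: the names analyze, update_state and termination_variable, and the toy numeric state, are hypothetical and do not appear in the specification.

```python
from typing import Callable, Iterable, Tuple


def analyze(
    initial_state: int,
    portions: Iterable[int],
    update_state: Callable[[int, int], int],
    termination_variable: Callable[[int], int],
) -> Tuple[int, int]:
    """Iterate over content portions until the termination variable becomes 1.

    Returns the internal state at termination and the number of portions
    that were actually analyzed. All names here are hypothetical.
    """
    state = initial_state
    analyzed = 0
    for portion in portions:
        # Variable == 1: stop and output the current internal state.
        if termination_variable(state) == 1:
            break
        # Variable == 0: iterative analysis of the content continues.
        state = update_state(state, portion)
        analyzed += 1
    return state, analyzed


# Toy usage (assumed): the state is a running sum and the variable
# flips to 1 once the sum reaches 5.
answer, analyzed = analyze(
    0,
    [2, 3, 4, 1],
    update_state=lambda s, p: s + p,
    termination_variable=lambda s: 1 if s >= 5 else 0,
)
```

In this toy run the loop stops after two of the four portions, which is the kind of early stop the rejected limitation recites; whether the specification supports that limitation is the question addressed in the rejection above.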


Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:

(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
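Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 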
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: 
Claim 9
“query module configured to receive the query as text input…”
Upon review of the specification, the above generic placeholder is found to have sufficient structure, as shown in the following paragraphs: 
[0015] “…The computing device 102 may be configured in a variety of ways, such as a desktop personal computer, a server, a laptop, a wireless cellular phone (e.g., a smartphone), a tablet, and so forth. One example implementation of the computing device 102 is presented below as the computing device 602 of FIG. 6”

[0016] “The computing device 102 includes and/or has access to content 104 and an analysis module 106. Generally, the content 104 represents a discrete collection and/or collections of text-based content, such as electronic documents, web pages, database entries, and so forth.”

 includes a query module 112, an output module 114”

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 3-4, 6, 8-14 and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Hermann et al. (“Teaching Machines to Read and Comprehend”, hereinafter: Hermann) in view of Wang et al. (“CNN-RNN: A Unified Framework for Multi-label Image Classification”, hereinafter: Wang) and further in view of Romera-Paredes (“Recurrent Instance Segmentation”, hereinafter: Romera-Paredes) and further in view of Perez-Ortiz et al. (“Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets”, hereinafter: Perez).
Regarding claim 1 (Currently Amended) 
Hermann teaches a system comprising: at least one processor; and one or more computer-readable storage media storing a neural network that is executable by the at least one processor to implement functionality including: (Although Hermann does not explicitly teach a processor, the examiner takes official notice that it is well known in the art for a deep neural network to be implemented on a computer with a processor. See abstract “This allows us to develop a class of attention based deep neural networks that learn to read real documents and answer complex questions with minimal prior knowledge of language structure.”)
an internal state (Examiner notes LSTM memory contains information which corresponds to internal state see pg. 5 first paragraph “We employ a Deep LSTM cell with skip connections from each input x(t) to every hidden layer… where || indicates vector concatenation h(t, k) is the hidden state for layer k at time t, and i, f, o are the input, forget, and output gates respectively”) to represent a state of a query for information about text-based content, (Figure 1 shows “Document and query embedding models” and the text-based content is “Mary went to England x visited England”)
the internal state evolving as different portions of the text-based content are analyzed; (Figure 1, “Document and query embedding models”, panels (a)-(c), shows LSTM memory [corresponding to the internal state] with information evolving over different portions of the text-based content “Mary went to England”. Examiner notes that at each state s(1)y(1), s(2)y(2), s(3)y(3), etc., the internal state (LSTM memory) reads each query and thus evolves at different portions of the text, as evidenced by FIG. 1, where under s(1)y(1) the text is “Mary” and under s(2)y(2) the text is “went”, etc.)
an attention vector to be applied to different portions of the text-based content to cause the internal state to iteratively evolve from an initial state denoting a first vector representation of the state of the query through a plurality of subsequent states as the (Examiner notes that at each state s(1)y(1), s(2)y(2), s(3)y(3) and s(4)y(4), the internal state (LSTM memory) reads each query and thus evolves at different portions of the text, as evidenced by FIG. 1, where under initial state s(1)y(1) the text is “Mary” and under s(2)y(2) the text is “went”, etc. The plurality of subsequent states corresponds to s(1)y(1), s(2)y(2), s(3)y(3) and s(4)y(4). Also see pg. 5 first paragraph “The fixed width hidden vector forms a bottleneck for this information flow that we propose to circumvent using an attention mechanism inspired by recent results in translation and image recognition [6, 7]. This attention model first encodes the document and the query using separate bidirectional single layer LSTMs”)
Hermann does not teach and a termination gate to: maintain a variable that varies according to the internal state, wherein a termination criterion comprises a change in a state of the variable; 
…

and a termination gate to evaluate the internal state and to terminate analysis of the text-based content in response to occurrence of a termination criterion, 
such that portions of the text-based content are not analyzed responsive to the state of the variable changing prior to analysis of all of the text-based content; and to cause the internal state to be output as an answer to the query in response to termination of the analysis;

Wang teaches and a termination gate to: (Examiner notes the termination gate corresponds to the output node of the RNN; see pg. 2286 “The RNN framework computes the probability of a multilabel prediction sequentially as an ordered prediction path, where the a priori probability of a label at each time step can be computed based on the image embedding and the output of the recurrent neurons.”) to evaluate the internal state (section 3.1 “LSTM extends RNN by adding three gates to an RNN neuron: a forget gate f to control whether to forget the current state; an input gate i to indicate if it should read the input; an output gate o to control whether to output the state. These gates enable LSTM to learn long-term dependency in a sequence, and make it is easier to optimize, because these gates help the input signal to effectively propagate through the recurrent hidden states r(t) without affecting the output”)
and to terminate analysis of the text-based content in response to an occurrence of a termination criterion, (pg. 2289 second paragraph “The N prediction paths with highest probability among these paths constitute the intermediate paths for time step t + 1. The prediction paths ending with the END sign are added to the candidate path set C. The termination condition of the beam search is that the probability of the current intermediate paths is smaller than that of all the candidate paths”)
and to cause the internal state to be output as an answer to the query (pg. 2288 left col “The recurrent layer takes the label embedding of the previously predicted label, and models the co-occurrence dependencies in its hidden recurrent states by learning nonlinear functions… where r(t) and o(t) are the hidden states and outputs of the recurrent layer at the time step t, respectively, wk(t) is the label embedding of the t-th label in the prediction path… The output of the recurrent layer and the image representation are projected into the same low-dimensional space as the label embedding.”)
in response to termination of the analysis; (pg. 2289 second paragraph “The N prediction paths with highest probability among these paths constitute the intermediate paths for time step t + 1. The prediction paths ending with the END sign are added to the candidate path set C. The termination condition of the beam search is that the probability of the current intermediate paths is smaller than that of all the candidate paths”)
Hermann and Wang are analogous art because they are both directed to neural networks. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hermann to incorporate the teaching of Wang to include a termination condition that is capable of outputting results after a termination criterion is reached, as disclosed by Wang (pg. 2289 second paragraph).
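
For illustration only, the beam-search termination condition quoted from Wang at pg. 2289 (paths ending with the END sign join a candidate set, and the search stops once every current intermediate path is less probable than every candidate path) can be sketched as below. This is a minimal sketch under stated assumptions, not Wang's code: the function beam_search, the END token string, the max_steps cap and the toy probability table are all hypothetical.

```python
from typing import Callable, Dict, List, Tuple

END = "<END>"  # hypothetical end-of-sequence label
Path = Tuple[str, ...]
Scored = Tuple[Path, float]


def beam_search(
    next_probs: Callable[[Path], Dict[str, float]],
    beam_width: int,
    max_steps: int = 10,
) -> List[Scored]:
    """Return finished candidate paths, most probable first."""
    intermediate: List[Scored] = [((), 1.0)]
    candidates: List[Scored] = []
    for _ in range(max_steps):
        # Extend every intermediate path by one more label.
        expanded = [
            (path + (label,), prob * p)
            for path, prob in intermediate
            for label, p in next_probs(path).items()
        ]
        # Keep the N most probable paths for the next time step.
        top = sorted(expanded, key=lambda item: item[1], reverse=True)[:beam_width]
        # Paths ending with END move to the candidate set.
        candidates += [item for item in top if item[0][-1] == END]
        intermediate = [item for item in top if item[0][-1] != END]
        if not intermediate:
            break
        # Termination: every intermediate path is less probable
        # than every candidate path.
        if candidates and all(
            ip < cp for _, ip in intermediate for _, cp in candidates
        ):
            break
    return sorted(candidates, key=lambda item: item[1], reverse=True)


calls: List[Path] = []  # records which paths the toy model was asked about


def toy_probs(path: Path) -> Dict[str, float]:
    """Assumed toy label model; the numbers are made up for the example."""
    calls.append(path)
    table = {
        (): {"sky": 0.5, "sea": 0.3, END: 0.2},
        ("sky",): {"sea": 0.25, END: 0.75},
        ("sea",): {"sky": 0.25, END: 0.75},
    }
    return table.get(path, {END: 1.0})


result = beam_search(toy_probs, beam_width=3)
```

In the toy run the intermediate path ("sky", "sea") is never expanded, because the search terminates first. Note that the condition in this sketch stops the search over label paths; it does not by itself show analysis of text-based content being cut short.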
Hermann in view of Wang does not teach maintain a variable that varies according to the internal state, wherein a termination criterion comprises a change in a state of the variable; 
…
such that portions of the text-based content are not analyzed responsive to the state of the variable changing prior to analysis of all of the text-based content;
and in response to a termination criterion having a false value, generate an updated attention vector, and feed the updated attention vector into a state network to update a next internal state.
Romera-Paredes teaches and in response to a termination criterion having a false value, generate an updated attention vector, and feed the updated attention vector into a state network to update a next internal state. (Examiner notes that Romera-Paredes teaches that the whole system iteratively performs its training, which includes the attention vector, whenever the criterion is not met, and predicts until the confidence score falls below 0.5; see pg. 10 fifth paragraph “Whenever that is the case, we assign the pixel to the instance belonging to the earlier iteration. Finally, the produced sequence terminates whenever the confidence score predicted by the network is below 0.5.” Also see pg. 5 “At the beginning of the sequence, the initial inner state of the RNN, h0, is initialized to 0. After the first iteration, the RNN produces the segmentation of one of the instances in the image (any of them), together with an indicator that informs about the confidence of the prediction in order to have a stopping condition. Simultaneously, the RNN updates the inner state, h1, to account for the recent segmented instance. Then, having again as inputs B, and as inner state h1, the model outputs another segmented instance and its confidence score. This process keeps iterating until the confidence score drops below a certain level in which the model stops, ideally having segmented all instances in the image.”)
Hermann, Wang and Romera-Paredes are analogous art because they are all directed to neural networks. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hermann in view of Wang to incorporate the teaching of Romera-Paredes to include a new instance segmentation paradigm consisting in an end-to-end method that learns how to segment instances sequentially.
One of ordinary skill in the art would have been motivated to make this modification in order to address the problem within computer vision of “the automatic delineation of different objects appearing in an image”, for the purpose of enhancing the outline of objects for the partially sighted, as disclosed by Romera-Paredes (pg. 1 first and second paragraph “Instance segmentation, the automatic delineation of different objects appearing in an image, is a problem within computer vision that has attracted a fair amount of attention. Such interest is motivated by both its potential applicability to a whole range of scenarios, and the stimulating technical challenges it poses. Regarding the former, segmenting at the instance level is useful for many tasks, ranging from allowing robots to segment a particular object in order to grasp it, to highlighting and enhancing the outline of objects for the partially sighted, wearing “smart specs”. Counting elements in an image has interest in its own right [1] as it has a wide range of applications.”).
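
For illustration only, the stopping condition quoted from Romera-Paredes at pg. 5 and pg. 10 (the inner state starts at 0, each iteration emits one instance with a confidence score and an updated inner state, and the sequence ends when the score is below 0.5) can be sketched as below. This is a minimal sketch under stated assumptions, not the reference's code: the names segment_instances and toy_step, the max_iters cap and the toy confidence values are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Step = Callable[[List[str], int], Tuple[Optional[str], float, int]]


def segment_instances(
    step: Step,
    features: List[str],
    threshold: float = 0.5,
    max_iters: int = 20,
) -> List[str]:
    """Emit instances one per iteration until confidence drops below threshold."""
    state = 0  # initial inner state h0 is initialized to 0
    instances: List[str] = []
    for _ in range(max_iters):
        instance, confidence, state = step(features, state)
        # The produced sequence terminates when confidence is below 0.5.
        if confidence < threshold or instance is None:
            break
        instances.append(instance)
    return instances


def toy_step(features: List[str], state: int) -> Tuple[Optional[str], float, int]:
    """Assumed stand-in for the recurrent network: the state counts emitted instances."""
    if state < len(features):
        return features[state], 0.9, state + 1
    return None, 0.1, state


found = segment_instances(toy_step, ["cat", "dog"])
```

The loop continues (the criterion is "false") while the confidence stays at or above the threshold, and each pass feeds the updated inner state back in.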
Hermann in view of Wang with Romera-Paredes does not teach maintain a variable that varies according to the internal state, wherein a termination criterion comprises a change in a state of the variable; 
…
such that portions of the text-based content are not analyzed responsive to the state of the variable changing prior to analysis of all of the text-based content.
Perez teaches maintain a variable that varies according to the internal state, wherein a termination criterion comprises a change in a state of the variable; (Examiner notes that the output gate undergoes a change in state because it performs the outputting; see Fig. 1 “A LSTM memory block with one cell. If the memory block had two or more cells, the three gates (below; from left to right: input gate, forget gate, and output gate) would be shared by all of them.” Also see pg. 242 right col second paragraph “When representing the weights, superscripts indicate the computation in which the weight is involved: the w; c in… indicates that the weight is used to compute the activation of a forget gate (w) from a cell (c ); the ‘out’ in Woutj (a bias) indicates that it is used to compute an output gate.”)
…
such that portions of the text-based content are not analyzed responsive to the state of the variable changing prior to analysis of all of the text-based content. (Examiner notes that Perez teaches updating a gate variable at the end of each iteration, which corresponds to the termination gate; every time a gate variable is updated, termination has occurred, and the state does not change because a termination criterion occurred before computing the output. See pg. 244 right col “We consider a group of weights for each neuron in LSTM, that is, a group for each different gate, cell and output neuron, giving g=nM(nC+3)+ nY. At time step t we calculate the derivatives required for matrix Ci(t) as indicated in Section 2.2, and then apply Eqs. (16)–(18) in order to update weights wi(t).”)
Hermann, Wang, Romera-Paredes and Perez are analogous art because they are all directed to neural networks. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hermann in view of Wang with Romera-Paredes to incorporate the teaching of Perez to include Kalman filters to improve LSTM network performance. 
One of ordinary skill in the art would have been motivated to make this modification in order to improve LSTM using Kalman filter training algorithm which “allows for even better performance, reducing significantly the number of training steps when compared to the original gradient descent training algorithm” as disclosed by Perez (abstract “The long short-term memory (LSTM) network trained by gradient descent solves difficult problems which traditional recurrent neural networks in general cannot. We have recently observed that the decoupled extended Kalman filter training algorithm allows for even better performance, reducing significantly the number of training steps when compared to the original gradient descent training algorithm. In this paper we present a set of experiments which are unsolvable by classical recurrent networks but which are solved elegantly and robustly and quickly by LSTM combined with Kalman filters.”).

Regarding claim 11 (Currently Amended)
Claim 11 recites analogous limitations to independent claim 1 and therefore is rejected on the same ground as independent claim 1. 

Regarding claim 3
Hermann in view of Wang with Romera-Paredes and Perez teaches claim 1. 
Hermann further teaches wherein the initial state comprising a first word and a last word of the query. (Examiner notes that LSTM memory containing information is shown in Figure 1, “Document and query embedding models”, panels (a)-(c), which shows LSTM memory with information evolving over different portions of text-based content, including the first word “Mary” and the last word “England”. Also see pg. 5 third paragraph “We denote the outputs of the forward and backward LSTMs as →y(t) and ←y(t) respectively. The encoding u of a query of length |q| is formed by the concatenation of the final forward and backward outputs,”)
Regarding claim 14
	Claim 14 is a method claim corresponding to system claim 3 and is rejected for the same reasons as given in the rejection of that claim. 

Regarding claim 4
Hermann in view of Wang with Romera-Paredes and Perez teaches claim 1. 
Hermann further teaches wherein the attention vector is generated at least in part based on a current state of the internal state. (pg. 5 second paragraph “The fixed width hidden vector forms a bottleneck for this information flow that we propose to circumvent using an attention mechanism[corresponds to attention vector] inspired by recent results in translation and image recognition [6, 7]. This attention model first encodes the document and the query using separate bidirectional single layer LSTMs… where we are interpreting yd as a matrix with each column being the composite representation yd(t) of document token t. The variable s(t) is the normalised attention at token t. Given this attention score the embedding of the document r is computed as the weighted sum of the token embeddings.”)
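
For illustration only, the two steps named in the quoted Hermann passage (a normalised attention s(t) over document tokens, followed by a document embedding r computed as the weighted sum of the token embeddings) can be sketched as below. This is a minimal sketch under stated assumptions: a plain dot-product score is substituted for whatever scoring function Hermann actually uses, and the names attend, token_embeddings and query are hypothetical.

```python
import math
from typing import List, Tuple


def attend(
    token_embeddings: List[List[float]], query: List[float]
) -> Tuple[List[float], List[float]]:
    """Return (attention weights s, document embedding r)."""
    # Assumed scoring: dot product of each token embedding with the query.
    scores = [sum(a * b for a, b in zip(tok, query)) for tok in token_embeddings]
    # Softmax normalisation so the weights are positive and sum to 1.
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # r is the attention-weighted sum of the token embeddings.
    dim = len(token_embeddings[0])
    r = [
        sum(w * tok[i] for w, tok in zip(weights, token_embeddings))
        for i in range(dim)
    ]
    return weights, r


# Toy usage (assumed values): a zero query scores both tokens equally.
weights, r = attend([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
```

In this sketch the weights depend on the query vector that is passed in, so feeding an evolving internal state as the query would change the attention at each step; whether Hermann does so is the point addressed in the mapping above.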

Regarding claim 6
Hermann in view of Wang with Romera-Paredes and Perez teaches claim 1. 
Wang further teaches wherein the termination criterion is based on a current state of the internal state. (Pg. 2289 second paragraph “The N prediction paths with highest probability among these paths constitute the intermediate paths for time step t + 1. The prediction paths ending with the END sign are added to the candidate path set C. The termination condition of the beam search is that the probability of the current intermediate paths is smaller than that of all the candidate paths”)
Hermann, Romera-Paredes, Perez and Wang are analogous art because they are all directed to LSTM neural networks. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hermann in view of Romera-Paredes and Perez to incorporate the teaching of Wang to include a termination condition that is capable of outputting results after a termination criterion is reached, as disclosed by Wang (pg. 2289 second paragraph).

Regarding claim 8 
Hermann in view of Wang with Romera-Paredes and Perez teaches claim 1. 
Hermann teaches the system further comprising a memory that is external to the neural network and that stores vector representations of words of the text-based content, (Examiner notes the LSTM memory is an external memory to the neural network see pg. 5 second paragraph “The fixed width hidden vector forms a bottleneck for this information flow that we propose to circumvent using an attention mechanism inspired by recent results in translation and image recognition [6, 7]. This attention model first encodes the document and the query using separate bidirectional single layer LSTMs”)
the attention vector (pg. 5 second paragraph “This attention model first encodes the document and the query using separate bidirectional single layer LSTMs”) being iteratively applied to the vector representations of words of the text-based content to cause the internal state to evolve. (Examiner notes that at each state s(1)y(1), s(2)y(2), s(3)y(3) and s(4)y(4), the internal state (LSTM memory) reads each query and thus evolves at different portions of the text, as evidenced by FIG. 1, where under initial state s(1)y(1) the text is “Mary” and under s(2)y(2) the text is “went”, etc. The plurality of subsequent states corresponds to s(1)y(1), s(2)y(2), s(3)y(3) and s(4)y(4). Also see pg. 5 first paragraph “The fixed width hidden vector forms a bottleneck for this information flow that we propose to circumvent using an attention mechanism inspired by recent results in translation and image recognition [6, 7]. This attention model first encodes the document and the query using separate bidirectional single layer LSTMs”)
Regarding claim 9 
Hermann in view of Wang with Romera-Paredes and Perez teaches claim 1. 
Hermann further teaches the system further comprising a query module configured (Although Hermann does not explicitly teach a processor, the examiner takes official notice that it is well known in the art for a deep neural network to be implemented on a computer with a processor. See abstract “This allows us to develop a class of attention based deep neural networks that learn to read real documents and answer complex questions with minimal prior knowledge of language structure.”)
to receive the query as text input, (pg. 5 second paragraph “This attention model first encodes the document and the query using separate bidirectional single layer LSTMs”)
the system configured to convert the text input into a vector representation of the query to generate the initial state of the internal state. (Pg. 5 “We employ a Deep LSTM cell with skip connections from each input x(t) to every hidden layer, and from every hidden layer to the output y(t)… where || indicates vector concatenation h(t, k) is the hidden state for layer k at time t, and i, f, o are the input, forget, and output gates respectively. Thus our Deep LSTM Reader is defined by gLSTM(d, q) = y(|d|+|q|) with input x(t) the concatenation of d and q separated by the delimiter |||.”)

Regarding claim 10 
Hermann in view of Wang with Romera-Paredes and Perez teaches claim 1. 
Hermann further teaches wherein the initial state that is generated based on the query, (pg. 4 section 3.2 third paragraph “We feed our documents one word at a time into a Deep LSTM encoder, after a delimiter we then also feed the query into the encoder. Alternatively we also experiment with processing the query then the document. The result is that this model processes each document query pair as a single long sequence.” Also see pg. 2 section 2 first paragraph “The reading comprehension task naturally lends itself to a formulation as a supervised learning problem. Specifically we seek to estimate the conditional probability p(a|c, q), where c is a context document, q a query relating to that document, and a the answer to that query.”)
and wherein the initial state evolves into a final state that represents the answer to the query. (Pg. 4 section 3.2 third paragraph “The result is that this model processes each document query pair as a single long sequence. Given the embedded document and query the network predicts which token in the document answers the query.”)


Regarding claim 12
Hermann in view of Wang with Romera-Paredes and Perez teaches claim 11.  
	Hermann further teaches the method further comprising receiving the text query as text input, (pg. 1 third paragraph “We observe that summary and paraphrase sentences, with their associated documents, can be readily converted to context–query–answer triples using simple entity detection and anonymisation algorithms.”)
and converting the text input into a vector representation of one or more words of the query, (pg. 5 first paragraph “The Deep LSTM Reader must propagate dependencies over long distances in order to connect queries to their answers. The fixed width hidden vector forms a bottleneck for this information flow that we propose to circumvent using an attention mechanism inspired by recent results in translation and image recognition [6, 7]. This attention model first encodes the document and the query using separate bidirectional single layer LSTMs [19].”)
the data representation of the query comprising the vector representation of the one or more words of the query. (Examiner notes that the vector representation of the query is shown in Figure 1: Document and query embedding models, and Figure 1(a)-(c) show LSTM memory with information evolving over different portions of the text-based content “Mary went to England”)

Regarding claim 13 
Hermann in view of Wang with Romera-Paredes and Perez teaches claim 11.  
Hermann further teaches wherein the data representation of the text content comprises vector representations of words of the content, (pg. 5 first paragraph “The fixed width hidden vector forms a bottleneck for this information flow that we propose to circumvent using an attention mechanism inspired by recent results in translation and image recognition [6, 7]. This attention model first encodes the document and the query using separate bidirectional single layer LSTMs”)
and each vector representation of the query comprises a vector representation of one or more words of the query. (Examiner notes that the vector representation of the query is shown in Figure 1: Document and query embedding models, and Figure 1(a)-(c) show LSTM memory with information evolving over different portions of the text-based content “Mary went to England”)

Regarding claim 16
Hermann in view of Wang with Romera-Paredes and Perez teaches claim 11.  
Hermann further teaches the method further comprising causing the current state of the data representation of the query to be output as an answer to the query. (Pg. 4 section 3.2 third paragraph “The result is that this model processes each document query pair as a single long sequence. Given the embedded document and query the network predicts which token in the document answers the query.”)

Regarding claim 17 
Hermann in view of Wang with Romera-Paredes and Perez teaches claim 11.  
Hermann further teaches wherein the current state of the data representation of the query comprises one or more words from the text content. (Figure 1: Document and query embedding models, and Figure 1(a)-(c) show LSTM memory with information showing one or more words “Mary went to England”)

Claims 5 and 7 are rejected under 35 U.S.C. 103 as being unpatentable over Hermann et al. (“Teaching Machines to Read and Comprehend”, hereinafter: Hermann) in view of Wang et al. (“CNN-RNN: A Unified Framework for Multi-label Image Classification”, hereinafter: Wang) in view of Romera-Paredes in view of Perez et al. and further in view of Dong et al. (“Language to Logical Form with Neural Attention”, hereinafter: Dong). 
Regarding claim 5
Hermann in view of Wang with Romera-Paredes and Perez teaches claim 1. 
Hermann in view of Wang with Romera-Paredes and Perez does not teach wherein the termination gate maintains a variable that varies according to the internal state, and the termination criterion comprises a change in a state of the variable.  
Dong teaches wherein the termination gate maintains a variable that varies according to the internal state, and the termination criterion comprises a change in a state of the variable. (Pg. 3 “where W_o ∈ R^(|V_a|×n) is a parameter matrix, and e(y_t) ∈ {0, 1}^|V_a| a one-hot vector for computing y_t’s probability from the predicted distribution. We augment every sequence with a “start-of-sequence” <s> and “end-of-sequence” </s> token. The generation process terminates once </s> is predicted. The conditional probability of generating the whole sequence p (a|q) is then obtained using Equation (1)”)
Hermann, Wang, Romera-Paredes, Perez and Dong are analogous art because they are all directed to neural networks. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hermann in view of Wang with Romera-Paredes and Perez to incorporate the teaching of Dong to include a termination condition that is able to terminate a process once the probability of a sequence of words in a token has been predicted, as disclosed by Dong (pg. 3 left col second paragraph).

Regarding claim 7 
Hermann in view of Wang with Romera-Paredes and Perez teaches claim 1. 
Hermann in view of Wang with Romera-Paredes and Perez does not teach wherein the neural network is executable by the at least one processor to iteratively apply the attention vector to the different portions of the text-based content until the termination criterion occurs based on a current state of the internal state. 
Dong teaches wherein the neural network is executable by the at least one processor to iteratively apply the attention vector (abstract “In this paper we present a general method based on an attention-enhanced encoder-decoder model. We encode input utterances into vector representations, and generate their logical forms by conditioning the output sequences or trees on the encoding vectors.”) to the different portions of the text-based content until the termination criterion occurs. (Pg. 3 “where W_o ∈ R^(|V_a|×n) is a parameter matrix, and e(y_t) ∈ {0, 1}^|V_a| a one-hot vector for computing y_t’s probability from the predicted distribution. We augment every sequence with a “start-of-sequence” <s> and “end-of-sequence” </s> token. The generation process terminates once </s> is predicted. The conditional probability of generating the whole sequence p (a|q) is then obtained using Equation (1)”)
Hermann, Wang, Romera-Paredes, Perez and Dong are analogous art because they are all directed to neural networks. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Hermann in view of Wang with Romera-Paredes and Perez to incorporate the teaching of Dong to include a termination condition that is able to terminate a process once the probability of a sequence of words in a token has been predicted, as recognized by Dong (pg. 3 left col second paragraph).
Claims 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Mnih et al. (“Recurrent Models of Visual Attention”, hereinafter: Mnih) in view of Hermann et al. (“Teaching Machines to Read and Comprehend”, hereinafter: Hermann) in view of Zhang et al. (“Towards Machine Translation in Semantic Vector Space”, hereinafter: Zhang) and further in view of Perez-Ortiz et al. (“Kalman filters improve LSTM network performance in problems unsolvable by traditional recurrent nets”, hereinafter: Perez). 
Regarding claim 18 (Currently Amended)
Mnih teaches a computer-implemented method for training a neural network, (pg. 6 section 4.1 second paragraph “We also trained standard feedforward and convolutional neural networks with two hidden layers as a baselines. The error rates achieved by the different models on the test set are shown in Table 1a. We see that the performance of RAM generally improves with more glimpses, and that it eventually outperforms a the baseline models trained on the full 28 × 28 centered digits.”)
a training query, and a known answer to the training query; (pg. 4 “For the training images this label will be known and we can directly optimize the policy to output the correct label[corresponds to known answer] associated with a training image at the end of an observation sequence.”)
and ascertaining that a termination state occurs in response to a state of the training query corresponding to the known answer to the training query; (pg. 8 section 5 “Additionally, the flexibility of our approach allows for a number of interesting extensions. For example, the network can be augmented with another action that allows it terminate at any time point and make a final classification decision. Our preliminary experiments show that this allows the network to learn to stop taking glimpses once it has enough information to make a confident classification.”) 
evaluating the termination gate of the neural network at each iteration of the training query to determine whether the termination state occurs; (Examiner notes that the termination gate corresponds to the output node of the RNN, and every time it outputs it makes a determination of whether or not the termination state occurs. See pg. 8 section 5 “Additionally, the flexibility of our approach allows for a number of interesting extensions. For example, the network can be augmented with another action that allows it terminate at any time point and make a final classification decision.”)
…
the termination state being based at least on the internal state; (Examiner notes that the output node of the RNN is capable of outputting the result after a termination criterion is reached. This outputting process is also based on the internal state of the RNN, which leads to outputting a result after a termination. See pg. 8 section 5 “Additionally, the flexibility of our approach allows for a number of interesting extensions. For example, the network can be augmented with another action that allows it terminate at any time point and make a final classification decision.”)
and applying a reward value (Examiner notes that a reward value is applied when an action receives a reward after execution, and Mnih uses reinforcement learning (abstract). See pg. 4 third paragraph “After executing an action the agent receives a new visual observation of the environment xt+1 and a reward signal rt+1”) to the termination state to reinforce the termination state in the neural network and to generate a trained instance of the neural network. (Pg. 8 section 5 “Additionally, the flexibility of our approach allows for a number of interesting extensions. For example, the network can be augmented with another action that allows it terminate at any time point and make a final classification decision.”)
Mnih does not teach the method comprising: receiving a training set of data including training content that includes text-based content, processing the training query and the text-based content using the neural network such that the training query is iterated over different portions of the text-based content causing an internal state to 
at a termination gate of the neural network, maintaining a variable that varies according to the internal state, wherein a termination criterion comprises a change in a state of the variable; 
…
in response to ascertaining that a termination state has not occurred following an iteration, causing the internal state to evolve to a next state;
…
in response to ascertaining that a termination state has occurred, terminating analysis of the text-based content such that portions of the text-based content are not analyzed responsive to the state of the variable changing prior to analysis of all text-based questions. 
Hermann teaches the method comprising: receiving a training set of data including training content that includes text-based content, (pg. 2 section 2 third paragraph “Here we propose a methodology for creating real-world, large scale supervised training data for learning reading comprehension models. Inspired by work in summarisation [10, 11], we create two machine reading corpora by exploiting online newspaper articles and their matching summaries… We construct a corpus of document–query– answer triples by turning these bullet points into Cloze [12] style questions by replacing one entity at a time with a placeholder.”)
processing the training query and the text-based content using the neural network (pg. 2 section 2 third paragraph “Here we propose a methodology for creating real-world, large scale supervised training data for learning reading comprehension models. Inspired by work in summarisation [10, 11], we create two machine reading corpora by exploiting online newspaper articles and their matching summaries… We construct a corpus of document–query– answer triples by turning these bullet points into Cloze [12] style questions by replacing one entity at a time with a placeholder.” Examiner notes that Hermann uses deep neural networks to train on the dataset; see Abstract “This allows us to develop a class of attention based deep neural networks that learn to read real documents and answer complex questions with minimal prior knowledge of language structure.”)
such that the training query is iterated over different portions of the text-based content causing an internal state to iteratively evolve from an initial state through a plurality of states as the training query is processed. (Examiner notes that at each state S(1)y(1), S(2)y(2), S(3)y(3) and S(4)y(4), the internal state of the LSTM memory reads each query and thus evolves over different portions of the text, as evidenced by FIG. 1, where under initial state S(1)y(1) the text is “Mary” and under S(2)y(2) the text is “went”, etc. The plurality of subsequent states corresponds to S(1)y(1), S(2)y(2), S(3)y(3) and S(4)y(4). Also see pg. 5 first paragraph “The fixed width hidden vector forms a bottleneck for this information flow that we propose to circumvent using an attention mechanism inspired by recent results in translation and image recognition [6, 7]. This attention model first encodes the document and the query using separate bidirectional single layer LSTMs” and pg. 4 section 3.2 third paragraph “When used for translation, Deep LSTMs [19] have shown a remarkable ability to embed long sequences into a vector representation which contains enough information to generate a full translation in another language. Our first neural model for reading comprehension tests the ability of Deep LSTM encoders to handle significantly longer sequences. We feed our documents one word at a time into a Deep LSTM encoder, after a delimiter we then also feed the query into the encoder”)
Mnih and Hermann are analogous art because they are directed to neural networks. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mnih to incorporate the teaching of Hermann to include a methodology for obtaining a large number of document-query-answer triples and show that recurrent and attention based neural networks provide an effective modelling framework as disclosed by Hermann (pg. 8 second paragraph).
Mnih in view of Hermann does not teach at a termination gate of the neural network, maintaining a variable that varies according to the internal state, wherein a termination criterion comprises a change in a state of the variable; 
…
in response to ascertaining that a termination state has not occurred following an iteration, causing the internal state to evolve to a next state;
…
in response to ascertaining that a termination state has occurred, terminating analysis of the text-based content such that portions of the text-based content are not analyzed responsive to the state of the variable changing prior to analysis of all text-based questions. 
Zhang teaches in response to ascertaining that a termination state has not occurred following an iteration, causing the internal state to evolve to a next state. (Pg. 9:11 “Termination Check. If the joint error reaches a local minima or the iterations reach the predefined number (25 is used in experiments), we terminate the training procedure; otherwise, we set ps = p’s, pt = p’t, and go to step 2.”)
Mnih, Hermann and Zhang are analogous art because they are all directed to neural networks. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mnih in view of Hermann to incorporate the teaching of Zhang to include an RNN-based translation framework which “can significantly improve the translation quality in the large-scale” as disclosed by Zhang (abstract “The RNN-based translation model is trained using a max-margin objective function that maximizes the margin between the reference translation and the n-best translations in forced decoding. In the experiments, we first show that the proposed vector representations for the translation rules are very reliable for application in translation modeling. We further show that the proposed type-dependent, RNN-based model can significantly improve the translation quality in the large-scale, end-to-end Chinese-to-English translation evaluation.”).
Mnih in view of Hermann with Zhang does not teach at a termination gate of the neural network, maintaining a variable that varies according to the internal state, wherein a termination criterion comprises a change in a state of the variable; 
…
in response to ascertaining that a termination state has occurred, terminating analysis of the text-based content such that portions of the text-based content are not analyzed responsive to the state of the variable changing prior to analysis of all text-based questions. 
Perez teaches at a termination gate of the neural network, maintaining a variable that varies according to the internal state, wherein a termination criterion comprises a change in a state of the variable; (Examiner notes that the output gate has a change in state because it performs the outputting; see “Fig. 1. A LSTM memory block with one cell. If the memory block had two or more cells, the three gates (below; from left to right: input gate, forget gate, and output gate) would be shared by all of them.” Also see pg. 242 right col second paragraph “When representing the weights, superscripts indicate the computation in which the weight is involved: the w; c in… indicates that the weight is used to compute the activation of a forget gate (w) from a cell (c); the ‘out’ in Woutj (a bias) indicates that it is used to compute an output gate.”)
…
in response to ascertaining that a termination state has occurred, terminating analysis of the text-based content such that portions of the text-based content are not analyzed responsive to the state of the variable changing prior to analysis of all text-based questions. (Examiner notes that Perez teaches updating a gate variable at the end of each iteration, which corresponds to the termination gate. Every time a gate variable is updated, the termination has occurred, and the state does not change because a termination criterion occurred before computing the output. See pg. 244 right col “We consider a group of weights for each neuron in LSTM, that is, a group for each different gate, cell and output neuron, giving g=nM(nC+3)+ nY. At time step t we calculate the derivatives required for matrix Ci(t) as indicated in Section 2.2, and then apply Eqs. (16)–(18) in order to update weights wi(t).”)
Mnih, Hermann, Zhang and Perez are analogous art because they are all directed to neural networks. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Mnih in view of Hermann with Zhang to incorporate the teaching of Perez to include Kalman filters to improve LSTM network performance. 
One of ordinary skill in the art would have been motivated to make this modification in order to improve LSTM using Kalman filter training algorithm which “allows for even better performance, reducing significantly the number of training steps when compared to the original gradient descent training algorithm” as disclosed by Perez (abstract “The long short-term memory (LSTM) network trained by gradient descent solves difficult problems which traditional recurrent neural networks in general cannot. We have recently observed that the decoupled extended Kalman filter training algorithm allows for even better performance, reducing significantly the number of training steps when compared to the original gradient descent training algorithm. In this paper we present a set of experiments which are unsolvable by classical recurrent networks but which are solved elegantly and robustly and quickly by LSTM combined with Kalman filters.”).

Regarding claim 19 
Mnih in view of Hermann with Zhang with Perez teaches claim 18. 
Mnih further teaches wherein said processing causes the state of the training query to evolve from the initial state to a final state that represents the known answer to the query. (Pg. 5 third paragraph “For the training images this label will be known and we can directly optimize the policy to output the correct label[corresponds to known answer] associated with a training image at the end of an observation sequence. This can be achieved, as is common in supervised learning, by maximizing the conditional probability of the true label given the observations from the image”)


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Sordoni et al. (“Iterative Alternating Neural Attention for Machine Reading”) teaches a novel neural attention architecture to tackle machine comprehension tasks, such as answering Cloze-style queries with respect to a document. 
Chen et al. (“ABC-CNN: An Attention Based Convolutional Neural Network for Visual Question Answering”) teaches an attention-based configurable 
Dhingra et al. (“Gated-Attention Readers for Text Comprehension”) teaches a new attention mechanism which uses multiplicative interactions between the query embedding and intermediate states of a recurrent neural network reader.

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to VAN C MANG whose telephone number is (571)270-7598.  The examiner can normally be reached on Mon - Fri 8:00-5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571)272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/V.M./Examiner, Art Unit 2126 
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126