DETAILED ACTION
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 10 June 2022 has been entered. Furthermore, this action is in response to the amendments filed 10 June 2022 for application 15/339303 filed on 31 October 2016. Currently, claims 1-11, 13-23, and 25-30 are pending. Claims 12 and 24 were previously cancelled.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant’s arguments with respect to claim(s) 1-11, 13-23, and 25-30 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3, 8, 9, 11, 13-15, 20, 21, 23, and 25-30 are rejected under 35 U.S.C. 103 as being unpatentable over Rae et al. (“Scaling Memory-Augmented Neural Networks with Sparse Reads and Writes,” 27 October 2016, arXiv:1610.09027v1 [cs.LG], 17 pages), hereinafter referred to as Rae, in view of Spithourakis et al. (“Numerically Grounded Language Models for Semantic Error Correction,” https://arxiv.org/pdf/1608.04147.pdf, arXiv:1608.04147v1 [cs.CL], 14 August 2016, pp. 1-6), hereinafter referred to as Spithourakis.

In regards to claim 1, Rae teaches a computer-executed method comprising:
training, based on a set of sequential training data, a recurrent neural network that is equipped with a differentiable set data structure (See at [p. 11, Section A.3, Figure 5], “A schematic of the memory efficient backpropagation through time.”; further at [p. 12, Section B, Figure 6], “Schematic showing how the controller interfaces with the external memory in our experiments.”; wherein a recurrent neural network (Figure 5) is trained using backpropagation through time (i.e., the training is based on sequential training data) and the data structure is differentiable (shown in Figures 5 and 6 as M).);
wherein training the recurrent neural network comprises: performing one or both of: adding an element to the differentiable set data structure based, at least in part, on a hidden state of the recurrent neural network (See at [p. 4, Section 3.3], “The LSTM then produces a vector, p_t = (q_t, a_t, α_t, γ_t), of read and write parameters for memory access via a linear layer.”; further at [p. 3, Section 2.3, Equation 5, Figure 6], “A write to memory, [Equation] (3) consists of a copy of the memory from the previous time step M_{t−1} decayed by the erase matrix R_t indicating obsolete or inaccurate content, and an addition of new or updated information A_t. The erase matrix R_t = w_t^W e_t^T is constructed as the outer product between a set of write weights w_t^W ∈ [0,1]^N and erase vector e_t ∈ [0,1]^M. The add matrix A_t = w_t^W a_t^T is the outer product between the write weights and a new write word a_t ∈ R^M, which the controller outputs.”; wherein new data elements are added to the differentiable memory structure based on the response of the LSTM (i.e., a hidden state of the RNN, which produces the read and write parameters that control read/write access, as can be seen in Figure 6, in which the hidden state h_t is used, via a linear projection p_t, to read and write to the memory), with Equation (5) also disclosing the calculation of write weights using the write parameters (which are used to form the erase and write matrices used in Equation (3) to update the differentiable memory structure).), and
performing a query over the differentiable set data structure based, at least in part, on the hidden state of the recurrent neural network (See at [p. 3, Section 3.1], “Since the K largest values in w_t^R correspond to the K closest points to our query q_t, we can use an approximate nearest neighbor data-structure, described in Section 3.5, to calculate w̃_t^R.”; further at [p. 4, Section 3.3], “The LSTM then produces a vector, p_t = (q_t, a_t, α_t, γ_t), of read and write parameters for memory access via a linear layer.”; further at [p. 4, Section 3.5, Figure 6], “When querying the memory, we can use an approximate nearest neighbor index (ANN) to search over the external memory for the K nearest words. Where a linear KNN search inspects every element in memory (taking O(N) time), an ANN index maintains a structure over the dataset to allow for fast inspection of nearby points in O(log N) time.”; wherein the differentiable memory structure is queried (e.g., for reading) based on the vector p_t of parameters which control read/write access to that memory, with the performance of a query over that differentiable set data structure (M) depending on the hidden state h_t of the LSTM, as can be seen in Figure 6 (i.e., via a linear projection p_t to read and write to the memory), but also according to the read vector r_i (which also depends on h_t as shown in Figure 6) in the performance of the search/query for the K nearest words in that structure.); and
after performing one or both of adding the element and performing the query, generating a prediction, based on output of the query, without using the hidden state of the recurrent neural network (See at [p. 3, Section 3.1, Equation 4], “We wish to construct w̃_t^R such that r̃_t ≈ r_t [see Equation (4)], … Since the K largest values in w_t^R correspond to the K closest points to our query q_t, we can use an approximate nearest neighbor data-structure, described in Section 3.5, to calculate w̃_t^R in O(log N) time.”; wherein, in response to adding an element to the differentiable data structure and performing the query (to determine the K words most consistent with the query), a predicted section of pertinent words in that structure is computed using Equation (4) (which does not explicitly/directly make use of the LSTM hidden state values).);
wherein training the recurrent neural network produces a trained recurrent neural network (See at [p. 7, Section 4.5], “After training all MANNs for the same length of time, a validation task with 500 characters was used to select the best run, and this was then tested on a test set, containing all novel characters for different sequence lengths (Figure 4).”; wherein the implementation and validation of a model using real world data discloses a trained recurrent neural network.);
detecting, based, at least in part, on the trained recurrent neural network and the differentiable set data structure that a sequence of unlabeled data is … valid (See at [p. 7, Section 4.5, Figure 4], “In order to succeed at the task the model must learn to rapidly associate a novel character with the correct label, such that it can correctly classify subsequent examples of the same character class. … After training all MANNs for the same length of time, a validation task with 500 characters was used to select the best run, and this was then tested on a test set, containing all novel characters for different sequence lengths.”; wherein the validation task discloses identifying/detecting one or more properties of the sequence (e.g., classification interpreted in general to include the determination of a valid correspondence between the character sequence and character classes) of unlabeled data based on the trained RNN and the differentiable set data structure applied to a sequence of unlabeled data (Figure 4).);
wherein the method is performed by one or more computing devices (See at [p. 2, Section 1], “In this paper, we present a MANN named SAM. By thresholding memory modifications to a sparse subset, and using efficient data structures for content-based read operations, our model is optimal in space and time with respect to memory size, while retaining end-to-end gradient based optimization.”; wherein the use of a memory structure suggests the necessity for one or more computing devices.).
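For illustration only, the write and sparse read operations of Rae relied upon above can be sketched as follows. This is a minimal sketch and not Rae's implementation: it assumes that Equation (3) has the elementwise form M_t = (1 − R_t) ⊙ M_{t−1} + A_t, it uses dot-product similarity with a softmax over the K selected slots, and it substitutes an exact top-K search for the approximate nearest neighbor index. All function and variable names are hypothetical.

```python
import math

# Illustrative sketch only. All names are hypothetical; the dot-product
# similarity and the exact top-K search are assumptions standing in for
# Rae's similarity measure and approximate nearest neighbor (ANN) index.


def write_memory(memory, write_weights, erase_vec, add_word):
    """One write step: M_t = (1 - R_t) * M_{t-1} + A_t, elementwise,
    with R_t = w_t^W e_t^T and A_t = w_t^W a_t^T (both outer products)."""
    return [
        [(1.0 - w_i * e_j) * m_ij + w_i * a_j
         for m_ij, e_j, a_j in zip(row, erase_vec, add_word)]
        for w_i, row in zip(write_weights, memory)
    ]


def sparse_read(memory, query, k):
    """Read from only the K rows closest to the query q_t.
    Returns the read word and the K nonzero read weights."""
    scores = [sum(m * q for m, q in zip(row, query)) for row in memory]
    top_k = sorted(range(len(memory)), key=lambda i: scores[i], reverse=True)[:k]
    peak = max(scores[i] for i in top_k)  # subtract the peak for numerical stability
    exps = {i: math.exp(scores[i] - peak) for i in top_k}
    total = sum(exps.values())
    weights = {i: exps[i] / total for i in top_k}
    read_word = [sum(weights[i] * memory[i][j] for i in top_k)
                 for j in range(len(query))]
    return read_word, weights


# Toy values (assumed): N = 3 memory slots, word size M = 2.
memory = [[0.0, 0.0] for _ in range(3)]
memory = write_memory(memory, [1.0, 0.0, 0.0], [1.0, 1.0], [2.0, 3.0])
memory = write_memory(memory, [0.0, 0.5, 0.0], [0.0, 0.0], [4.0, -2.0])
read_word, read_weights = sparse_read(memory, query=[1.0, 1.0], k=2)
```

In this sketch the write depends only on quantities the controller emits (write weights, erase vector, write word), and the read result is computed from the memory contents and the query alone, which mirrors the mapping of the "adding," "query," and "prediction" limitations above.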
However, Rae does not explicitly disclose “…semantically invalid, wherein the sequence of unlabeled data is syntactically…”. Although Rae teaches the application of the LSTM to classify characters, Rae does not explicitly disclose that the object of this classification is to detect whether the (character) sequence is semantically invalid (while being syntactically valid).
However, Spithourakis, in the analogous environment of using an RNN to classify character sequences, teaches “detecting based, at least in part, on the trained recurrent neural network and the differentiable set data structure that a sequence of unlabeled data is semantically invalid, wherein the sequence of unlabeled data is syntactically valid” (See at [p. 2, Section 1, Figure 1], “If then the system is presented with the phrase ‘non dilated’ in the context of a low value, it will detect a semantic inconsistency and correct the text to ‘severely dilated’.”; further at [p. 2, Section 2.1], “A neural LM uses a matrix, E_in ∈ R^{D×V}, to derive word embeddings, e^w_t = E_in w_t. A hidden state from the previous time step, h_{t−1}, and the current word embedding, e^w_t, are sequentially fed to an RNN’s recurrence function to produce the current hidden state, h_t ∈ R^D. The conditional probability of the next word is estimated as softmax(E_out h_t), where E_out ∈ R^{V×D} is an output embeddings matrix.”; further at [p. 4, Section 4.1, Figure 3], “We calculate the probabilities for observing the document with different word choices {‘non’, ‘mildly’, ‘severely’} under the grounded LM and find that ‘non dilated’ is associated with higher EF values. This shows that it has captured semantic dependencies on numbers.”; wherein an RNN (LSTM) is trained to detect semantic errors in a sequence of words based on the sequence of hidden states of that RNN (corresponding to the sequence of words and their associated embeddings) such that, as shown in Figures 1 and 4, the RNN detects whether any particular word in that sequence is semantically invalid (by virtue of predicting/classifying the likelihood of that word based on previous training and using a stored confusion set of words that may be hypothesized to occur in place of that word), with Figure 4 in particular showing an example of input sentences which are syntactically valid, since the words in the sequence of words in any sentence (assumed to end with the word “end”) determine the semantics of that sentence (specifically, a quantity/number but also units of measurement), with the syntax otherwise presumed to be valid; in other words, the detection of the semantic invalidity is based on the internal consistency of the words in the sentence/phrase.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Rae to incorporate the teachings of Spithourakis to detect based, at least in part, on the trained recurrent neural network and the differentiable set data structure that a sequence of unlabeled data is semantically invalid, wherein the sequence of unlabeled data is syntactically valid. The modification would have been obvious because one of ordinary skill would have been motivated to improve the performance (perplexity) of semantic error detection and correction for sentences/phrases using an RNN-based neural model to represent the learned semantics of those sentences/phrases, particularly when that neural model is conditioned on a lexicalized knowledge base and grounded (Spithourakis, [Abstract, p. 4, Section 4.2, Tables 3-5]).
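For illustration only, the next-word scoring quoted from Spithourakis (softmax(E_out h_t)) and its use with a confusion set can be sketched as follows. This is a simplified sketch under stated assumptions, not Spithourakis's implementation: it scores a single word position rather than the whole document, the decision rule (flag the observed word when another member of the confusion set is more probable) is an assumption, and the vocabulary, matrix, and hidden-state values are toy values chosen for the example. All names are hypothetical.

```python
import math

# Illustrative sketch only. Names, toy values, and the decision rule are
# assumptions made for this example; they are not taken from Spithourakis.


def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def next_word_probs(e_out, hidden):
    """softmax(E_out h_t), where e_out is V x D and hidden has length D."""
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in e_out]
    return softmax(logits)


def flag_semantic_error(vocab, e_out, hidden, observed, confusion_set):
    """Return (is_flagged, most_probable_alternative) for the observed word."""
    probs = dict(zip(vocab, next_word_probs(e_out, hidden)))
    best = max(confusion_set, key=lambda word: probs[word])
    return best != observed, best


vocab = ["non", "mildly", "severely"]
e_out = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]   # toy output embeddings, V = 3, D = 2
hidden = [-2.0, 0.5]                             # toy hidden state h_t
result = flag_semantic_error(vocab, e_out, hidden, "non", vocab)
```

With these toy values the word "severely" receives the highest probability, so an observed "non" is flagged while an observed "severely" is not. Each candidate is a grammatical substitute for the others, so the distinction drawn is one of semantic consistency and not syntax.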

In regards to claim 2, the rejection of claim 1 is incorporated and Rae further teaches wherein: adding the element to the differentiable set data structure is performed via a continuous operation (See at[p. 4, Section 3.3], The LSTM then produces a vector,                         
                            
                                
                                    p
                                
                                
                                    t
                                
                            
                            =
                            (
                            
                                
                                    q
                                
                                
                                    t
                                
                            
                            ,
                             
                            
                                
                                    a
                                
                                
                                    t
                                
                            
                            ,
                            
                                
                                    α
                                
                                
                                    t
                                
                            
                            ,
                            
                                
                                    γ
                                
                                
                                    t
                                
                            
                            )
                        
                    , of read and write parameters for memory access via a linear layer., Further at [p. 3, Section 2.3], A write to memory, [Equation] (3) consists of a copy of the memory from the previous time step Mt−1 decayed by the erase matrix Rt indicating obsolete or inaccurate content, and an addition of new or updated information At. The erase matrix                         
                            
                                
                                    R
                                
                                
                                    t
                                
                            
                            =
                            
                                
                                    w
                                
                                
                                    t
                                
                                
                                    w
                                
                            
                            
                                
                                    e
                                
                                
                                    t
                                
                                
                                    T
                                
                            
                        
                     is constructed as the outer product between a set of write weights                         
                            
                                
                                    w
                                
                                
                                    t
                                
                                
                                    W
                                
                            
                            ∈
                            
                                
                                    
                                        
                                            0,1
                                        
                                    
                                
                                
                                    N
                                
                            
                        
and erase vector e_t ∈ [0,1]^M. The add matrix A_t = w_t^W a_t^T is the outer product between the write weights and a new write word a_t ∈ ℝ^M, which the controller outputs., wherein the operations applied to the differentiable memory structure (including adding an element) are continuous based on the decayed erase operation and updated information (equation 3 – a continuous operation particularly with respect to A_t) with the continuous operation also corresponding to the differentiability of the memory structure itself (e.g., the weights applied to a_t are continuous where it is noted that the Specification supports this interpretation for a continuous operation at paragraph [0016]: “operations on a differentiable set data structure are continuous.”); and performing the query over the differentiable set data structure is performed via a continuous operation (See at [pp. 4-5, Section 3.5], When querying the memory, we can use an approximate nearest neighbor index (ANN) to search over the external memory for the K nearest words. … an ANN index maintains a structure over the dataset to allow for fast inspection of nearby points in O(log N) time., wherein the operations applied to the differentiable memory structure (including reading/querying an element) are continuous based on the decayed erase operation and updated information (equation 3 – a continuous operation particularly with respect to R_t) with the continuous operation also corresponding to the differentiability of the memory structure itself (e.g., the weights applied to read operation r_t are continuous where equation 4 provides an approximation for r_t and where it is noted that the Specification supports this interpretation for a continuous operation at paragraph [0016]: “operations on a differentiable set data structure are continuous.”)).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Rae to incorporate the teachings of Spithourakis for the same reasons as pointed out for claim 1.
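For illustration only, and not as part of the cited references or the record, the write operation quoted above from Rae can be sketched in Python. The sketch assumes that "decayed by the erase matrix" means the element-wise update M_t = (1 − R_t) ∘ M_{t−1} + A_t; the function names `outer` and `write` and the sample values are hypothetical.

```python
def outer(u, v):
    """Outer product of vectors u (length N) and v (length M) as an N x M list of lists."""
    return [[ui * vj for vj in v] for ui in u]


def write(memory, w, e, a):
    """One memory write, assumed form: M_t = (1 - R_t) * M_{t-1} + A_t (element-wise).

    memory: N x M list of lists (the previous memory M_{t-1})
    w: write weights, each in [0, 1] (length N)
    e: erase vector, each entry in [0, 1] (length M)
    a: write word, real-valued (length M)
    """
    R = outer(w, e)  # erase matrix R_t = w e^T
    A = outer(w, a)  # add matrix A_t = w a^T
    return [
        [(1.0 - R[i][j]) * memory[i][j] + A[i][j] for j in range(len(e))]
        for i in range(len(w))
    ]


memory = [[1.0, 2.0], [3.0, 4.0]]
# Write only to row 0 (weight 1), fully erasing it and storing the word [5, 6].
updated = write(memory, w=[1.0, 0.0], e=[1.0, 1.0], a=[5.0, 6.0])
```

Every step in this sketch is a sum or product of real numbers, so the updated memory varies continuously with the write weights, the erase vector, and the write word.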

In regards to claim 3, the rejection of claim 1 is incorporated and Rae further teaches wherein: the differentiable set data structure represents a logical set of values (See at [p. 3, Section 3.2], The write operation [in] SAM is an instance of (3) where the write weights w̃_t^W are constrained to contain a constant number of non-zero entries., Further at [p. 3, Section 2.3], A write to memory, [Equation] (3) consists of a copy of the memory from the previous time step M_{t−1} decayed by the erase matrix R_t indicating obsolete or inaccurate content, and an addition of new or updated information A_t. The erase matrix R_t = w_t^W e_t^T is constructed as the outer product between a set of write weights w_t^W ∈ [0,1]^N and erase vector e_t ∈ [0,1]^M. The add matrix A_t = w_t^W a_t^T is the outer product between the write weights and a new write word a_t ∈ ℝ^M, which the controller outputs., wherein the differentiable memory is constrained to contain a fixed number of non-zero entries, thereby forming a logical set of values within that structure); and the differentiable set data structure stores a plurality of probabilities that indicate whether corresponding values, that correspond to the plurality of probabilities, are included in the logical set of values (See at [p. 2, Section 2.3] The Neural Turing Machine is a recurrent neural network equipped with a content-addressable memory, similar to Memory Networks, but with the additional capability to write to memory over time., Further at [p. 3, Section 3.2], The write operation [in] SAM is an instance of (3) where the write weights w̃_t^W are constrained to contain a constant number of non-zero entries., Further at [p. 3, Section 2.3], A write to memory, [Equation] (3) consists of a copy of the memory from the previous time step M_{t−1} decayed by the erase matrix R_t indicating obsolete or inaccurate content, and an addition of new or updated information A_t. The erase matrix R_t = w_t^W e_t^T is constructed as the outer product between a set of write weights w_t^W ∈ [0,1]^N and erase vector e_t ∈ [0,1]^M. The add matrix A_t = w_t^W a_t^T is the outer product between the write weights and a new write word a_t ∈ ℝ^M, which the controller outputs., wherein a distinct weight w_t (associated with add/write) is assigned to each element (in a sequence) such that this weight is constrained to be between 0 and 1 and, by virtue of the content-addressable memory functionality of the SAM/NTM framework, also corresponds to a probability attached to the appearance/addition of the element in the differentiable memory structure.).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Rae to incorporate the teachings of Spithourakis for the same reasons as pointed out for claim 1.

In regards to claim 8, the rejection of claim 1 is incorporated and Rae further teaches wherein: the set of sequential training data comprises one or more sequences of words (See at [p. 7, Section 4.5], in one example, [20] introduced toy tasks they considered a prerequisite to agents which can reason and understand natural language. They are synthetically generated language tasks with a vocab of about 150 words that test various aspects of simple reasoning such as deduction, induction and coreferencing. We tested the models (including the Sparse Differentiable Neural Computer described in Supplementary D) on this task., wherein one or more sequences of words are used to train the RNN to perform question answering.); the method further comprises at least one selected from the group consisting of: identifying the one or more properties of the sequence of unlabeled data based, at least in part, on the trained recurrent neural network and the differentiable set data structure (See at [p. 7, Section 4.5, Figure 4], In order to succeed at the task the model must learn to rapidly associate a novel character with the correct label, such that it can correctly classify subsequent examples of the same character class. …After training all MANNs for the same length of time, a validation task with 500 characters was used to select the best run, and this was then tested on a test set, containing all novel characters for different sequence lengths., wherein the validation task discloses identifying/detecting one or more properties of the sequence (e.g., classification interpreted in general to include determination of a valid correspondence between the character sequence and character classes) of unlabeled data based on the trained RNN and the differentiable set data structure applied to a sequence of unlabeled data (Figure 4).) and performing both of: determining whether a particular word is identified in the differentiable set data structure (See at [p. 4, Section 3.5], When querying the memory, we can use an approximate nearest neighbor index (ANN) to search over the external memory for the K nearest words., wherein the query operation applied to the differentiable memory structure of the LSTM determines whether a pertinent word is in that memory structure (i.e., among the top k words)); and classifying a portion of the sequence of unlabeled data based, at least in part, on determining that the particular word is identified in the differentiable set data structure (See at [pp. 4-5, Section 3.5], When querying the memory, we can use an approximate nearest neighbor index (ANN) to search over the external memory for the K nearest words…. Both the memory and the ANN index are passed through the network and kept in sync during writes., Further at [p. 12, Figure 6], The result of the read operation r_t is combined with h_t to produce output y_t, as well as being feed into the controller at the next timestep (r_{t−1})., wherein the RNN produces an output which classifies a portion of the sequence of unlabeled data using a hidden state of the RNN along with the read operation (which identifies the particular word as previously pointed out).).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Rae to incorporate the teachings of Spithourakis for the same reasons as pointed out for claim 1.

In regards to claim 9, the rejection of claim 1 is incorporated and Rae further teaches wherein backpropagation is used to train the recurrent neural network that is equipped with the differentiable set data structure (See at [p. 11, Figure 5], A schematic of the memory efficient backpropagation through time. Each circle represents an instance of the SAM core at a given time step., wherein BP is used to train the LSTM with the differentiable memory).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Rae to incorporate the teachings of Spithourakis for the same reasons as pointed out for claim 1.

In regards to claim 11, the rejection of claim 1 is incorporated and Rae further teaches wherein the recurrent neural network is a Long Short-Term Memory Recurrent Neural Network (See at [p. 4, Section 3.3], We use a one layer LSTM for the controller throughout., wherein the RNN used in combination with the differentiable memory structure is an LSTM).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Rae to incorporate the teachings of Spithourakis for the same reasons as pointed out for claim 1.

In regards to claim 25, the rejection of claim 1 is incorporated and Rae further teaches wherein training the recurrent neural network comprises performing a mixture of said adding the element to the differentiable set data structure and said performing the query over the differentiable set data structure (See at [p. 11, Section A.3], We avoid caching the modified memory, and thus duplicating it, by applying the write directly to the memory., Further at ([p. 4, Section 3.5, Equations 7-9], When querying the memory, we can use an approximate nearest neighbor index (ANN) to search over the external memory for the K nearest words.), wherein training the RNN makes use of a combination/mixture of functions for adding and querying/reading the differentiable memory structure for learning the read and query parameters together (see in particular equations 7-9).). 
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Rae to incorporate the teachings of Spithourakis for the same reasons as pointed out for claim 1.

In regards to claim 26, the rejection of claim 1 is incorporated and Rae further teaches wherein: adding the element to the differentiable set data structure is further based, at least in part, on a value generator hash (See at [pp. 4-5, Section 3.5], In our case, the memory is still a dense tensor that the network directly operates on; however the ANN is a structured view of its contents…. We considered two types of ANN indexes:.… Both the memory and the ANN index are passed through the network and kept in sync during writes.… We used randomized k-d trees for small word sizes and LSHs for large word sizes., wherein locality-sensitive hashes are used for indexing (accessing and writing/adding) words in the differentiable memory structure (for large word sizes).); the performing the query over the differentiable set data structure is based on multiple positions within the differentiable set data structure (See at [pp. 4-5, Section 3.5], When querying the memory, we can use an approximate nearest neighbor index (ANN) to search over the external memory for the K nearest words., wherein the differentiable memory framework uses an index to search the memory which suggests/indicates multiple positions within the differentiable set data structure.); and said output of the query represents a probability that a query element is represented by the differentiable set data structure (See at [p. 2, Section 2.1], w ∈ ℝ^N is a vector of weights with non-negative entries that sum to one., Further at [p. 2, Section 2.3] The Neural Turing Machine is a recurrent neural network equipped with a content-addressable memory, similar to Memory Networks, but with the additional capability to write to memory over time., Further at [p. 3, Section 3.1], We will refer to sparse analogues of weight vectors w as w̃, and when discussing operations that are used in both the sparse and dense versions of our model use w. … Since the K largest values in w_t^R correspond to the K closest points to our query q_t we can use an approximate nearest neighbor data-structure, described in Section 3.5, to calculate w̃_t^R in O(log N) time., wherein a distinct weight w_t^R (associated with the read/query) is assigned to each word (in a sequence) such that this weight is constrained to be between 0 and 1 and, by virtue of the content-addressable memory functionality of the SAM/NTM framework, also corresponds to a probability, such that the weight w_t^R indicates the presence (appearance/addition/retrieval) of the element in the differentiable memory structure.).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Rae to incorporate the teachings of Spithourakis for the same reasons as pointed out for claim 1.
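For illustration only, and not as part of the cited references or the record, the sparse read discussed above can be sketched in Python. The sketch makes several assumptions that are not drawn from Rae: dot-product similarity, a softmax restricted to the K selected rows, and a brute-force search standing in for the approximate nearest neighbor index. The function name `sparse_read` and the sample values are hypothetical.

```python
import math


def sparse_read(memory, query, k):
    """Return (weights, read_word) using only the k rows most similar to query."""
    # Dot-product similarity between the query and every memory row (brute force).
    sims = [sum(q * m for q, m in zip(query, row)) for row in memory]
    # Indices of the k most similar rows; an ANN index would approximate this step.
    top = sorted(range(len(memory)), key=lambda i: sims[i], reverse=True)[:k]
    # Softmax restricted to the selected rows; all other weights stay exactly zero.
    peak = max(sims[i] for i in top)
    exps = {i: math.exp(sims[i] - peak) for i in top}
    total = sum(exps.values())
    weights = [exps[i] / total if i in exps else 0.0 for i in range(len(memory))]
    # Read word: weighted sum of the selected memory rows.
    width = len(memory[0])
    read = [sum(weights[i] * memory[i][j] for i in top) for j in range(width)]
    return weights, read


memory = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
weights, read = sparse_read(memory, query=[1.0, 0.0], k=2)
```

Under these assumptions the weights are non-negative, sum to one, and are non-zero for at most K rows, which is the property the rejection relies on when it reads each weight as a probability.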

In regards to claim 27, the rejection of claim 1 is incorporated and Rae further teaches wherein the differentiable set data structure is represented as an array (See at [p. 10, Section A], Let M be a collection of real vectors m1, m2, . . . , mN of fixed dimension d. Let A be the set of all content addressable memory data structures that store M and can return at least one word mj., wherein the differentiable memory structure is represented as an array/tensor).
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Rae to incorporate the teachings of Spithourakis for the same reasons as pointed out for claim 1.

Claim 13 is also rejected because it is just a computer readable media implementation of the same subject matter of claim 1 which can be found in Rae and Spithourakis. It is noted that claim 13 also recites a processor with memory having instructions which are also found in Rae (e.g., [p. 14, Section E] We used Torch7 [5] to implement SAM, DAM, NTM, DNC and SDNC. Eigen v3 [9] was used for the fast sparse tensor operations, using the provided CSC and CSR formats. All benchmarks were run on a Linux desktop running Ubuntu 14.04.1 with 32GiB of RAM and an Intel Xeon E5-1650 3.20GHz processor with power scaling disabled.).
Claim 14/13 is also rejected because it is just a computer readable media implementation of the same subject matter of claim 2/1 which can be found in Rae and Spithourakis.

Claim 15/13 is also rejected because it is just a computer readable media implementation of the same subject matter of claim 3/1 which can be found in Rae and Spithourakis.

Claim 20/13 is also rejected because it is just a computer readable media implementation of the same subject matter of claim 8/1 which can be found in Rae and Spithourakis.

Claim 21/13 is also rejected because it is just a computer readable media implementation of the same subject matter of claim 9/1 which can be found in Rae and Spithourakis.

Claim 23/13 is also rejected because it is just a computer readable media implementation of the same subject matter of claim 11/1 which can be found in Rae and Spithourakis.

Claim 28/13 is also rejected because it is just a computer readable media implementation of the same subject matter of claim 25/1 which can be found in Rae and Spithourakis.

Claim 29/13 is also rejected because it is just a computer readable media implementation of the same subject matter of claim 26/1 which can be found in Rae and Spithourakis.

Claim 30/13 is also rejected because it is just a computer readable media implementation of the same subject matter of claim 27/1 which can be found in Rae and Spithourakis.

Claims 4, 5, 7, 16, 17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Rae, in view of Spithourakis, and in further view of Gulcehre et al. (“Dynamic Neural Turing Machine with Soft and Hard Addressing Schemes,” https://arxiv.org/abs/1607.00036v1, arXiv:1607.00036v1 [cs.LG] 30 Jun 2016, pp. 1-13), hereinafter referred to as Gulcehre.

In regards to claim 4, the rejection of claim 1 is incorporated and Rae further teaches wherein: the differentiable set data structure represents a logical set of values (See at [p. 3, Section 3.2], The write operation [in] SAM is an instance of (3) where the write weights $\tilde{w}_t^W$ are constrained to contain a constant number of non-zero entries., Further at [p. 3, Section 2.3], A write to memory, [Equation] (3) consists of a copy of the memory from the previous time step $M_{t-1}$ decayed by the erase matrix $R_t$ indicating obsolete or inaccurate content, and an addition of new or updated information $A_t$. The erase matrix $R_t = w_t^W e_t^T$ is constructed as the outer product between a set of write weights $w_t^W \in [0,1]^N$ and erase vector $e_t \in [0,1]^M$. The add matrix $A_t = w_t^W a_t^T$ is the outer product between the write weights and a new write word $a_t \in \mathbb{R}^M$, which the controller outputs., wherein the differentiable memory is constrained to contain a fixed number of non-zero entries, thereby forming a logical set of values within that structure); training the recurrent neural network that is equipped with the differentiable set data structure comprises: generating a control command based on an … function and the hidden state of the recurrent neural network; wherein the control command indicates a probability that a particular value will be added to the logical set of values (See at [p. 4, Section 3.3, Figure 6], The LSTM then produces a vector, $p_t = (q_t, a_t, \alpha_t, \gamma_t)$, of read and write parameters for memory access via a linear layer., Further at [Equation 5 on p. 4, Section 3.2, p. 3, Section 2.3], “A write to memory, [Equation] (3) consists of a copy of the memory from the previous time step $M_{t-1}$ decayed by the erase matrix $R_t$ indicating obsolete or inaccurate content, and an addition of new or updated information $A_t$. The erase matrix $R_t = w_t^W e_t^T$ is constructed as the outer product between a set of write weights $w_t^W \in [0,1]^N$ and erase vector $e_t \in [0,1]^M$. The add matrix $A_t = w_t^W a_t^T$ is the outer product between the write weights and a new write word $a_t \in \mathbb{R}^M$, which the controller outputs.”, wherein a distinct weight w_t (associated with add/write) is assigned to each element (in a sequence) such that this weight is constrained to be between 0 and 1 and, by virtue of the content-addressable memory functionality of the SAM/NTM framework, also corresponds to a probability attached to the appearance/addition of the element in the differentiable memory structure, and wherein the training process specifically learns these weights based on the hidden states of the RNN (i.e., as shown in Figure 6, the hidden states h_t of the LSTM are used to read/write to memory via p_t and, as shown in Equation (5), the write weights are calculated using the values of the parameters $\alpha_t$ and $\gamma_t$ thereby generated by the LSTM).)
However, Rae and Spithourakis do not explicitly disclose … sigmoid activation …. Although Rae teaches an LSTM implementation and although an LSTM inherently includes (gate) activations, he does not explicitly disclose the functional form of those activations.  Although Spithourakis trains and applies an LSTM for character recognition/classification, he does not disclose the specifics of the LSTM internal operation.
However, Gulcehre, in the analogous environment of using an RNN to classify character sequences using a differentiable memory structure, teaches training the recurrent neural network that is equipped with the differentiable set data structure comprises: generating a control command based on a sigmoid activation function and the hidden state of the recurrent neural network; wherein the control command indicates a probability that a particular value will be added to the logical set of values (See at [p. 4, Section 3.3], We rescale the accumulated $v_t$ with $\gamma_t$, such that the controller adjusts the influence of how much the previously written memory locations should effect the attention weights of a particular time-step. Next, we subtract $v_t$ from $z_t$ in order to reduce the weights of previously read or written memory locations. $\gamma_t$ is a shallow MLP with a scalar output and it is conditioned on the hidden state of the controller. $\gamma_t$ is parametrized with the parameters $u_\gamma$ and $b_\gamma$, <equations 8, 9> This scheme has an effect of increasing the weights of the least recently used read and write weights. The magnitude of this reduction is being learned and adjusted with $\gamma_t$., wherein the (address) write vector (write head/addition of content to a memory but also, more generally, the read, write, and erase vectors) is determined (during training) by computing the parameter gamma_t by applying the sigmoid activation function to an expression that includes the hidden states of the RNN (equation 8), with the (address) write vector itself formed from the softmax function (equation 9, which is being interpreted as generating a probability associated with the write function into the memory but where, in general, the write vector is a probabilistic function as seen in equation 7), and wherein, like Rae, Gulcehre teaches a differentiable memory structure that is fully trainable.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Rae and Spithourakis to incorporate the teachings of Gulcehre so that training the recurrent neural network that is equipped with the differentiable set data structure comprises: generating a control command based on a sigmoid activation function and the hidden state of the recurrent neural network; wherein the control command indicates a probability that a particular value will be added to the logical set of values. The modification would have been obvious because one of ordinary skill would have been motivated to improve the accuracy of RNN-based neural Turing machine applications by using a differentiable memory in which the RNN-based framework is trained to write content to the differentiable memory dynamically, using write address vectors produced by soft differentiable mechanisms that include sigmoidal and softmax functions to accentuate more recent memory locations (with predictable results relative to non NTM-based models) (Gulcehre, [Abstract, p. 4, Section 3.3, p. 8, Section 7.3, p. 9, Section 8, Table 2]).
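For illustration only, the addressing mechanism attributed to Gulcehre above can be sketched as follows. This is a minimal sketch, not code from Gulcehre: the function and argument names (`lru_address`, `h_t`, `u_gamma`, `b_gamma`, `z_t`, `v_t`) are hypothetical, and the single-layer form of the gate is an assumption standing in for equations 8 and 9, which are not reproduced in this action.

```python
import math

def sigmoid(x):
    # Logistic function; maps any real number into (0, 1).
    return 1.0 / (1.0 + math.exp(-x))

def softmax(xs):
    # Numerically stable softmax; the outputs are positive and sum to 1.
    peak = max(xs)
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def lru_address(h_t, u_gamma, b_gamma, z_t, v_t):
    # ASSUMPTION: gamma_t is one sigmoid unit applied to the controller
    # hidden state h_t with parameters u_gamma and b_gamma.
    gamma_t = sigmoid(sum(u * h for u, h in zip(u_gamma, h_t)) + b_gamma)
    # Subtract the rescaled usage v_t from the address logits z_t, then
    # normalize with softmax to obtain per-location address weights.
    logits = [z - gamma_t * v for z, v in zip(z_t, v_t)]
    return gamma_t, softmax(logits)
```

Under these assumptions, a location with a large accumulated usage value receives a smaller address weight than an otherwise identical location that has not been used, and the resulting weights can be read as probabilities over memory locations.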

In regards to claim 5, the rejection of claim 4 is incorporated and Rae further teaches wherein training the recurrent neural network that is equipped with the differentiable set data structure further comprises: generating a new probability that the particular value is included in the logical set of values by adding the control command to a previous probability that the particular value is included in the logical set of values (See at [pp. 3-4, Section 3.2], This is done by a simple scheme where the controller writes either to previously read locations, in order to update contextually relevant memories, or the least recently accessed location, in order to overwrite stale or unused memory slots with fresh content. … We decided upon the previously read / least recently accessed addressing scheme for simplicity and flexibility. The write weights are defined as <equation 5> where the controller outputs the interpolation gate parameter $\gamma_t$ and the write gate parameter $\alpha_t$. The write to the previously read locations $w_{t-1}^R$ is purely additive, while the least recently accessed word $I_t^U$ is set to zero before being written to…, wherein a new write probability (corresponding to w_t^W, which is indicative of the presence of a particular (logical) element in memory from among a finite set of such elements) is computed/updated (using a previous probability w^R_(t-1) associated with the access/read word/element at that memory location) according to equation 5, which favors updates according to the usage rate of different memory locations (with the controller specifically outputting the parameters (gate, interpolation) which affect this probability).)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Rae to incorporate the teachings of Spithourakis and Gulcehre for the same reasons as pointed out for claims 1 and 4 respectively.
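For illustration only, the write path described in the passages of Rae cited for claims 4 and 5 can be sketched as follows. This is a minimal sketch under stated assumptions, not Rae's implementation: equations 3 and 5 are not reproduced in this action, so the interpolation form of the write weights and the element-wise form of the memory update below are assumptions, and all function and argument names are hypothetical.

```python
def write_weights(alpha_t, gamma_t, prev_read_w, lru_index):
    # ASSUMPTION: the write weights interpolate, via gamma_t, between the
    # previously read locations and a one-hot indicator of the least
    # recently accessed location, and are then scaled by the gate alpha_t.
    indicator = [1.0 if i == lru_index else 0.0 for i in range(len(prev_read_w))]
    return [alpha_t * (gamma_t * r + (1.0 - gamma_t) * u)
            for r, u in zip(prev_read_w, indicator)]

def write_memory(memory, w, erase, add):
    # ASSUMPTION: M_t = (1 - R_t) * M_{t-1} + A_t element-wise, where
    # R_t = w e^T and A_t = w a^T are the outer products described above.
    return [[(1.0 - w_i * e_j) * m_ij + w_i * a_j
             for m_ij, e_j, a_j in zip(row, erase, add)]
            for w_i, row in zip(w, memory)]
```

Under these assumptions, a memory row whose write weight is zero is left unchanged, so a write that touches only a constant number of rows leaves the rest of the memory intact.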

In regards to claim 7, the rejection of claim 1 is incorporated and Rae further teaches wherein training the recurrent neural network that is equipped with the differentiable set data structure comprises: generating, based on a … function and the hidden state of the recurrent neural network, a location vector that indicates a location of a particular value within the differentiable set data structure (See at [p. 4, Section 3.3], The LSTM then produces a vector, $p_t = (q_t, a_t, \alpha_t, \gamma_t)$, of read and write parameters for memory access via a linear layer., Further at [p. 3, Section 3.1], We wish to construct $\tilde{w}_t^R$ such that $r_t \approx \tilde{r}_t$…. Since the $K$ largest values in $w_t^R$ correspond to the $K$ closest points to our query $q_t$, we can use an approximate nearest neighbor data-structure, described in Section 3.5, to calculate $\tilde{w}_t^R$ in $O(\log N)$ time., wherein a distinct weight w^R (associated with read) is assigned to each element (in a sequence) such that this weight is constrained to be between 0 and 1 and, by virtue of the content-addressable memory functionality of the SAM/NTM framework, w^R is a location vector that indicates a location of a particular value within the differentiable set (according to the indexing of that vector)).
However, Rae and Spithourakis do not explicitly disclose … sigmoid activation …. Although Rae teaches an LSTM implementation and although an LSTM inherently includes (gate) activations, he does not explicitly disclose the functional form of those activations.  Although Spithourakis trains and applies an LSTM for character recognition/classification, he does not disclose the specifics of the LSTM internal operation.
However, Gulcehre, in the analogous environment of using an RNN to classify character sequences using a differentiable memory structure, teaches training the recurrent neural network that is equipped with the differentiable set data structure comprises: generating, based on a sigmoid activation function and the hidden state of the recurrent neural network, a location vector that indicates a location of a particular value within the differentiable set data structure (See at [p. 4, Section 3.3], We rescale the accumulated $v_t$ with $\gamma_t$, such that the controller adjusts the influence of how much the previously written memory locations should effect the attention weights of a particular time-step. Next, we subtract $v_t$ from $z_t$ in order to reduce the weights of previously read or written memory locations. $\gamma_t$ is a shallow MLP with a scalar output and it is conditioned on the hidden state of the controller. $\gamma_t$ is parametrized with the parameters $u_\gamma$ and $b_\gamma$, <equations 8, 9> This scheme has an effect of increasing the weights of the least recently used read and write weights. The magnitude of this reduction is being learned and adjusted with $\gamma_t$., wherein the (address) read vector (read head of content from a memory but also, more generally, the read, write, and erase vectors) is determined (during training) by computing the parameter gamma_t by applying the sigmoid activation function to an expression that includes the hidden states of the RNN (equation 8), with the (address) read vector itself formed from the softmax function (equation 9, which is being interpreted as generating a probability associated with the read function into the memory but where, in general, the read vector is a probabilistic function as seen in equation 7), and wherein, like Rae, Gulcehre teaches a differentiable memory structure that is fully trainable.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Rae and Spithourakis to incorporate the teachings of Gulcehre so that training the recurrent neural network that is equipped with the differentiable set data structure comprises: generating, based on a sigmoid activation function and the hidden state of the recurrent neural network, a location vector that indicates a location of a particular value within the differentiable set data structure. The modification would have been obvious because one of ordinary skill would have been motivated to improve the accuracy of RNN-based neural Turing machine applications by using a differentiable memory in which the RNN-based framework is trained to read content from the differentiable memory dynamically, using read address vectors produced by soft differentiable mechanisms that include sigmoidal and softmax functions to accentuate more recent memory locations (with predictable results relative to non NTM-based models) (Gulcehre, [Abstract, p. 4, Section 3.3, p. 8, Section 7.3, p. 9, Section 8, Table 2]).
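For illustration only, the sparse read cited from Rae for claim 7 (keeping the K largest read weights, which correspond to the K memory rows closest to the query) can be sketched as follows. This is a minimal sketch, not Rae's implementation: it substitutes an exact linear scan for the approximate nearest neighbor data structure, assumes dot-product similarity followed by softmax, and uses hypothetical names throughout.

```python
import math

def sparse_read_weights(memory, query, k):
    # ASSUMPTION: similarity is a plain dot product between the query and
    # each memory row, normalized with a softmax over all rows.
    scores = [sum(m * q for m, q in zip(row, query)) for row in memory]
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    dense = [e / total for e in exps]
    # Keep only the k largest weights and zero the rest, so the location
    # vector has a constant number of non-zero entries.
    keep = set(sorted(range(len(dense)), key=lambda i: dense[i], reverse=True)[:k])
    return [w if i in keep else 0.0 for i, w in enumerate(dense)]

def read(memory, weights):
    # The read word is the weighted sum of memory rows.
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory)) for j in range(width)]
```

The index of each non-zero entry in the returned vector identifies the memory row that contributes to the read, which is the sense in which the vector indicates a location.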

Claim 16/13 is also rejected because it is just a computer readable media implementation of the same subject matter of claim 4/1 which can be found in Rae, Spithourakis, and Gulcehre.

Claim 17/16 is also rejected because it is just a computer readable media implementation of the same subject matter of claim 5/4 which can be found in Rae, Spithourakis, and Gulcehre.

Claim 19/13 is also rejected because it is just a computer readable media implementation of the same subject matter of claim 7/1 which can be found in Rae, Spithourakis, and Gulcehre.

Claims 10 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Rae, in view of Spithourakis, and in further view of Chauhan et al. (“Finding Similar Items using LSH and Bloom Filter,” 2014, IEEE International Conference on Advanced Communication Control and Computing Technologies, pp. 1662-1666), hereinafter referred to as Chauhan.

In regards to claim 10, the rejection of claim 1 is incorporated and Rae and Spithourakis do not further teach wherein the differentiable set data structure is implemented with a Bloom filter. Neither Rae nor Spithourakis discloses the use of a Bloom filter (although Rae discloses the use of LSH functions to access/search the memory for nearest neighbor search).
However, Chauhan in the analogous art of using LSH to search a memory, discloses wherein the differentiable set data structure is implemented with a Bloom filter (See at [p. 1665, Section VII], This work is based on Approximate Nearest Neighbor Search using Locality Sensitive Hashing technique. In characteristic matrix searching of a shingle linearly depends on file size, i.e. O(n), which is significantly large time when processing large files. In this work, an algorithm has been proposed which uses Bloom Filters for storing shingles, resulting in constant search time and comparison with standard algorithm has been shown., Further at [p. 1664, Section IV] Bloom Filter, a probabilistic data structure that uses multiple hash functions to store data in a large bit array was introduced in 1970 by Burton H. Bloom. It is mainly used for membership queries when dealing with large data sets., wherein a Bloom filter is used to represent each element (shingle, sub-string) in a probabilistic data structure to facilitate a nearest neighbor search using LSH.)  
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Rae and Spithourakis to incorporate the teachings of Chauhan for the differentiable set data structure to be implemented with a Bloom filter. The modification would have been obvious because one of ordinary skill would have been motivated to achieve improved nearest neighbor search times over a memory using LSH by representing/implementing the content of that memory probabilistically (for Hash functions) using a Bloom filter (Chauhan, [Abstract, p. 1664, Section IV, Table 2]).
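For illustration only, the Bloom filter described in the cited passage of Chauhan (multiple hash functions setting positions in a bit array, used for membership queries) can be sketched as follows. This is a minimal sketch of an ordinary, non-differentiable Bloom filter, not Chauhan's algorithm: the class name, the default sizes, and the use of seeded SHA-256 digests as the hash functions are all assumptions.

```python
import hashlib

class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        # Hypothetical default sizes chosen only for the example.
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = [False] * num_bits

    def _positions(self, item):
        # ASSUMPTION: derive several hash functions by prefixing a seed to
        # the item and taking the first 8 bytes of its SHA-256 digest.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        # Set every bit position selected by the hash functions.
        for pos in self._positions(item):
            self.bits[pos] = True

    def might_contain(self, item):
        # False means definitely absent; True means possibly present.
        return all(self.bits[pos] for pos in self._positions(item))
```

A membership query inspects a fixed number of bit positions regardless of how many items were stored, which corresponds to the constant search time noted in the cited passage. False positives are possible; false negatives are not.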

Claim 22/13 is also rejected because it is merely a computer-readable media implementation of the same subject matter as claim 10/1, which can be found in Rae, Spithourakis, and Chauhan.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Patraucean et al. (“Spatio-Temporal Video Autoencoder with Differentiable Memory”, https://arxiv.org/pdf/1511.06309.pdf, arXiv:1511.06309v5 [cs.LG] 1 Sep 2016, pp. 1-13) teach an encoding of spatio-temporal video in an LSTM framework with differentiable memory in which the predictive output is optical flow, including the specifics of the LSTM sigmoid activation function, the functional role of hidden states, and an output (optical flow) that does not directly depend on the LSTM hidden states.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROBERT LEWIS KULP whose telephone number is (571)272-7983. The examiner can normally be reached M, Th, F 8-5:30; Tu 8-3.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached on 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ROBERT LEWIS KULP/Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124