DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 02/18/2021 have been fully considered but they are not persuasive. Regarding the arguments on pages 8-9 of the Remarks, Examiner notes that even if Li is directed towards classification, Li still teaches the claimed limitations. Examiner further notes that the Specification also appears to apply the invention to classification (see page 10, lines 12 and 18, and page 11, line 9). Further, element 130 of Fig. 1 of Li shows the different classification results, which are interpreted as the sequence of output words. Fig. 2 of Li further shows the use of topic modeling, word embedding, and the LSTM model in determining the output sequence of words. Therefore, the claimed limitations are taught.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –




Claims 1, 8, and 15 are rejected under 35 U.S.C. 102(1) as being anticipated by Li et al. U.S. Patent Publication [2020/0184339].

With respect to Claim 1, Li discloses: 
A natural language processing system configured for receiving an input sequence ci of input words (v1, v2, ..., vN) representing a first sequence of words in a natural language of a first text (110, Figure 1) and generating an output sequence of output words (ii, 2, ... i) representing a second sequence of words in a natural language of a second text (130, Figure 1; The left plate 110 in FIG. 1 shows a series of questions asked by users…On the right, diverse questions are classified into predefined categories 130, [0025]; Although figures and one or more embodiments described herein use question as an embodiment of an input,…the input may not be limited as question. Instead, it may be referred as other types of input, such as a statement, an expression, etc. Accordingly, the classification output may also other types of input classification, such as expression type, etc., besides question type, [0048]) and modeled by a multinominal topic model, wherein the multinominal topic model is extended by an incorporation of language structures using a deep contextualized Long-Short-Term Memory model (270, 260, Figure 2; For further interpretation, Li also discloses “Fig. 2 illustrates a full architecture of a TWEE (Topic Modeling, Word embedding and Entity Embedding) framework, according to embodiments of the present disclosure. The TWEE framework 200 is constructed incorporating three input components, namely, the topic sparse autoencoder 210, a word embedding 220 and an entity embedding 230. In one or more embodiments, the topic embedding 212, word embedding 222 and entity embeddings 232 are concatenated into a mixture embedding, which is fed into a classifier 240 for question type classification. 
In one or more embodiments, the classifier 240 may comprise a convolutional layer 245 with multiple filters to detect features at different positions, a max-pooling layer 250, an LSTM layer 260, a fully connected layer 270, and a prediction layer 280 to output a final question type 290” [0047]. Li thus discloses an extension of a topic model that incorporates language structures through a deep contextualized LSTM layer, contextualized by the word, topic, and entity embeddings deployed in TWEE). 
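For illustration only, the concatenation step quoted above from Li [0047] can be sketched in Python as follows. The function name mixture_embedding and the vector values are hypothetical, and the classifier layers 245-280 are not modeled; this is not a characterization of Li's actual implementation.

```python
from typing import List


def mixture_embedding(topic: List[float], word: List[float], entity: List[float]) -> List[float]:
    """Concatenate topic, word and entity embeddings into a single mixture embedding."""
    # Concatenation keeps every component side by side; a downstream classifier
    # (element 240 in Li) would take the combined vector as its input.
    return list(topic) + list(word) + list(entity)


topic_emb = [0.2, 0.8]          # hypothetical topic embedding
word_emb = [0.1, -0.3, 0.5]     # hypothetical word embedding
entity_emb = [0.7]              # hypothetical entity embedding
mix = mixture_embedding(topic_emb, word_emb, entity_emb)
print(mix)  # [0.2, 0.8, 0.1, -0.3, 0.5, 0.7]
```

The combined vector simply has the summed dimension of its three inputs, which is what allows a single classifier to see topical, lexical, and entity information at once.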

With respect to Claim 8, Li discloses: 
A computer-implemented method for processing natural language (Aspects of the present invention may be encoded upon one or more non-transitory computer-readable media with instructions for one or more processors or processing units to cause steps to be performed, [0096]), by receiving an input sequence ci of input words (v1, v2, ..., vN) representing a first sequence of words in a natural language of a first text (110, Figure 1) and generating an output sequence of output words (i, 2,... iN) representing a second sequence of words in a natural language of a second text (130, Figure 1; The left plate 110 in FIG. 1 shows a series of questions asked by users…On the right, diverse questions are classified into predefined categories 130, [0025]; Although figures and one or more embodiments described herein use question as an embodiment of an input,… input may not be limited as question. Instead, it may be referred as other types of input, such as a statement, an expression, etc. Accordingly, the classification output may also other types of input classification, such as expression type, etc., besides question type, [0048]) and modeled by a multinominal topic model, comprising the steps: - extending the multinominal topic model by an incorporation of language structures, and - using a deep contextualized Long-Short-Term Memory model (270, 260, Figure 2; Li also discloses “Fig. 2 illustrates a full architecture of a TWEE (Topic Modeling, Word embedding and Entity Embedding) framework, according to embodiments of the present disclosure. The TWEE framework 200 is constructed incorporating three input components, namely, the topic sparse autoencoder 210, a word embedding 220 and an entity embedding 230. In one or more embodiments, the topic embedding 212, word embedding 222 and entity embeddings 232 are concatenated into a mixture embedding, which is fed into a classifier 240 for question type classification. 
In one or more embodiments, the classifier 240 may comprise a convolutional layer 245 with multiple filters to detect features at different positions, a max-pooling layer 250, an LSTM layer 260, a fully connected layer 270, and a prediction layer 280 to output a final question type 290” [0047]. Li thus discloses an extension of a topic model that incorporates language structures through a deep contextualized LSTM layer, contextualized by the word, topic, and entity embeddings deployed in TWEE).

With respect to Claim 15, Li discloses: 
A non-transitory computer-readable data storage medium comprising executable program code configured to, when executed, perform the method according to claim 8 (Furthermore, one skilled in the art will recognize that embodiments of the present disclosure, described below, may be implemented in a variety of ways, such as a process, an apparatus, a system, a device, or a method on a tangible computer-readable medium [0017]). 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 2-4, 6-7, 9-11 and 13-14 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. U.S. Patent Publication [2020/0184339] in view of Larochelle et al. Topic Modeling of Multimodal Data: an Autoregressive Approach.

	With respect to Claim 2, Li discloses:
	The natural language processing system of claim 1, wherein the multinominal topic model is a document neural autoregressive topic model (a unified neural network framework are presented by integrating Topic modeling, Word embedding, and Entity Embedding (TWEE) for question representation learning, [0027]) and the extended multinominal topic model is a contextualized document neural autoregressive topic model (In particular, embodiments of a Topic Sparse Autoencoder (TSAE) integrated with a probabilistic topic modeling algorithm are introduced… In addition, both words and entity related information are embedded into the network from different local viewpoints. Together with topic modeling, word embedding and entity embedding, embodiments of the proposed TWEE model not only explore information from local contexts of words and entities, but also incorporate global topical structures for a more comprehensive representation learning. [0027]).
	Li fails to disclose specifically using the multinominal topic model DocNADE and extending it with a contextualized model. 
	In the same field of topic modeling, Larochelle discloses: wherein the multinominal topic model is a document neural autoregressive topic model, DocNADE, (Recently, a new type of topic model called the Document Neural Autoregressive Distribution Estimator (DocNADE) was proposed and demonstrated state-of-the-art performance for text document modeling, page 1) and the extended multinominal topic model is a contextualized document neural autoregressive topic model (Specifically, we propose SupDocNADE, a supervised extension of DocNADE, that increase the discriminative power of the hidden topic features by incorporating label information into the training objective of the model and show how to employ SupDocNADE to learn a joint representation from image visual words, annotation words and class label information, page 1). (Examiner interprets contextualized as extended). 
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with Larochelle in order to have the ability to extend the DocNADE autoregressive topic model with contextualizing features. Larochelle describes the DocNADE topic model and extending it. Li describes extending a topic model by using a contextualized LSTM model, so it would have been obvious to combine the teachings of Larochelle with Li in order to “extend the DocNADE with contextualized features in order to increase the discriminative power of the hidden topic features” (page 1, Larochelle). 

	With respect to Claim 3, Li discloses: 
	The natural language processing system of claim 1, wherein the model (Step 505 and 510, Figure 5) is extended by the incorporation of distributed compositional priors (Step 515, Figure 5) for generating a contextualized model (Step 520, Figure 5; This section presents details of TWEE framework embodiments, which integrate topic modeling, word embedding and entity embedding for question representation learning. Firstly, a topic sparse autoencoder (TSAE) incorporates a probabilistic topic modeling algorithm into a sparse autoencoder. The global topical representations of questions are learned. Then, how word embeddings are learned from questions to capture the local context information is presented, [0050]).  For further interpretation, Li also discloses “Fig. 5 depicts a process for topic-related representation learning using the TSAE, according to embodiments of the present disclosure. In step 505, given an input (e.g. a question) comprising a plurality of words, a topic distribution over the input among one or more topics is generated by topic modeling. In one or more embodiments, the topic modeling in the TSAE comprises pre-trained probabilistic topic modeling algorithm. In one or mode embodiments, each topic is associated to one or more words from the input. In step 510, a topic distribution for words is obtained based on the topic distribution over the input (e.g. question). In step 515, the input is encoded, via an encoder, into a hidden representation, which may comprise one or more word embeddings. In step 520, the topic distribution for words is fed into the hidden representation to form a topic distribution over hidden state (or a topic distribution over the one or more word embeddings in the hidden state) so that the representation learning is more discriminative”, [0063]. 
	Li fails to disclose wherein the topic model is a ctx-DocNADE model extended by the incorporation of distributed compositional priors for generating a ctx-DocNADE model.
	In the same field of topic modeling, Larochelle discloses a topic model wherein DocNADE is extended. Specifically, Larochelle describes “a supervised extension of DocNADE (SupDocNADE), which incorporates the class label modality into training to learn more discriminative hidden features for classification. Then we describe how we exploit the spatial position information of the visual words. Finally, we describe how to jointly model the text annotation modality with SupDocNADE” (page 4, Larochelle). Larochelle does not extend the said DocNADE model so that it is contextualized by the incorporation of distributed compositional priors. Instead, it is extended to incorporate multimodal data that is applied to pictures. 
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with Larochelle in order to have the ability to extend the DocNADE autoregressive topic model with contextualizing features. Larochelle describes the DocNADE topic model and extending it to apply to multimodal data. Li describes extending a topic model by using a contextualized model that is applicable to text documents due to its inclusion of a pre-trained model. The topic model is extended in order to be better applicable to documents that consist mainly of context, so it would have been obvious to combine the teachings of Li with Larochelle to further “extend …” (page 1, Larochelle).

	With respect to Claim 4, Li discloses: 
	The natural language processing system of claim 1, wherein the distributed composition priors are pre-trained word embeddings by LSTM-LM (610, Figure 6; FIG. 6 illustrates a learning process for word embeddings, according to one or more embodiment of the present disclosure. In FIG. 6, a group sparse autoencoder 620 and a skip-gram network 610 are used jointly to extract features from the input. For the skip-gram network 610, given an input (e.g., a question), a one-hot representations 614 of words in the input is transformed into low-dimensional word embeddings 612. In one or more embodiments, the prediction from each one-hot representation is context words of the word corresponding to the one-hot representation. The word embeddings 612 and the topic embeddings 622 generated by the group sparse autoencoder 610 are fed together into a CNN 630 for further feature mapping. Considering that the TSAE is a different representation involving topics and a count-based auto-encoder while the skip-gram embedding and CNN make use of contextual information, the TSAE and the combination of skip-gram embedding and CNN may be complementary to each other for improved performance, [0067]). Li uses word embeddings from a separate model, the skip-gram network, in order to contextualize the topic model.
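For illustration only, the following Python sketch shows how a skip-gram network of the kind described in Li [0067] derives its prediction targets: each word is paired with the context words around it. The function name skip_gram_pairs and the window size are assumptions made for this example and are not taken from Li.

```python
from typing import List, Sequence, Tuple


def skip_gram_pairs(tokens: Sequence[str], window: int = 2) -> List[Tuple[str, str]]:
    """Return (center word, context word) pairs used as skip-gram training targets."""
    pairs = []
    for i, center in enumerate(tokens):
        # Context = words within `window` positions on either side of the center word.
        start = max(0, i - window)
        stop = min(len(tokens), i + window + 1)
        for j in range(start, stop):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


pairs = skip_gram_pairs(["what", "is", "topic", "modeling"], window=1)
print(pairs)
# [('what', 'is'), ('is', 'what'), ('is', 'topic'), ('topic', 'is'),
#  ('topic', 'modeling'), ('modeling', 'topic')]
```

Because the targets are the neighboring words, the resulting embeddings capture local context, which is the property relied upon above for contextualizing the topic model.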
Li fails to disclose wherein the contextualized model is ctx-DocNADE and that the pre-trained compositional priors are generated by an LSTM-LM model. 
In the same field of topic modeling, Larochelle discloses a topic model wherein DocNADE is extended. Specifically, Larochelle describes “a supervised extension of DocNADE (SupDocNADE), which incorporates the class label modality into training to learn more discriminative hidden features for classification. Then we describe how we exploit the spatial position information of the visual words. Finally, we describe how to jointly model the text annotation modality with SupDocNADE” (page 4, Larochelle). Larochelle further discloses that SupDocNADE, “like all topic models…is trained to model the distribution of the bag of words representation of images and can extract a meaningful representation from it.”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with Larochelle in order to have the ability to extend the DocNADE autoregressive topic model with contextualizing features, such as the pre-trained word embeddings. Larochelle describes the DocNADE topic model and extending it. Li describes extending a topic model by using a contextualized LSTM model, so it would have been obvious to combine the teachings of Li with Larochelle in order to “extend the DocNADE with contextualized features in order to increase the discriminative power of the hidden topic features” (page 1, Larochelle). 

	With respect to Claim 6, Li fails to disclose a conditional distribution for each word in the natural language processing system, as recited in:
The natural language processing system of claim 1, wherein the conditional distribution for each word vi is estimated by: 
[Equation image not reproduced.]
In the same field of topic modeling, Larochelle discloses the natural language processing system of claim 1, wherein the conditional distribution for each word vi is estimated by: 
[Equation image not reproduced.]
 (DocNADE models the joint probability of the visual words p(v)…and modeling instead each conditional…One possibility would be to model p(vi|v<i) with the following architecture: (See Equation 3), page 3, Larochelle).  
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with Larochelle to use a general conditional distribution that is applicable to written text documents and not just pictorial documents. Larochelle states that “the main assumption made by DocNADE is in the form of the conditionals. Specifically, DocNADE assumes that each conditional can be modeled and learned by a feedforward neural network” (page 3, Larochelle). 
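For illustration only, the following Python sketch computes a DocNADE-style conditional p(vi = w | v<i), using parameters named after the set θ = {W, V, b, c} quoted from Larochelle in the rejection of claim 7. It assumes the common form in which the hidden layer is a sigmoid of c plus the sum of the columns of W for the preceding words, followed by a full softmax over the vocabulary. It is not a transcription of the equation image in claim 6, and Larochelle's model may instead use a tree-structured output layer; the matrices below are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def docnade_conditional(prefix: Sequence[int], W: List[List[float]], V: List[List[float]],
                        b: List[float], c: List[float]) -> List[float]:
    """Return p(v_i = w | v_<i) for every vocabulary word w (assumed full-softmax form).

    W: H x K input matrix (column W[:, k] represents word k)
    V: K x H output matrix, b: K output biases, c: H hidden biases.
    """
    H, K = len(c), len(b)
    # Hidden state h_i = sigmoid(c + sum_{k<i} W[:, v_k]); the order of the prefix is ignored.
    a = list(c)
    for word in prefix:
        for h in range(H):
            a[h] += W[h][word]
    hidden = [sigmoid(x) for x in a]
    # Logits b_w + V[w, :] . h_i, normalized with a numerically stable softmax.
    logits = [b[w] + sum(V[w][h] * hidden[h] for h in range(H)) for w in range(K)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


W = [[0.5, -0.2, 0.1],
     [0.3, 0.4, -0.6]]      # H = 2 hidden units, K = 3 vocabulary words
V = [[0.2, -0.1],
     [0.0, 0.3],
     [-0.4, 0.5]]           # K x H
b = [0.0, 0.1, -0.1]
c = [0.0, 0.0]
probs = docnade_conditional([0, 2], W, V, b, c)
print(probs)
```

Each conditional is produced by a single feedforward pass over the preceding words, which corresponds to the assumption Larochelle describes on page 3.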

	With respect to Claim 7, Li discloses: 
	A model that is optimized to maximize the pseudo log likelihood. Li discloses that “skip-gram method may be applied to learn entity embeddings ee. By maximizing an average log probability, entity embeddings may be learned to help predict nearby entities” [0070]. (Examiner interprets log likelihood as average log probability.)
Li fails to disclose the natural language processing system of claim 1, wherein the ctx-DocNADE model and the ctx-DocNADEe model are optimized to maximize the pseudo log likelihood, $\log p(v) \approx \sum_{i=1}^{D} \log p(v_i \mid v_{<i})$. Li does not disclose using a DocNADE or contextualized DocNADE model that is optimized to maximize the pseudo log likelihood. 
In the same field of topic modeling, Larochelle discloses the natural language processing system of claim 1, wherein the ctx-DocNADE model and the ctx-DocNADEe model are optimized to maximize the pseudo log likelihood, $\log p(v) \approx \sum_{i=1}^{D} \log p(v_i \mid v_{<i})$. Larochelle discusses using DocNADE and an extended version of DocNADE, SupDocNADE, and discloses that, to “train the parameters θ = {W, V, b, c} of DocNADE, we simply optimize the average negative log-likelihood of the training set documents using stochastic gradient descent” (page 3, Larochelle). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with Larochelle by optimizing a DocNADE model to maximize the pseudo log likelihood, in the same way this can be done for other topic models. Maximizing the log likelihood aids the predictions made by the topic model. 
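For illustration only, the autoregressive log likelihood recited above, log p(v) ≈ Σ log p(vi | v<i), can be sketched in Python as a sum of log conditionals over one fixed word ordering. The function toy_conditional is a hypothetical stand-in model (add-one counts of previously seen words) used only so the example runs; it is not DocNADE.

```python
import math
from typing import Callable, List, Sequence


def autoregressive_log_likelihood(doc: Sequence[int],
                                  conditional: Callable[[Sequence[int]], List[float]]) -> float:
    """Sum of log p(v_i | v_<i) over the words of one document, in the given order."""
    total = 0.0
    for i, word in enumerate(doc):
        probs = conditional(doc[:i])      # distribution over the vocabulary given v_<i
        total += math.log(probs[word])
    return total


def toy_conditional(prefix: Sequence[int], vocab_size: int = 3) -> List[float]:
    """Hypothetical stand-in model: add-one counts of the words seen so far."""
    counts = [1.0] * vocab_size
    for w in prefix:
        counts[w] += 1.0
    z = sum(counts)
    return [n / z for n in counts]


doc = [0, 0, 1]
ll = autoregressive_log_likelihood(doc, toy_conditional)
nll = -ll  # minimizing the negative log-likelihood is the same as maximizing ll
print(round(ll, 4))  # -3.4012
```

Maximizing this sum, as claimed, and minimizing the average negative log-likelihood, as quoted from Larochelle, are the same optimization with the sign reversed.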

With respect to Claim 9, Li discloses:
The method of claim 8, wherein the multinominal topic model is a document neural autoregressive topic model (a unified neural network framework are presented by integrating Topic modeling, Word embedding, and Entity Embedding (TWEE) for question representation learning, [0027]). Larochelle discloses that the extended multinominal topic model is a contextualized document neural autoregressive topic model (Specifically, we propose SupDocNADE, a supervised extension of DocNADE, that increase the discriminative power of the hidden topic features by incorporating label information into the training objective of the model and show how to employ SupDocNADE to learn a joint representation from image visual words, annotation words and class label information, page 1, Larochelle). (Examiner interprets contextualized as extended.) (See rejection of claim 2.)

With respect to Claim 10, Li discloses:
The method of claim 8, wherein the model (Step 505 and 510, Figure 5) is extended by the incorporation of distributed compositional priors (Step 515, Figure 5) for generating a contextualized model (Step 520, Figure 5; This section presents details of TWEE framework embodiments, which integrate topic modeling, word embedding and entity embedding for question representation learning. Firstly, a topic sparse autoencoder (TSAE) incorporates a probabilistic topic modeling algorithm into a sparse autoencoder. The global topical representations of questions are learned. Then, how word embeddings are learned from questions to capture the local context information is presented, [0050]).  For further interpretation, Li also discloses “Fig. 5 depicts a process for topic-related representation learning using the TSAE, according to embodiments of the present disclosure. In step 505, given an input (e.g. a question) comprising a plurality of words, a topic distribution over the input among one or more topics is generated by topic modeling. In one or more embodiments, the topic modeling in the TSAE comprises pre-trained probabilistic topic modeling algorithm. In one or mode embodiments, each topic is associated to one or more words from the input. In step 510, a topic distribution for words is obtained based on the topic distribution over the input (e.g. question). In step 515, the input is encoded, via an encoder, into a hidden representation, which may comprise one or more word embeddings. In step 520, the topic distribution for words is fed into the hidden representation to form a topic distribution over hidden state (or a topic distribution over the one or more word embeddings in the hidden state) so that the representation learning is more discriminative”, [0063]. (see rejection of claim 3).  

With respect to Claim 11, Li discloses:
The method of claim 8, wherein the distributed composition priors are pre-trained word embeddings by LSTM-LM (610, Figure 6; FIG. 6 illustrates a learning process for word embeddings, according to one or more embodiment of the present disclosure. In FIG. 6, a group sparse autoencoder 620 and a skip-gram network 610 are used jointly to extract features from the input. For the skip-gram network 610, given an input (e.g., a question), a one-hot representations 614 of words in the input is transformed into low-dimensional word embeddings 612. In one or more embodiments, the prediction from each one-hot representation is context words of the word corresponding to the one-hot representation. The word embeddings 612 and the topic embeddings 622 generated by the group sparse autoencoder 610 are fed together into a CNN 630 for further feature mapping. Considering that the TSAE is a different representation involving topics and a count-based auto-encoder while the skip-gram embedding and CNN make use of contextual information, the TSAE and the combination of skip-gram embedding and CNN may be complementary to each other for improved performance, [0067]). Li uses word embeddings from a separate model, the skip-gram network, in order to contextualize the topic model. (See rejection of claim 4.)

With respect to Claim 13, Li fails to disclose:

[Equation image not reproduced.]
Larochelle discloses the natural language processing system of claim 1, wherein the conditional distribution for each word vi is estimated by: 
[Equation image not reproduced.]
 (DocNADE models the joint probability of the visual words p(v)…and modeling instead each conditional…One possibility would be to model p(vi|v<i) with the following architecture: (See Equation 3), page 3, Larochelle). (See rejection of claim 6).

With respect to Claim 14, Li discloses:
	The method of claim 8, wherein the model is optimized to maximize the pseudo log likelihood. Li discloses that “skip-gram method may be applied to learn entity embeddings ee. By maximizing an average log probability, entity embeddings may be learned to help predict nearby entities” [0070]. (Examiner interprets log likelihood as average log probability.)
Li fails to disclose the natural language processing system of claim 1, wherein the ctx-DocNADE model and the ctx-DocNADEe model are optimized to maximize the pseudo log likelihood, $\log p(v) \approx \sum_{i=1}^{D} \log p(v_i \mid v_{<i})$. Li does not disclose using a DocNADE or contextualized DocNADE model that is optimized to maximize the pseudo log likelihood. 
Larochelle discloses the natural language processing system of claim 1, wherein the ctx-DocNADE model and the ctx-DocNADEe model are optimized to maximize the pseudo log likelihood, $\log p(v) \approx \sum_{i=1}^{D} \log p(v_i \mid v_{<i})$. Larochelle discusses using DocNADE and an extended version of DocNADE, SupDocNADE, and discloses that, to “train the parameters θ = {W, V, b, c} of DocNADE, we simply optimize the average negative log-likelihood of the training set documents using stochastic gradient descent” (page 3, Larochelle). (See rejection of claim 7.)

Claims 5 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. U.S. Patent Publication [2020/0184339] in view of Larochelle et al. Document Neural Autoregressive Distribution Estimation (note that this is cited from the 5/02/2019 IDS), hereinafter Larochelle2.

With respect to Claim 5, Li discloses: 
The natural language processing system of claim 1, wherein the conditional probability P(wj|wi) may be defined as (see equation 7). 
Li fails to disclose the natural language processing system of claim 1, wherein a conditional probability of the word vi in ctx-DocNADE or ctx-DocNADEe is a function of two hidden vectors, h_i^DN(v<i) and h_i^LM(ci), stemming from the DocNADE-based and LSTM-based components of ctx-DocNADE, respectively:
[Equation image not reproduced.]
where h_i^DN(v<i) is computed as:
[Equation image not reproduced.]
and λ is the mixture weight of the LM component, which can be optimized during training and based on the validation set, and the second term h_i^LM is a context-dependent representation and output of an LSTM layer at position i-1 over input sequence c, trained to predict the next word vi.
In the same field of topic modeling, Larochelle2 discloses the natural language processing system of claim 1, wherein a conditional probability of the word vi in ctx-DocNADE or ctx-DocNADEe is a function of two hidden vectors (The peculiarity of the DocNADE language model lies in the definition of this hidden layer, which now includes two terms: See Equation 19, page 12, Larochelle2): h_i^DN(v<i) and h_i^LM(ci), stemming from the DocNADE-based and LSTM-based components (The DocNADE language model maintains an unbounded cache and defines a proper, jointly trained solution to mitigate these two kinds (long and short) dependencies, page 12, Larochelle2) of ctx-DocNADE, respectively:
[Equation image not reproduced.]
where h_i^DN(v<i) is computed as:
[Equation image not reproduced.]
(See Equations 19 and 20, page 12, Larochelle2) and λ is the mixture weight of the LM component, which can be optimized during training and based on the validation set, and the second term h_i^LM is a context-dependent representation and output of an LSTM layer at position i-1 (where i-1 is the number of words used to create h_i^DN(v<i), page 12, Larochelle2) over input sequence c, trained to predict the next word vi (In this section, we propose a new model that extends DocNADE to mitigate the influence of both short and long term dependencies in a single model, which we refer to as the DocNADE language model or DocNADE-LM, page 11, Larochelle2).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teachings of Li with Larochelle2 in order to account for both long and short dependencies. Larochelle2 discloses that “in DocNADE…when assigning a probability to the next word in a sentence, it ignores the order in which the previously observed words appeared. Yet, this ordering of words conveys a lot of information regarding a syntactic role of the next word or the finer semantics within the sentence. In fact, most of that information is predictable from the last few words, which is why N-gram language models remain a dominating approach to language modeling” (page 11, Larochelle2).
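For illustration only, the role of the mixture weight λ recited in claim 5 can be sketched in Python. Because the equation images are not reproduced in this action, the sketch assumes a simple additive combination of the two hidden vectors, h_i = h_i^DN(v<i) + λ·h_i^LM(ci); the function name combine_hidden and the vector values are hypothetical.

```python
from typing import List


def combine_hidden(h_dn: List[float], h_lm: List[float], lam: float) -> List[float]:
    """Assumed additive mixture: h_i = h_i^DN(v<i) + lam * h_i^LM(ci)."""
    if len(h_dn) != len(h_lm):
        raise ValueError("the two hidden vectors must have the same dimension")
    # lam weights the LSTM-LM component; lam = 0 recovers the plain DocNADE hidden vector.
    return [d + lam * m for d, m in zip(h_dn, h_lm)]


h_dn = [0.5, 0.25]    # hypothetical DocNADE hidden vector
h_lm = [1.0, -1.0]    # hypothetical LSTM hidden vector at position i-1
print(combine_hidden(h_dn, h_lm, lam=0.5))  # [1.0, -0.25]
```

Under this assumed form, λ controls how much the order-sensitive LSTM context shifts the order-insensitive DocNADE representation, which is consistent with the stated motivation of accounting for both short and long term dependencies.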

With respect to Claim 12, Li discloses: 
The method of claim 8, wherein the conditional probability P(wj|wi) may be defined as (see equation 7). 
Li fails to disclose the natural language processing system of claim 1, wherein a conditional probability of the word vi in ctx-DocNADE or ctx-DocNADEe is a function of two hidden vectors, h_i^DN(v<i) and h_i^LM(ci), stemming from the DocNADE-based and LSTM-based components of ctx-DocNADE, respectively:
[Equation image not reproduced.]
where h_i^DN(v<i) is computed as:
[Equation image not reproduced.]
and λ is the mixture weight of the LM component, which can be optimized during training and based on the validation set, and the second term h_i^LM is a context-dependent representation and output of an LSTM layer at position i-1 over input sequence c, trained to predict the next word vi.
In the same field of topic modeling, Larochelle2 discloses the natural language processing system of claim 1, wherein a conditional probability of the word vi in ctx-DocNADE or ctx-DocNADEe is a function of two hidden vectors (The peculiarity of the DocNADE language model lies in the definition of this hidden layer, which now includes two terms: See Equation 19, page 12, Larochelle2): h_i^DN(v<i) and h_i^LM(ci), stemming from the DocNADE-based and LSTM-based components (The DocNADE language model maintains an unbounded cache and defines a proper, jointly trained solution to mitigate these two kinds (long and short) dependencies, page 12, Larochelle2) of ctx-DocNADE, respectively:
[Equation image not reproduced.]
where h_i^DN(v<i) is computed as:
[Equation image not reproduced.]
(See Equations 19 and 20, page 12, Larochelle2) and λ is the mixture weight of the LM component, which can be optimized during training and based on the validation set, and the second term h_i^LM is a context-dependent representation and output of an LSTM layer at position i-1 (where i-1 is the number of words used to create h_i^DN(v<i), page 12, Larochelle2) over input sequence c, trained to predict the next word vi (In this section, we propose a new model that extends DocNADE to mitigate the influence of both short and long term dependencies in a single model, which we refer to as the DocNADE language model or DocNADE-LM, page 11, Larochelle2). (See rejection of claim 5.)

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: US 2015/0262069 A1, para. [0064], where topic models are used, and para. [0072], where a multinomial classifier is used; US 2020/0242444 A1, paras. [0030] and [0032], which teach LSTMs for embedding and word embeddings with topic models; US 2020/0218780 A1, para. [0059], which teaches using word embeddings and topic models for sentence embedding; US 2020/0019611 A1, para. [0033], which teaches LSTM; and Jin, M., Luo, X., Zhu, H., & Zhuo, H. H. (2018, June). Combining deep learning and topic modeling for review understanding in context-aware recommendation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers) (pp. 1605-1614), which teaches integrating LSTM and topic modeling (abstract).
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRYAN S BLANKENAGEL whose telephone number is (571)270-0685.  The examiner can normally be reached on 8:00am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached on 571-272-7602.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available 






/BRYAN S BLANKENAGEL/Primary Examiner, Art Unit 2658