DETAILED ACTION
This action is in response to the claims filed 12/25/2021. Claims 1-20 are pending and have been examined. Claims 1, 3-10, 14, 16, 17, 20 are amended.

	
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 12/25/2021 have been fully considered but they are not persuasive. 
Regarding 35 U.S.C. 101 rejection of claims 1-20
Applicant describes the complexity inherent in machine learning, and points to the complex multi-term objective function as evidence that the calculations cannot be performed in the mind with the aid of pen and paper. Firstly, the claims do not explicitly recite equation 8 from the specification; moreover, the fact that the function has “iterative” summation terms does not make updating the terms of the equation “too complex”. Examiner notes that further details explicitly linking the updating of mathematical terms to the updating of a neural network model may be an additional element that could overcome the rejection.
Further, as pointed out in the rejection, the claim simply appends the abstract idea to a general purpose computer. The abstract idea is not considered an additional element, and cannot serve as the improvement or practical application nor can an abstract idea be indicative of “significantly more”.
Regarding 35 U.S.C. 103 rejection of claims 1-20
Ruan et al.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding Claim 1
Step 1 Analysis: The claim is directed to a computer method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a computer-implemented method for mutual learning with topic discovery. Each of the following limitations:
Receiving input comprising a Dirichlet prior and a document…
Drawing from a word embedding matrix…
Drawing from a residual matrix…
Drawing from a Dirichlet prior…
Drawing from a topic embedding matrix…
Passing the at least one topic…
Updating the word embedding matrix
The word embedding matrix is updated… sparsified with topics to reflect topic distribution
drawing a word from a vocabulary according to a probability… from the topic matrix
updating one or more topic representations by optimizing a likelihood function for topic…
outputting the updated word embedding matrix and the one or more updated topic representations.
as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. For example, but for the generic computer components language (“computer-implemented”), the above limitations in the context of this claim encompass the following: “Receiving input comprising a Dirichlet prior and a document…”, “Drawing from a word embedding matrix…”, “Drawing from a residual matrix…”, “Drawing from a Dirichlet prior…”, “Drawing from a topic embedding matrix…”, “Passing the at least one topic…”, “drawing a word from a vocabulary according to a probability… from the topic matrix”, “updating one or more topic representations by optimizing a likelihood function for topic”, “outputting the updated word embedding matrix…” (corresponds to evaluation and judgment) and “Updating the word embedding matrix” and “The word embedding matrix is updated… sparsified with topics to reflect topic distribution” (corresponds to evaluation and judgement with aid of pen and paper). “updating one or more topic representations by optimizing a likelihood function for topic”, “outputting the updated word embedding matrix…” (corresponds to mathematical calculation). Performing optimization
Step 2A Prong Two Analysis: The judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented” and “using an autoencoder”, as drafted, are reciting generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Therefore, the claim is not patent eligible.
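
For illustration only, the drawing steps quoted above (“Drawing from a Dirichlet prior…” and “drawing a word from a vocabulary according to a probability… from the topic matrix”) might be sketched as follows. This is a minimal sketch that assumes a conventional topic-model generative step; the elided claim language may differ. The names `sample_dirichlet`, `draw_word`, `topic_word_probs`, and the toy vocabulary are hypothetical and do not come from the application.

```python
import random

def sample_dirichlet(alpha, rng):
    """Draw a probability vector from Dirichlet(alpha) using normalized Gamma draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]

def draw_word(alpha, topic_word_probs, vocabulary, rng):
    """Draw a topic mixture, then a topic, then a word from that topic's distribution."""
    theta = sample_dirichlet(alpha, rng)                       # topic proportions
    topic = rng.choices(range(len(theta)), weights=theta, k=1)[0]
    word = rng.choices(vocabulary, weights=topic_word_probs[topic], k=1)[0]
    return theta, topic, word

# Hypothetical toy inputs: two topics over a three-word vocabulary.
rng = random.Random(0)
vocabulary = ["model", "court", "claim"]
topic_word_probs = [[0.8, 0.1, 0.1], [0.1, 0.4, 0.5]]
theta, topic, word = draw_word([0.5, 0.5], topic_word_probs, vocabulary, rng)
```

Each call yields one topic mixture, one topic index, and one word; the sketch says nothing about how the claimed embedding or residual matrices enter the probability.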

Regarding Claim 2
Step 1 Analysis: The claim is directed to a computer method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a computer-implemented method for mutual learning with topic discovery. Each of the following limitations:
wherein the word embedding matrix is initialized by pretrained word embeddings.
as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. For example, but for the generic computer components language (“computer-implemented”), the above limitations in the context of this claim encompass the following: “initialized by pretrained word embeddings.” (corresponds to evaluation and judgement with aid of pen and paper). As such the claim recites an abstract idea.
Step 2A Prong Two Analysis: The judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented”, as drafted, are reciting generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Therefore, the claim is not patent eligible.

Regarding Claim 3
Step 1 Analysis: The claim is directed to a computer method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a system for carrying out the method of claim 1. The Step 2A Prong One Analysis for claim 1 is applicable here since claim 3 carries out the method of claim 1 but for the recitation of additional elements “using the updated word embedding matrix and the one or more updated topic representations for document classification.” (insignificant extra-solution activity).
Step 2A Prong Two Analysis: The judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented”, as drafted, are reciting generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. In addition, the claim recites additional element(s) “using the updated word embedding matrix and the one or more updated topic representations for document classification.” that only generally link the use of the judicial exception to a particular technological environment or field of use. See MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Therefore, the claim is not patent eligible.

Regarding Claim 4
Step 1 Analysis: The claim is directed to a computer method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a system for carrying out the method of claim 1. The Step 2A Prong One Analysis for claim 1 is applicable here since claim 4 carries out the method of claim 1 but for the recitation of additional elements “wherein the autoencoder is a sparse autoencoder” (insignificant extra-solution activity).
Step 2A Prong Two Analysis: The judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented” and “autoencoder is a sparse autoencoder”, as drafted, are reciting generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Therefore, the claim is not patent eligible.

Regarding Claim 5
Step 1 Analysis: The claim is directed to a computer method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a computer-implemented method for mutual learning with topic discovery. Each of the following limitations:
wherein the word embedding in the word embedding matrix is generated
encoding word co-occurrence probabilities of the word with a feedforward propagation
the word co-occurrence probabilities are obtained by counting the number of times each context word occurs around its focus word divided by the frequency of the focus word
as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. For example, but for the generic computer components language (“computer-implemented”), the above limitations in the context of this claim encompass the following: “wherein the word embedding in the word embedding matrix is generated”, “the word co-occurrence probabilities are obtained by counting the number of times each context word occurs around its focus word divided by the frequency of the focus word” (corresponds to evaluation and judgement with aid of pen and paper) and “encoding word co-occurrence probabilities of the word with a feedforward propagation” (corresponds to a mathematical calculation).  As such the claim recites an abstract idea.
Step 2A Prong Two Analysis: The judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented”, as drafted, are reciting generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. In addition, the claim recites additional element(s) “using the sparse autoencoder” that only generally link the use of the judicial exception to a particular technological environment or field of use. See MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Therefore, the claim is not patent eligible.
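
For illustration only, the counting limitation quoted for this claim (“counting the number of times each context word occurs around its focus word divided by the frequency of the focus word”) might be sketched as follows. This is a minimal sketch, not the applicant's implementation; the function name `cooccurrence_probabilities`, the symmetric window, and the default window size are assumptions.

```python
from collections import Counter, defaultdict

def cooccurrence_probabilities(tokens, window=2):
    """Count context words around each focus word, then divide by focus-word frequency."""
    focus_freq = Counter(tokens)
    counts = defaultdict(Counter)
    for i, focus in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # skip the focus position itself
                counts[focus][tokens[j]] += 1
    return {
        focus: {ctx: n / focus_freq[focus] for ctx, n in ctx_counts.items()}
        for focus, ctx_counts in counts.items()
    }

# Hypothetical toy sentence.
tokens = "the cat sat on the mat".split()
probs = cooccurrence_probabilities(tokens, window=1)
```

The sketch follows the quoted wording literally, so the values for one focus word need not sum to 1 across its context words.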

Regarding Claim 6
Step 1 Analysis: The claim is directed to a computer method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a computer-implemented method for mutual learning with topic discovery. Each of the following limitations:
wherein the word embedding matrix is initialized by pretrained word embeddings.
as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. For example, but for the generic computer components language (“computer-implemented”), the above limitations in the context of this claim encompass the following: “initialized by pretrained word embeddings.” (corresponds to evaluation and judgement with aid of pen and paper). As such the claim recites an abstract idea.
Step 2A Prong Two Analysis: The judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented”, as drafted, are reciting generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Therefore, the claim is not patent eligible.

Regarding Claim 7
Step 1 Analysis: The claim is directed to a computer method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a computer-implemented method for mutual learning with topic discovery. Each of the following limitations:
wherein the word loss function topic guidance term comprises a Kullback-Leibler (KL) divergence between a topic sparsity parameter for a topic and an average activation of the embeddings for the topic.
as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. For example, but for the generic computer components language (“computer-implemented”), the above limitations in the context of this claim encompass the following: “word loss function comprises a Kullback-Leibler (KL) divergence...” (corresponds to a mathematical calculation). As such the claim recites an abstract idea.
Step 2A Prong Two Analysis: The judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented”, as drafted, are reciting generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Therefore, the claim is not patent eligible.
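
For illustration only, the KL divergence limitation quoted for this claim might be computed as sketched below. The sketch assumes the Bernoulli form of the KL penalty commonly used in sparse autoencoders, with the sparsity parameter and the average activation both treated as values in (0, 1); the claim as quoted does not state which form is used. The names `kl_sparsity_penalty`, `average_activation`, `rho`, and `rho_hat` are hypothetical.

```python
import math

def average_activation(activations):
    """Mean activation of one hidden unit (here, one topic) over a set of inputs."""
    return sum(activations) / len(activations)

def kl_sparsity_penalty(rho, rho_hat, eps=1e-12):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat).

    rho is the target sparsity and must lie strictly between 0 and 1;
    rho_hat is clamped away from 0 and 1 to keep the logarithms finite.
    """
    rho_hat = min(max(rho_hat, eps), 1.0 - eps)
    return (rho * math.log(rho / rho_hat)
            + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat)))

# Hypothetical activations for a single topic.
activations = [0.1, 0.3, 0.2]
penalty = kl_sparsity_penalty(0.05, average_activation(activations))
```

Under this assumed form the penalty is zero when the average activation equals the sparsity parameter and grows as the two diverge.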

Regarding Claim 8
Step 1 Analysis: The claim is directed to a computer method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a computer-implemented method for generating word embedding. Each of the following limitations:
Receiving input comprising a Dirichlet prior and a document…
Constructing a word co-occurrence matrix…
Encoding… at least word co-occurrence… by a feedforward propagation
Decoding… the embedding representation of the input word…
training the sparse autoencoder by minimizing a word loss function…
as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. For example, but for the generic computer components language (“computer-implemented” and “one or more processors”), the above limitations in the context of this claim encompass the following: “Receiving input comprising a Dirichlet prior and a document…”, “Constructing a word co-occurrence matrix…” (corresponds to evaluation and judgment) and “Encoding… at least word co-occurrence… by a feedforward propagation”, “Decoding… the embedding representation of the input word…” and “training the sparse autoencoder by minimizing a word loss function” (corresponds to a mathematical calculation with aid of pen and paper). “training the sparse autoencoder by minimizing a word loss function…” (corresponds to a mathematical calculation). While a training process may overcome the 101 rejection, the present claim limitation broadly describes training in terms of a minimization of a mathematical function; therefore, the limitation corresponds to a mathematical computation. As such the claim recites an abstract idea.
Step 2A Prong Two Analysis: The judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented” and “using an encoder”, “using a decoder”, “one or more processors”, as drafted, are reciting generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. In addition, the claim recites additional element(s) “using a sparse autoencoder” that only generally link the use of the judicial exception to a particular technological environment or field of use. See MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Therefore, the claim is not patent eligible.
	
Regarding Claim 9
Step 1 Analysis: The claim is directed to a computer method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a system for carrying out the method of claim 1. The Step 2A Prong One Analysis for claim 1 is applicable here since claim 9 carries out the method of claim 1 but for the recitation of additional elements “using the updated word embedding matrix and the one or more updated topic representations for document classification.” (insignificant extra-solution activity).
Step 2A Prong Two Analysis: The judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented”, as drafted, are reciting generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. In addition, the claim recites additional element(s) “using the updated word embedding matrix and the one or more updated topic representations for document classification.” that only generally link the use of the judicial exception to a particular technological environment or field of use. See MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Therefore, the claim is not patent eligible.

Regarding Claim 10
Step 1 Analysis: The claim is directed to a computer method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a computer-implemented method for mutual learning with topic discovery. Each of the following limitations:
wherein the word loss function topic guidance term comprises a Kullback-Leibler (KL) divergence between a topic sparsity parameter for a topic and an average activation of the embeddings for the topic.
as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. For example, but for the generic computer components language (“computer-implemented”), the above limitations in the context of this claim encompass the following: “word loss function comprises a Kullback-Leibler (KL) divergence...” (corresponds to a mathematical calculation). As such the claim recites an abstract idea.
Step 2A Prong Two Analysis: The judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented”, as drafted, are reciting generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Therefore, the claim is not patent eligible.

Regarding Claim 11
Step 1 Analysis: The claim is directed to a computer method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a computer-implemented method for generating word embedding. Each of the following limitations:
wherein the word co-occurrence matrix is extracted from a sequence of words in each document of the document set within a text window.
as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. For example, but for the generic computer components language (“computer-implemented”), the above limitations in the context of this claim encompass the following: “wherein the word co-occurrence matrix is extracted from a sequence of words in each document of the document set within a text window.” (corresponds to an evaluation or judgement with aid of pen and paper). As such the claim recites an abstract idea.
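Step 2A Prong Two Analysis: The judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented”, as drafted, are reciting generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.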
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Therefore, the claim is not patent eligible.

Regarding Claim 12
Step 1 Analysis: The claim is directed to a computer method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a computer-implemented method for generating word embedding. Each of the following limitations:
wherein the text window is fixed and remains the same across documents.
as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. For example, but for the generic computer components language (“computer-implemented”), the above limitations in the context of this claim encompass the following: “wherein the text window is fixed and remains the same across documents.” (corresponds to an evaluation or judgement with aid of pen and paper). As such the claim recites an abstract idea.
Step 2A Prong Two Analysis: The judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented”, as drafted, are reciting generic computer components. The generic computer components in these steps are recited at a high level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Therefore, the claim is not patent eligible.

Regarding Claim 13
Step 1 Analysis: The claim is directed to a computer method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a computer-implemented method for generating word embedding. Each of the following limitations:
 wherein each word sequence has a focus word and its neighboring context words within a text window centered at the focus word.
as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. For example, but for the generic computer components language (“computer-implemented”), the above limitations in the context of this claim encompass the following: “wherein each word sequence has a focus word and its neighboring context words within a text window centered at the focus word.” (corresponds to an evaluation or judgement with aid of pen and paper). As such the claim recites an abstract idea.
Step 2A Prong Two Analysis: The judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented”, as drafted, are reciting generic computer components. The generic computer components in these steps are recited at a high-level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Therefore, the claim is not patent eligible.

Regarding Claim 14
Step 1 Analysis: The claim is directed to a computer method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a computer-implemented method for generating word embedding. Each of the following limitations:
 the topic information is updated using the updated word embeddings…
as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. For example, but for the generic computer components language (“computer-implemented”), the above limitations in the context of this claim encompass the following: “the topic information is updated using the updated word embeddings.…” (corresponds to a mathematical calculation with aid of pen and paper). As such the claim recites an abstract idea.
Step 2A Prong Two Analysis: The judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented”, as drafted, are reciting generic computer components. The generic computer components in these steps are recited at a high-level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Therefore, the claim is not patent eligible.

Regarding Claim 15
Step 1 Analysis: The claim is directed to a computer method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a computer-implemented method for generating word embedding. Each of the following limitations:
 wherein the topic information is drawn from a topic matrix based on a mixing topic proportion, the mixing topic proportion is generated from the Dirichlet prior.
as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. For example, but for the generic computer components language (“computer-implemented”), the above limitations in the context of this claim encompass the following: “wherein the topic information is drawn from a topic matrix based on a mixing topic proportion, the mixing topic proportion is generated from the Dirichlet prior.” (corresponds to an evaluation or judgement with aid of pen and paper). As such the claim recites an abstract idea.
Step 2A Prong Two Analysis: The judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented”, as drafted, are reciting generic computer components. The generic computer components in these steps are recited at a high-level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Therefore, the claim is not patent eligible.
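For illustration only, the two draws recited in this claim (a mixing topic proportion generated from a Dirichlet prior, then a topic drawn according to that proportion) can be sketched as follows. The sketch uses the standard construction of a Dirichlet sample from normalized Gamma draws; the function names, the prior values in `alpha`, and the fixed seed are hypothetical and are not taken from the claims, the specification, or the cited art.

```python
import random


def draw_mixing_proportion(alpha, rng):
    """Sample from Dirichlet(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def draw_topic(proportion, rng):
    """Draw one topic index from the categorical distribution given by `proportion`."""
    return rng.choices(range(len(proportion)), weights=proportion, k=1)[0]


rng = random.Random(0)        # fixed seed so the sketch is repeatable
alpha = [0.5, 0.5, 0.5]       # hypothetical symmetric Dirichlet prior over 3 topics
phi = draw_mixing_proportion(alpha, rng)            # per-document mixing proportion
topics = [draw_topic(phi, rng) for _ in range(10)]  # one topic index per word position
```

Each entry of `phi` is a relative proportion for one topic and the entries sum to one, so a topic with twice the proportion of another is twice as likely to be drawn for any given word.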

Regarding Claim 16
Step 1 Analysis: The claim is directed to a computer method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a computer-implemented method for mutual learning with topic discovery and word embedding. Each of the following limitations:
Receiving input comprising a Dirichlet prior and a document…
initializing at least a topic matrix…
generating a mixing topic proportion representing relative proportions among topics…
with a word embedding matrix fixed, updating topics…
encoding… sparsified with the updated topics… by a feedforward propagation
calculating an overall objective function…
updating the weight matrix… with backpropagation
updating the word embedding matrix …
as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. For example, but for the generic computer components language (“computer-implemented” and “one or more processors”), the above limitations in the context of this claim encompass the following: “Receiving input comprising a Dirichlet prior and a document…”, “initializing at least a topic matrix…”, “generating a mixing topic proportion representing relative proportions among topics…”, “with a word embedding matrix fixed, updating topics…”, “updating the word embedding matrix …” (corresponds to evaluation and judgment) and “encoding… sparsified with the updated topics… by a feedforward propagation”, “calculating an overall objective function…” and “updating the weight matrix… with backpropagation” (corresponds to a mathematical calculation with aid of pen and paper). As such the claim recites an abstract idea.
Step 2A Prong Two Analysis: The judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented”, “using an autoencoder” and “one or more processors”, as drafted, are reciting generic computer components. The generic computer components in these steps are recited at a high-level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. In addition, the claim recites additional element(s) “using a sparse autoencoder” that only generally link the use of the judicial exception to a particular technological environment or field of use. See MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Therefore, the claim is not patent eligible.
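For illustration only, the sequence of feedforward encoding, objective calculation, and backpropagation update recited in this claim can be sketched with a deliberately small stand-in. The sketch assumes a linear encoder and decoder with a plain squared reconstruction error; it omits the topic sparsification and any other terms of the claimed overall objective function, and the names `matvec`, `train_step`, `w_enc`, `w_dec` and all numeric values are hypothetical.

```python
def matvec(m, v):
    """Multiply matrix `m` (list of rows) by vector `v`."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def train_step(w_enc, w_dec, x, lr):
    """One gradient step on 0.5 * ||w_dec (w_enc x) - x||^2.

    Updates both weight matrices in place and returns the loss
    measured before the update.
    """
    h = matvec(w_enc, x)        # feedforward: encode the input
    x_hat = matvec(w_dec, h)    # feedforward: decode back to input space
    err = [xh - xi for xh, xi in zip(x_hat, x)]
    loss = 0.5 * sum(e * e for e in err)
    # Backpropagate the error to the hidden layer (w_dec^T err),
    # computed before w_dec is modified.
    delta_h = [sum(w_dec[i][k] * err[i] for i in range(len(err)))
               for k in range(len(h))]
    for i in range(len(w_dec)):
        for k in range(len(h)):
            w_dec[i][k] -= lr * err[i] * h[k]
    for k in range(len(w_enc)):
        for j in range(len(x)):
            w_enc[k][j] -= lr * delta_h[k] * x[j]
    return loss


# Hypothetical 2-dimensional input compressed to a 1-dimensional code.
w_enc = [[0.5, 0.1]]
w_dec = [[0.5], [0.1]]
x = [1.0, 0.0]
loss_before = train_step(w_enc, w_dec, x, lr=0.1)
loss_after = train_step(w_enc, w_dec, x, lr=0.1)
```

With these values the reconstruction loss measured on the second call is lower than on the first, which is the expected effect of a single small gradient step.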

Regarding Claim 17
Step 1 Analysis: The claim is directed to a computer method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a system for carrying out the method of claim 1. The Step 2A Prong One Analysis for claim 1 is applicable here since claim 17 carries out the method of claim 1 but for the recitation of additional elements “wherein the topic loss function is a likelihood function of the document set, the topic matrix, the residual matrix, the word embedding matrix, and the topic embedding matrix.” (insignificant extra-solution activity).
Step 2A Prong Two Analysis: The judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented”, as drafted, are reciting generic computer components. The generic computer components in these steps are recited at a high-level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. In addition, the claim recites additional element(s) “wherein the topic loss function is a likelihood function of the document set, the topic matrix, the residual matrix, the word embedding matrix, and the topic embedding matrix.” that only generally link the use of the judicial exception to a particular technological environment or field of use. See MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Therefore, the claim is not patent eligible.

Regarding Claim 18
Step 1 Analysis: The claim is directed to a computer method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a computer-implemented method for mutual learning with topic discovery and word embedding. Each of the following limitations:
 wherein the word loss function comprises a term of Kullback-Leibler (KL) divergence between…
as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. For example, but for the generic computer components language (“computer-implemented”), the above limitations in the context of this claim encompass the following: “wherein the word loss function comprises a term of Kullback-Leibler (KL) divergence between…” (corresponds to a mathematical calculation with aid of pen and paper). As such the claim recites an abstract idea.
Step 2A Prong Two Analysis: The judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented”, as drafted, are reciting generic computer components. The generic computer components in these steps are recited at a high-level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Therefore, the claim is not patent eligible.
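For illustration only, the Kullback-Leibler divergence named in this claim is the following calculation for two discrete distributions. The claim language elides which two quantities are compared, so `p` and `q` below are hypothetical placeholders; the sketch assumes every `q[i]` is positive wherever `p[i]` is positive.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p[i] == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Hypothetical distributions over two outcomes.
p = [0.5, 0.5]
q = [0.9, 0.1]
divergence = kl_divergence(p, q)
self_divergence = kl_divergence(p, p)
```

The divergence of a distribution from itself is zero, and the value grows as `q` moves away from `p`; it is not symmetric in its two arguments.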

Regarding Claim 19
Step 1 Analysis: The claim is directed to a computer method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a computer-implemented method for mutual learning with topic discovery and word embedding. Each of the following limitations:
 Decoding…the embedding representation of the input word back to a reconstructed representation.
as drafted, is a process that, under its broadest reasonable interpretation, covers mental processes (concepts performed in the human mind (including an observation, evaluation, judgment, opinion)) but for the recitation of generic computer components. For example, but for the generic computer components language (“computer-implemented”), the above limitations in the context of this claim encompass the following: “Decoding…the embedding representation of the input word back to a reconstructed representation” (corresponds to a mathematical calculation with aid of pen and paper). As such the claim recites an abstract idea.
Step 2A Prong Two Analysis: The judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented”, as drafted, are reciting generic computer components. The generic computer components in these steps are recited at a high-level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. In addition, the claim recites additional element(s) “using an autoencoder” that only generally link the use of the judicial exception to a particular technological environment or field of use. See MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Therefore, the claim is not patent eligible.

Regarding Claim 20
Step 1 Analysis: The claim is directed to a computer method, which is directed to a process, one of the statutory categories.
Step 2A Prong One Analysis: The claim recites a system for carrying out the method of claim 1. The Step 2A Prong One Analysis for claim 1 is applicable here since claim 20 carries out the method of claim 1 but for the recitation of additional elements “using the updated word embedding matrix and the one or more updated topic representations for document classification.” (insignificant extra-solution activity).
Step 2A Prong Two Analysis: The judicial exception is not integrated into a practical application. In particular, the claim recites additional element(s) that are mere instructions to implement an abstract idea on a computer, or merely uses a computer as a tool to perform an abstract idea. See MPEP 2106.05(f). The additional element(s) of “computer-implemented”, as drafted, are reciting generic computer components. The generic computer components in these steps are recited at a high-level of generality (i.e., as a generic computer component performing a generic computer function) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. In addition, the claim recites additional element(s) “using the updated word embedding matrix and the one or more updated topic representations for document classification.” that only generally link the use of the judicial exception to a particular technological environment or field of use. See MPEP 2106.05(h). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claim is directed to an abstract idea.
Step 2B Analysis: The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional element of using generic computer components to perform the abstract idea amounts to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Therefore, the claim is not patent eligible.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

	
Claims 1, 3 and 4 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. “Generative Topic Embedding: a Continuous Representation of Documents”, hereinafter Li, and further in view of Ruan et al. “Context-Aware Phrase Representation for Statistical Machine Translation”, hereinafter Ruan.

Regarding Claim 1
Li teaches, A computer-implemented method for mutual learning with topic discovery and word embedding using one or more processors to cause steps to be performed comprising: (Figure 1 and pg 8 Experimental settings ¶01 “The learned topic embeddings were combined to form the whole topic set, where redundant null topics in different categories were removed, leaving us with 281 topics for 20News and 111 topics for Reuters… using the scikit-learn library” Examiner notes that the learning of topic embeddings that is described, utilizes word embeddings as shown in figure 1. A Scikit-learn is a computer tool set for machine learning, which necessitates a processor)  receiving input comprising a Dirichlet prior and a document set having at least one document; (pg 4 Section 5 step 2 “generative process is as follows…
[Image: media_image1.png]
” Step 2 entails drawing a proportion from the Dirichlet prior for each document. To do this, the set of all documents D must be received, along with an associated Dirichlet prior.)  for each word in a set of words from the document set: drawing, from a word embedding matrix, a word embedding for the word, (pg 4 Section 5 step 2 and Figure 1 and Table 1 “the word embedding…and residual… are drawn from respective Gaussians.” 
[Image: media_image2.png (Li, Figure 1)]
As shown in the figure, for each word in a document, represented by the interior box, a word embedding Vsi is an element drawn from a distribution. The table indicates that V is a word embedding matrix.) drawing, from a residual matrix, (pg 4 Section 5 step 2 and Figure 1 and Table 1 “the word embedding…and residual… are drawn from respective Gaussians.” Section 3 ¶02 “Each context word wi−j and the focus word wi comprise a bigram wi−j , wi” Similarly the residual matrix represents the bigram residuals in the matrix A, notated in the table.) residuals for each word co-occurrence corresponding to the word, (Section 3 ¶02 “Each context word wi−j and the focus word wi comprise a bigram wi−j , wi” Each residual or bigram corresponds to a particular focus word, wi. The co-occurrence is the bigram set <wij, wi>) each residual presenting nonlinear or noisy interaction between the word and another word in each document; (Section 4 ¶03 “Here a_wi,wj [residual between two words wi and wj] is referred as the bigram residual, indicating [presenting] the non-linear part” As stated previously the residual as described is drawn for each document.)  drawing, from a topic embedding matrix, one or more topic embeddings corresponding to the word; (Section 5 “For the k-th topic, draw a topic embedding uniformly from a hyperball of radius… Unif(Bγ); [topic embedding matrix]… For the j-th word:… Draw topic assignment zij from the categorical distribution Cat(φi)” The topic assignment is drawn for each j-th word in a document; the k-th topic corresponds to the word based on the drawn mixing proportion)  for each document in the document set: drawing, from the Dirichlet prior, a mixing topic proportion (Section 5 step 2 “For each document di [from the set of documents D]: …(a) Draw the mixing proportions φi [topic proportions] from the Dirichlet prior Dir(α);”)  representing relative proportions among topics for each document; (Section 3 ¶04 “Each topic t_ik has a document-specific prior probability [relative proportion] to be assigned to a word, denoted as φik = P(k|di). The vector φi = (φi1, · · · , φiK) is referred to as the mixing proportions of these topics in document di” For each topic t_k among the topics, the probabilities indicate relative proportions; for example, a topic with probability 2x is twice as likely as another topic with probability x)  drawing at least one topic from a topic matrix for a j-th word in each document based on the mixing topic proportion, j is a positive integer number;  (Section 5 step 2 “(b) For the j-th word: i. Draw topic assignment zij from [based on] the categorical distribution Cat(φi ) [mixing proportion];” Given that j is an index describing the set of words, j can never be negative.) and drawing a word from a vocabulary according to a probability of the word ( pg 4 Section 5 step 2(b)(ii) “Draw word wij from S according to P(wij | wi,j−c:wi,j−1, zij , di).”  S is the vocabulary as defined in table 1 on pg 3; the probability mathematically describes drawing a word wij from S according to a probability of the given word. zij is the topic drawn from the topic matrix, as described above.) and updating one or more topic representations by optimizing a likelihood function for topic, the likelihood function for topic is a function of the document set, the topic matrix, the residual matrix, the word embedding matrix, and the topic embedding matrix; and outputting the updated word embedding matrix and the one or more updated topic representations.  
( pg 4 Section 5.1 ¶02 “Then the complete-data likelihood of the whole corpus is:… 
[Image: media_image3.png (complete-data likelihood, Li Section 5.1)]
pg 5 Section 6.1 “the learning objective is to find the embeddings V , the topics T , and the word-topic and document-topic distributions p(Zi, φi |di , A,V , T ).” The learning objective is to update the parameters of the likelihood function, corresponding to updating and outputting the topic representations and word embedding matrix. The likelihood function includes: the document set D, the topic matrix Z, the residual matrix A, the word embedding matrix V, and the topic embedding matrix T.)
However Li does not explicitly teach, the word embedding matrix is updated using an autoencoder sparsified with topics to reflect topic distribution of words, the autoencoder uses a word cost function that comprises a topic guidance term, the topic guidance term uses topic distributions of words in the document set to update word embeddings in the word embedding matrix;
Ruan however, when addressing context aware phrase embedding using autoencoder neural networks teaches, the word embedding matrix is updated ( pg 144 ¶02 “R(θ)  is the regularization term involving the following parameter sets:1 (1)   θLw : the word embedding matrices;” pg 144 ¶03 “We apply a similar co-training style algorithm as [19] to train the model parameters.” the parameters of the model are updated, the set of parameters includes the word embedding matrix) using an autoencoder ( pg 138 ¶03 “we name our model as Topically-informed Bilingually-constrained Recursive Auto-encoders (TBRAE).” The model is an auto encoder later referred to as RAE)
sparsified with topics to reflect topic distribution of words, the autoencoder uses a word cost function that comprises a topic guidance term, the topic guidance term uses topic distributions of words in the document set to update word embeddings in the word embedding matrix; ( pg 144 ¶02 “Besides, we impose the word-topic semantic constraint, mentioned in Sect. 3.4, on words in two languages. Thus, the final objective over the training set D becomes… 
[Image: media_image4.png (final objective, Ruan pg 144)]
” in section 3.4 the art describes the function Ewt() as a topic guidance term: pg 143 ¶02 “Thus, the semantic correlation between words w and   w′  in the topic space can actually be determined by their conditional distributions   p^(z|w)  and   p^(z|w′)” for a given topic conditional distributions, where z is a sampled topic which sparsifies the word embedding vectors. This function or term is used in the final objective function to inform parameter updates during training.)
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the joint topic-word embedding system of Li with the topic or context aware word embedding autoencoder system of Ruan.
	One of ordinary skill in the art would have been motivated to combine these references because Ruan improves the embedding model of Li by utilizing a bipartite autoencoder “which results in better phrase embedding” that incorporates context information as document-level distributions in order to “augment the presentation capability of phrase representation.” (Ruan Introduction pg 138 ¶02-¶03)

Regarding Claim 3
Li/Ruan teaches claim 1
	Further Li teaches, using the updated word embedding matrix and the one or more updated topic representations for document classification. ( Conclusion “Our method has potential applications in various scenarios, such as document retrieval, classification, clustering and summarization” pg 8 ¶02 “TopicVec used the word embeddings trained using PSDVec on a March 2015 Wikipedia snapshot. It contains the most frequent 180,000 words. The dimensionality of word embeddings and topic embeddings was 500” TopicVec is trained to perform document classification. As mapped previously in the rejection of claim 1, training involves updating the parameters, including Z, the topic representations, and V, the embedding matrix. Therefore, training the model for document classification uses these parameters for document classification.)

Regarding Claim 4
Li/Ruan teaches claim 1
	Further Li teaches, wherein the encoder autoencoder is a sparse autoencoder (pg 7 Section 3.4 ¶03 “Then, we choose the Kullback-Leibler divergence…Here we introduce the weight  λw that is defined as the frequency of w to distinguish the effects of different words.” The auto-encoder is trained with a KL divergence term which defines the sparsity of the weight parameters, thus the auto-encoder is a sparse auto-encoder.)




Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Li/Ruan, and further in view of Amiri et al. “Learning Text Pair Similarity with Context-sensitive Autoencoders” hereinafter Amiri.

Regarding Claim 2
Li/Ruan teaches claim 1
Li/Ruan does not explicitly teach, wherein the word embedding matrix is initialized by pretrained word embeddings.
	However, Amiri when addressing issues related to autoencoders for encoding word embeddings teaches, wherein the word embedding matrix is initialized by pretrained word embeddings. ( Section 4.2 “We use pre-trained word vectors from GloVe (Pennington et al., 2014)….we only use GloVe as initialization…” Figure 2 and the accompanying caption “Pretraining properly initializes a stack of context-sensitive denoising autoencoders” The word vectors that are used to create the word embedding matrix, hc,  are pretrained to initialize the model, as shown in the figure in DAE-0.)
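It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate initialization of the word embedding matrix with pretrained word embeddings as taught by Amiri to the disclosed invention of Li/Ruan.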
One of ordinary skill in the arts would have been motivated to make this modification in order to implement a model that “encodes input text into context-sensitive representations and uses them to compute similarity between text pairs.” The model “outperforms the state-of-the-art models in two semantic retrieval tasks and a contextual word similarity task” (Amiri Conclusion)

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Li/Ruan, and further in view of Chen et al. “KATE: K-Competitive Autoencoder for Text” hereinafter Chen.

Regarding Claim 5
Li/Ruan teaches claim 1
Li/Ruan does not explicitly teach, wherein the word embedding in the word embedding matrix is generated, using the sparse autoencoder, by encoding word co-occurrence probabilities of the word with a feedforward propagation, the word co-occurrence probabilities are obtained by counting the number of times each context word occurs around its focus word divided by the frequency of the focus word. 
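However, Chen, when addressing issues related to autoencoders for learning efficient representations of text documents, teaches, wherein the word embedding in the word embedding matrix is generated, using the sparse autoencoder, by encoding word co-occurrence probabilities of the word with a feedforward propagation, the word co-occurrence probabilities are obtained by counting the number of times each context word occurs around its focus word divided by the frequency of the focus word. (Section 3.1 ¶02 and Algorithm 1 “we represent each input text document as a log-normalized word count vector x ∈ R^d where each dimension is represented as… 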
    [image: media_image5.png]
 where V is the vocabulary and ni is the count of word i in that document” the words in a particular document can be formulated as a word co-occurrence or count matrix/vector as described. In particular xi, the normalized probability, is created by counting the number of times the word occurs in a document, corresponding to each context word around the focus word, and dividing this value by the total frequency in the vocabulary. Then in algorithm 1 the resulting matrix, x, is used to generate the hidden state of the autoencoder corresponding to the word embedding matrix. See Algorithm 1 step 2 “Feedforward step: z = tanh(W x + b).” The autoencoder is sparse because it is designed for text processing which can be sparse.)
	It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate an autoencoder that uses a co-occurrence matrix to generate a hidden state matrix for an encoder model as taught by Chen to the disclosed invention of Li/Ruan.
One of ordinary skill in the arts would have been motivated to make this modification in order to implement a model well suited for learning across “tasks such as document classification, multi-label classification, regression and document retrieval, KATE clearly outperforms competing methods or obtains close to the best results. It is very encouraging to note that KATE is also able to learn semantically meaningful representations of words, documents and topics” (Chen Conclusion)
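
For illustration only, the computation recited in claim 5 can be sketched as follows. This is a minimal sketch and is not taken from Chen, Li, Ruan, or the specification: the function names, the symmetric context window, and the list-of-rows weight layout are assumptions made here. It counts how often each context word occurs around a focus word, divides by the focus word's frequency, and applies a feedforward step of the form z = tanh(W x + b) quoted from Chen's Algorithm 1.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_probabilities(tokens, window=2):
    """Ratio count(context near focus) / count(focus) for every word pair.

    The window size is an assumption; the ratios are not normalized,
    so a value can exceed 1 when a context word repeats near one focus word.
    """
    focus_counts = Counter(tokens)
    pair_counts = defaultdict(Counter)
    for i, focus in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pair_counts[focus][tokens[j]] += 1
    return {
        focus: {ctx: n / focus_counts[focus] for ctx, n in ctxs.items()}
        for focus, ctxs in pair_counts.items()
    }


def encode(x, W, b):
    """Feedforward step z = tanh(W x + b); W is a list of weight rows."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + bi)
        for row, bi in zip(W, b)
    ]


probs = cooccurrence_probabilities(["a", "b", "a", "c"], window=1)
hidden = encode([1.0, 2.0], [[0.0, 0.0]], [0.0])
```

In this sketch each row of co-occurrence ratios for a focus word would serve as the input vector x to `encode`.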

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Li/Ruan/Chen, and further in view of Amiri et al. “Learning Text Pair Similarity with Context-sensitive Autoencoders” hereinafter Amiri.

Regarding Claim 6
Li/Ruan/Chen teaches claim 5
Li/Ruan/Chen does not explicitly teach, wherein the word embedding matrix is initialized by pretrained word embeddings.
	However, Amiri when addressing issues related to autoencoders for encoding word embeddings teaches, wherein the word embedding matrix is initialized by pretrained word embeddings. ( Section 4.2 “We use pre-trained word vectors from GloVe (Pennington et al., 2014)….we only use GloVe as initialization…” Figure 2 and the accompanying caption “Pretraining properly initializes a stack of context-sensitive denoising autoencoders” The word vectors that are used to create the word embedding matrix, hc,  are pretrained to initialize the model, as shown in the figure in DAE-0.)
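It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate a joint autoencoder that uses a hidden embedding layer, corresponding to a matrix updated by the autoencoder, to represent words informed by context, in which each word has a corresponding topic vector to be input into the encoder, as taught by Amiri to the disclosed invention of Li/Ruan/Chen.
One of ordinary skill in the arts would have been motivated to make this modification in order to implement a model that “encodes input text into context-sensitive representations and uses them to compute similarity between text pairs.” The model “outperforms the state-of-the-art models in two semantic retrieval tasks and a contextual word similarity task” (Amiri Conclusion)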

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Li/Ruan, and further in view of Chao et al. “Locally weighted embedding topic modeling by markov random walk structure approximation and sparse regularization” hereinafter Chao.

Regarding Claim 7
Li/Ruan teaches claim 4
	Further Li teaches, the topic guidance term comprises a Kullback-Leibler (KL) divergence (pg 8 “Ewt(∗;θ)  denotes the error functions of the word-topic semantic constraints for two languages” as stated previously the topic guidance term corresponds to the term in the final objective function depicted as Ewt. Ewt is the KL divergence between two terms. Pg 7 Section 3.4 ¶03 “Then, we choose the Kullback-Leibler divergence   Ewt(w;θ)  to encourage   p(∗|w)  to be close to   p^(∗|w) : 
    [image: media_image6.png]
”)  
Li/Ruan does not explicitly teach, between a topic sparsity parameter for a topic and an average activation of the embeddings for the topic. 
However, Chao, when addressing topic embedding with an autoencoder-based topic modeling framework, teaches, between a topic sparsity parameter for a topic and an average activation of the embeddings for the topic. (pg 4-5 Section 3.2 ¶04-06 and eq 21 “
    [image: media_image7.png]
Therefore, we consider a regularized AE with a fixed sparsity target ρ and enforce the average of each embedding topic to approximate ρ by minimizing the KL divergence. Formally, let ρ̂_j denote the average output of hidden unit j over m” The first term ρ is the sparsity parameter for the topic modeling autoencoder, and the second term ρ̂ is the average activation of a topic embedding, because it is the activation of a hidden unit of the topic autoencoder.)
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to combine the topic modeling autoencoder of Li/Ruan with the topic modeling sparse autoencoder of Chao.
	One of ordinary skill in the arts would have been motivated to combine these references
because both references disclose sparse autoencoders for topic embedding. Chao improves the embedding model of Li/Ruan by providing  “an easier interpretation about the meaning of embedding topics, and such sparse representation may help improve the discriminative performance” (Chao pg 5 ¶01)
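
For illustration only, a KL-divergence sparsity penalty between a fixed sparsity target and the average activation of each hidden unit can be sketched as follows. This is a minimal sketch under stated assumptions: it uses the commonly seen Bernoulli form of the penalty, which may differ in detail from Chao's eq 21 (the equation image is not reproduced in this action), and the function name, the `eps` clamp, and the default target of 0.05 are hypothetical choices made here.

```python
import math


def kl_sparsity_penalty(activations, rho=0.05, eps=1e-8):
    """Sum over hidden units j of KL(rho || rho_hat_j).

    activations: list of m samples, each a list of hidden-unit outputs in (0, 1).
    rho_hat_j is the average output of hidden unit j over the m samples.
    """
    m = len(activations)
    n_hidden = len(activations[0])
    penalty = 0.0
    for j in range(n_hidden):
        rho_hat = sum(sample[j] for sample in activations) / m
        # Clamp away from 0 and 1 so the logarithms stay finite.
        rho_hat = min(max(rho_hat, eps), 1.0 - eps)
        penalty += rho * math.log(rho / rho_hat)
        penalty += (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat))
    return penalty


on_target = kl_sparsity_penalty([[0.05, 0.05], [0.05, 0.05]], rho=0.05)
off_target = kl_sparsity_penalty([[0.5], [0.5]], rho=0.05)
```

The penalty is zero when every average activation equals the target and grows as the averages move away from it, which is what drives the hidden representation toward sparsity.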

Claims 8, 9, 14-17, 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Li/Ruan/Amiri, and further in view of Chen et al. “KATE: K-Competitive Autoencoder for Text” hereinafter Chen.

Regarding Claim 8
	Li teaches, A computer-implemented method for generating word embedding using one or more processors to cause steps to be performed comprising (pg 3 Section 4 ¶04 “an efficient and effective word embedding algorithm [to generate word embedding]” pg 8 ¶01 “… using the scikit-learn library” the word embedding algorithm is performed in part using an ML library, thus requiring an accompanying processor.) receiving input comprising a Dirichlet prior and a document set having at least one document; (pg 4 Section 5 step 2 “generative process is as follows…
    [image: media_image1.png]
” step 2 entails drawing a proportion from the Dirichlet, for each document. In order to do this, the set of all documents D and an associated Dirichlet prior must be received.) 
Li does not appear to explicitly teach, for each document: constructing a word co-occurrence matrix comprising a plurality of word co-occurrence probabilities respectively corresponding to a plurality of word-pairs; encoding, using a sparse autoencoder sparsified with topic information, at least word co-occurrence of each input word in each document to a word embedding representation by a feedforward propagation; decoding, using the sparse autoencoder, the embedding representation of the input word back to a reconstructed representation;
Li also does not explicitly teach, training the sparse autoencoder to update the word embeddings by minimizing a word loss function that comprises a topic guidance term.
Ruan however, when addressing context aware phrase embedding using autoencoder neural networks teaches, training the sparse autoencoder to update the word embeddings by minimizing a word loss function that comprises a topic guidance term involving topic distribution of words to encapsulate topic information such that the updated word embeddings reflect topic distribution of words. ( pg 144 ¶02 “R(θ)  is the regularization term involving the following parameter sets:1 (1)   θLw : the word embedding matrices;” pg 144 ¶03 “We apply a similar co-training style algorithm as [19] to train the model parameters.” the parameters of the model are updated, the set of parameters includes the word embedding matrix) ( pg 138 ¶03 “we name our model as Topically-informed Bilingually-constrained Recursive Auto-encoders (TBRAE).” The model is an auto encoder later referred to as RAE) ( pg 144 ¶02 “Besides, we impose the word-topic semantic constraint, mentioned in Sect. 3.4, on words in two languages. Thus, the final objective over the training set D becomes… 
    [image: media_image4.png]
” in section 3.4 the art describes the function Ewt() as a topic guidance term: pg 143 ¶02 “Thus, the semantic correlation between words w and   w′  in the topic space can actually be determined by their conditional distributions   p^(z|w)  and   p^(z|w′)” for a given topic conditional distributions, where z is a sampled topic which sparsifies the word embedding vectors. This function or term is used in the final objective function to inform parameter updates during training. Because the objective function optimizes the word embeddings and the conditional distributions, the word embeddings reflect the topic information.)
	It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to combine the joint topic-word embedding system of Li with the topic or context aware word embedding system of Ruan.
	One of ordinary skill in the arts would have been motivated to combine these references
because both references disclose encoder models for topic sensitive word embedding. Ruan improves the embedding model of Li by utilizing a bipartite autoencoder “which results in better phrase embedding” that incorporates context information as document-level distributions in order to “augment the presentation capability of phrase representation.” (Ruan Introduction pg 138 ¶02-¶03)
However, Li/Ruan does not explicitly teach, for each document: constructing a word co-occurrence matrix comprising a plurality of word co-occurrence probabilities respectively corresponding to a plurality of word-pairs; encoding, using a sparse autoencoder sparsified with topic information, at least word co-occurrence of each input word in each document to a word embedding representation by a feedforward propagation; decoding, using the sparse autoencoder, the embedding representation of the input word back to a reconstructed representation; 
 However, Amiri, when addressing issues related to autoencoders trained with both topic and words, teaches, encoding, using a sparse autoencoder sparsified with topic information, (Section 2.3 “where each column in C is a sparse representation of an input over all topics and will be used as global context information in our model…” The matrix C represents the sparsified topic information associated with the words in a document or word sequence, which is used in part in the context sensitive model or autoencoder) decoding, using the sparse autoencoder, the embedding representation of the input word back to a reconstructed representation; ( Figure 2 pg 3 
    [image: media_image8.png]
 and pg 2 section 2.2 ¶4 “The loss function must then compute the loss between the input pair (x, hc) and its reconstruction (xˆ, hˆc)” As shown in the figure, the encoder creates an embedding or hidden representation that is reconstructed by the decoder as x̂)
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate a joint autoencoder that takes as input global context or topic information and local input words as taught by Amiri to the disclosed invention of Li/Ruan.
One of ordinary skill in the arts would have been motivated to make this modification in order to implement a model that “encodes input text into context-sensitive representations and uses them to compute similarity between text pairs.” The model “outperforms the state-of-the-art models in two semantic retrieval tasks and a contextual word similarity task” (Amiri Conclusion)
Li/Ruan/Amiri do not explicitly teach, for each document: constructing a word co-occurrence matrix comprising a plurality of word co-occurrence probabilities respectively corresponding to a plurality of word-pairs; at least word co-occurrence of each input word in each document to a word embedding representation by a feedforward propagation; 
However, Chen, when addressing issues related to autoencoders for learning efficient representations of text documents, teaches, for each document: constructing a word co-occurrence matrix comprising a plurality of word co-occurrence probabilities respectively corresponding to a plurality of word-pairs; at least word co-occurrence of each input word in each document to a word embedding representation by a feedforward propagation; (Section 3.1 ¶02 and Algorithm 1 “we represent each input text document [for each document] as a log-normalized word count vector [co-occurrence matrix] x ∈ R^d where each dimension is represented as… 
    [image: media_image5.png]
” the words in a particular document can be formulated as a word co-occurrence or count matrix/vector as described. In particular xi, the normalized probability, is created by counting the number of times the word occurs in a document, corresponding to each context word around the focus word, and dividing this value by the total frequency in the vocabulary. Then in algorithm 1 the resulting matrix, x, is used to generate the hidden state of the autoencoder corresponding to the word embedding matrix. See Algorithm 1 step 2 “Feedforward step: z = tanh(W x + b).” The autoencoder is sparse because it is designed for text processing which can be sparse.)
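	It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate an autoencoder that uses a co-occurrence matrix to generate a hidden state matrix for an encoder model as taught by Chen to the disclosed invention of Li/Ruan/Amiri.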
One of ordinary skill in the arts would have been motivated to make this modification in order to implement a model well suited for learning across “tasks such as document classification, multi-label classification, regression and document retrieval, KATE clearly outperforms competing methods or obtains close to the best results. It is very encouraging to note that KATE is also able to learn semantically meaningful representations of words, documents and topics” (Chen Conclusion)
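
For illustration only, a log-normalized word count vector of the kind quoted from Chen can be sketched as follows. The equation itself appears only as an image in this action, so the form used here, x_i = log(1 + n_i) / max over the vocabulary of log(1 + n_j), is an assumption about what that image shows; the function name and the handling of an empty document are also hypothetical.

```python
import math
from collections import Counter


def log_normalized_counts(doc_tokens, vocab):
    """Return one value per vocabulary word, scaled into [0, 1].

    Assumed form: x_i = log(1 + n_i) / max_j log(1 + n_j),
    where n_i is the count of vocabulary word i in the document.
    """
    counts = Counter(doc_tokens)
    logs = [math.log(1 + counts[word]) for word in vocab]
    peak = max(logs)
    if peak == 0.0:
        # No vocabulary word occurs in the document.
        return [0.0] * len(vocab)
    return [value / peak for value in logs]


x = log_normalized_counts(["a", "a", "a", "b"], vocab=["a", "b", "c"])
```

The resulting vector is what a feedforward step such as z = tanh(W x + b) would take as its input x.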

Regarding Claim 9
Li/Ruan/Amiri/Chen teaches claim 8
	Further Li teaches, using the updated word embedding matrix and the one or more updated topic representations for document classification. ( Conclusion “Our method has potential applications in various scenarios, such as document retrieval, classification, clustering and summarization” pg 8 ¶02 “TopicVec used the word embeddings trained using PSDVec on a March 2015 Wikipedia snapshot. It contains the most frequent 180,000 words. The dimensionality of word embeddings and topic embeddings was 500” TopicVec is trained to perform document classification. As mapped previously in the rejection of claim 1 training involves updating the parameters includes Z, the topic representations, and V, the embedding matrix. Therefore, training the model for document classification uses these parameters for document classification.)

Regarding Claim 14
Li/Ruan/Amiri/Chen teaches claim 8
	Further Ruan teaches, wherein the topic information is updated using the updated word embeddings. ( pg 144 ¶02 “Besides, we impose the word-topic semantic constraint, mentioned in Sect. 3.4, on words in two languages. Thus, the final objective over the training set D becomes… 
    [image: media_image4.png]
” in section 3.4 the art describes the function Ewt() as a topic guidance term: pg 143 ¶02 “Thus, the semantic correlation between words w and   w′  in the topic space can actually be determined by their conditional distributions   p^(z|w)  and   p^(z|w′)” performing the training to minimize the final objective updates the word embedding parameters as described in the rejection of claim 8. The topic information reflects the updated parameter settings due to the training step.)

Regarding Claim 15
	Li/Ruan/Amiri/Chen teach claim 8
	Further Li teaches, wherein the topic information is drawn from a topic matrix based on a mixing topic proportion, the mixing topic proportion is generated from the Dirichlet prior. (Section 3 ¶04 “Each topic tik has a document-specific prior probability to be assigned to a word, denoted as φik = P(k|di). The vector φi = (φi1, · · · , φiK) is referred to as the mixing proportions of these topics in document di .” Section 5 “For each document di: …(a) Draw the mixing proportions φi from the Dirichlet prior Dir(α);”  the topic t_ik is selected from the topic matrix, T, depicted in Figure 1. The probability of being drawn is based on the mixing proportion which is drawn from the Dirichlet prior for each document.)
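
For illustration only, the two generative steps quoted from Li (drawing the mixing proportions φi from Dir(α), then drawing a topic assignment zij from Cat(φi)) can be sketched as follows. This is a minimal sketch, not Li's implementation: the function names, the symmetric prior value, and the fixed seed are assumptions made here, and the Dirichlet draw uses the standard construction of normalizing independent Gamma samples.

```python
import random


def draw_mixing_proportions(alpha, rng):
    """Sample phi ~ Dir(alpha) by normalizing Gamma(alpha_k, 1) draws."""
    gammas = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(gammas)
    return [g / total for g in gammas]


def draw_topic_assignments(phi, n_words, rng):
    """Sample one topic index per word position from Cat(phi)."""
    return rng.choices(range(len(phi)), weights=phi, k=n_words)


rng = random.Random(0)  # fixed seed only so the sketch is repeatable
phi = draw_mixing_proportions([0.5, 0.5, 0.5], rng)
z = draw_topic_assignments(phi, n_words=10, rng=rng)
```

Each index in `z` would select a column of the topic matrix T for the corresponding word, so topics with larger mixing proportions are selected more often.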

Regarding Claim 16
	Li teaches, A computer-implemented method for mutual learning with topic discovery and word embedding using one or more processors to cause steps to be performed comprising: (Figure 1 and pg 8 Experimental settings ¶01 “The learned topic embeddings were combined to form the whole topic set, where redundant null topics in different categories were removed, leaving us with 281 topics for 20News and 111 topics for Reuters… using the scikit-learn library” Examiner notes that learning of topic embeddings is described, which utilizes word embeddings as shown in Figure 1. Scikit-learn is a computer tool set for machine learning.) receiving input comprising a Dirichlet prior, and a document set having at least one document; (pg 4 Section 5 step 2 “generative process is as follows…
    [image: media_image1.png]
” step 2 entails drawing a proportion from the Dirichlet, for each document. In order to do this, the set of all documents D and an associated Dirichlet prior must be received.) initializing at least a topic matrix, a topic embedding matrix, a residual matrix (Section 3 “Each document has K candidate topics, arranged
in the matrix form [topic matrix]” Section 5 step 1 “. For the k-th topic, draw a topic embedding uniformly from a hyperball of radius γ… . tk ∼ Unif(Bγ)” the state space of B corresponds to the embedding matrix. Section 5.1 “Given the embeddings V , the bigram residuals A [residual matrix] , the topics T” ) generating a mixing topic proportion (Section 3 ¶04 “Each topic tik has a document-specific prior probability to be assigned to a word, denoted as φik = P(k|di). The vector φi = (φi1, · · · , φiK) is referred to as the mixing proportions of these topics in document di .” Section 5 “For each document di: …(a) Draw the mixing proportions φi from the Dirichlet prior Dir(α);”  the topic t_ik is selected from the topic matrix, T, depicted in Figure 1. The probability of being drawn representing relative proportions of the topics is based on the mixing proportion which is drawn from the Dirichlet prior for each document and the selected topic embedding as described in step 1 of section 5.) with a word embedding matrix fixed, updating topics in the topic matrix based on at least the mixing topic proportion; (Section 5 step 2b “Draw topic assignment zij from the categorical distribution Cat(φi)” for a given word a topic zij is drawn, the set of topics selected for all the words in the document corresponds to a vector Z, or a topic matrix, these values are selected based on the mixing topic proportion and Li does not teach updating the word embedding matrix according to the topic assignment.)
	Li does not explicitly teach, a weight matrix for a sparsified autoencoder; encoding, using the sparsified autoencoder sparsified with the updated topics, word co-occurrences in the word co-occurrence matrix to corresponding word embeddings by a feedforward propagation; calculating an overall objective function combined from a topic loss function and a word loss function, the word loss function comprises a topic guidance term that uses topic distributions of words in the document set to update word embeddings in the word embedding matrix; updating the weight matrix for the sparsified autoencoder with backpropagation; and updating the word embedding matrix using the sparsified autoencoder with the updated the weight matrix.  

Ruan however, when addressing context aware phrase embedding using autoencoder neural networks teaches, a weight matrix for a sparsified autoencoder ( pg 144 ¶02 “R(θ)  is the regularization term involving the following parameter sets:1 (1)   θLw : the word embedding matrices;” pg 144 ¶03 “We apply a similar co-training style algorithm as [19] to train the model parameters.” the parameters of the model are updated, the set of parameters includes the word embedding matrix) using an autoencoder ( pg 138 ¶03 “we name our model as Topically-informed Bilingually-constrained Recursive Auto-encoders (TBRAE).” The model is an auto encoder later referred to as RAE) sparsified with the updated topics, calculating an overall objective function combined from a topic loss function and a word loss function, the word loss function comprises a topic guidance term that uses topic distributions of words in the document set to update word embeddings in the word embedding matrix; and updating the word embedding matrix using the sparsified autoencoder with the updated the weight matrix. ( pg 144 ¶02 “Besides, we impose the word-topic semantic constraint, mentioned in Sect. 3.4, on words in two languages. Thus, the final objective over the training set D becomes… 
    [image: media_image4.png]
” in section 3.4 the art describes the function Ewt() as a topic guidance term: pg 143 ¶02 “Thus, the semantic correlation between words w and   w′  in the topic space can actually be determined by their conditional distributions   p^(z|w)  and   p^(z|w′)” for a given topic conditional distributions, where z is a sampled topic which sparsifies the word embedding vectors. This function or term is used in the final objective function to inform parameter updates during training.)
	It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to combine the joint topic-word embedding system of Li with the topic or context aware word embedding system of Ruan.
	One of ordinary skill in the arts would have been motivated to combine these references
because both references disclose encoder models for topic sensitive word embedding. Ruan improves the embedding model of Li by utilizing a bipartite autoencoder “which results in better phrase embedding” that incorporates context information as document-level distributions in order to “augment the presentation capability of phrase representation.” (Ruan Introduction pg 138 ¶02-¶03)
Li/Ruan does not explicitly teach, encoding, using the sparsified autoencoder word co-occurrences in the word co-occurrence matrix to corresponding word embeddings by a feedforward propagation; updating the weight matrix for the sparsified autoencoder with backpropagation
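	However, Amiri, when addressing issues related to autoencoders trained with both topic and words, teaches, encoding, using the sparsified autoencoder (As discussed previously, the topic matrix D is updated using the sparse coding algorithm, which does not incorporate the word embedding matrix, h. The updated topic matrix is used in part to generate the matrix C used in the autoencoder disclosed.) 
It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate a joint autoencoder that takes as input global context or topic information and local input words as taught by Amiri to the disclosed invention of Li/Ruan.
One of ordinary skill in the arts would have been motivated to make this modification in order to implement a model that “encodes input text into context-sensitive representations and uses them to compute similarity between text pairs.” The model “outperforms the state-of-the-art models in two semantic retrieval tasks and a contextual word similarity task” (Amiri Conclusion)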
Li/Ruan/Amiri does not explicitly teach, encoding…word co-occurrences in the word co-occurrence matrix to corresponding word embeddings by a feedforward propagation; updating the weight matrix for the sparsified autoencoder with backpropagation
However, Chen when addressing issues related to autoencoders for learning efficient representation of text documents teaches, encoding…word co-occurrences in the word co-occurrence matrix to corresponding word embeddings by a feedforward propagation; updating the weight matrix for the sparsified autoencoder with backpropagation (Section 3.1 ¶02 and Algorithm 1 “we represent each input text document [for each document] as a log-normalized word count vector [co-occurrence matrix] x ∈ R d where each dimension is represented as… 
    [image: media_image5.png]
” the words in a particular document can be formulated as a word co-occurrence matrix/vector as described. Then in Algorithm 1 the resulting matrix, x, is used to generate the hidden state of the autoencoder corresponding to the word embedding matrix via an encoding step or forward propagation. See Algorithm 1 step 2 “Feedforward step: z = tanh(W x + b).” During backpropagation (step 5 of Algorithm 1) the weight matrix is updated.)
	It would have been obvious for one of ordinary skill in the arts before the effective filing date of the claimed invention to incorporate an autoencoder that uses a co-occurrence matrix to generate a hidden state matrix for an encoder model as taught by Chen to the disclosed invention of Li/Ruan/Amiri.
One of ordinary skill in the arts would have been motivated to make this modification in order to implement a model well suited for learning across “tasks such as document classification, multi-label classification, regression and document retrieval, KATE clearly outperforms competing methods or obtains close to the best results. It is very encouraging to note that KATE is also able to learn semantically meaningful representations of words, documents and topics” (Chen Conclusion)

Regarding Claim 17
	Li/Ruan/Amiri/Chen teach claim 16
	Further Li teaches, wherein the topic loss function is a likelihood function of the document set, the topic matrix, the residual matrix, the word embedding matrix, and the topic embedding matrix. ( pg 4 Section 5.1 ¶02 “Then the complete-data likelihood of the whole corpus is:… 
    [image: media_image3.png]
” pg 5 Section 6.1 “the learning objective is to find the embeddings V , the topics T , and the word-topic and document-topic distributions p(Zi, φi |di , A,V , T ).” The learning objective is to update the parameters of the likelihood function, corresponding to updating and outputting the topic representations and word embedding matrix. The likelihood function includes: the document set D, the topic matrix Z, the residual matrix A, the word embedding matrix V, and the topic embedding matrix T.)

Regarding Claim 19
	Li/Ruan/Amiri/Chen teach claim 16
Further Amiri teaches, decoding, using the sparsified autoencoder, the embedding representation of the input word back to a reconstructed representation.   ( Figure 2 pg 3 
    [image: media_image8.png]
 and pg 2 section 2.2 ¶4 “The loss function must then compute the loss between the input pair (x, hc) and its reconstruction (xˆ, hˆc)” As shown in the figure, the encoder creates an embedding or hidden representation that is reconstructed by the decoder as x̂. The decoder is part of the sparsified autoencoder)


Regarding Claim 20
	Li/Ruan/Amiri/Chen teach claim 16
	Further Li teaches, further comprising: using the updated topics and the updated word embedding matrix for document classification. ( Conclusion “Our method has potential applications in various scenarios, such as document retrieval, classification, clustering and summarization” pg 8 ¶02 “TopicVec used the word embeddings trained using PSDVec on a March 2015 Wikipedia snapshot. It contains the most frequent 180,000 words. The dimensionality of word embeddings and topic embeddings was 500” TopicVec is trained to perform document classification. As mapped previously in the rejection of claim 1 training involves updating the parameters includes Z, the topic representations, and V, the embedding matrix. Therefore, training the model for document classification uses these parameters for document classification.)

Claims 10 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Li/Ruan/Amiri/Chen, and further in view of Chao et al. “Locally weighted embedding topic modeling by markov random walk structure approximation and sparse regularization” hereinafter Chao.

Regarding Claim 10
Li/Ruan/Amiri/Chen teaches claim 9
	Further Li teaches, topic guidance term comprises a Kullback-Leibler (KL) divergence (pg 8 “Ewt(∗;θ)  denotes the error functions of the word-topic semantic constraints for two languages” as stated previously the topic guidance term corresponds to the term in the final objective function depicted as Ewt. Ewt is the KL divergence between two terms. Pg 7 Section 3.4 ¶03 “Then, we choose the Kullback-Leibler divergence   Ewt(w;θ)  to encourage   p(∗|w)  to be close to   p^(∗|w) : 
    [image: media_image6.png]
”)  
Li/Ruan/Amiri/Chen do not explicitly teach that the KL divergence is between a topic sparsity parameter for a topic and an average activation of the embeddings for the topic.
However, Chao, when addressing topic embedding with an autoencoder-based topic modeling framework, teaches a KL divergence between a topic sparsity parameter for a topic and an average activation of the embeddings for the topic. (pg 4-5 Section 3.2 ¶04-06 and eq 21 “
[Equation image: media_image7.png]
Therefore, we consider a regularized AE with a fixed sparsity target ρ and enforce the average of each embedding topic to approximate ρ by minimizing the KL divergence. Formally, let  denote the average output of hidden unit j over m” The first term ρ is the sparsity parameter for the topic modeling autoencoder, and the second term ρ̂ is the average activation of a topic embedding, because it is the activation of a hidden unit of the topic autoencoder.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the topic modeling autoencoder of Li/Ruan/Amiri/Chen with the topic modeling sparse autoencoder of Chao.
	One of ordinary skill in the art would have been motivated to combine these references because both references disclose sparse autoencoders for topic embedding. Chao improves the embedding model of Li/Ruan/Amiri/Chen by providing “an easier interpretation about the meaning of embedding topics, and such sparse representation may help improve the discriminative performance” (Chao pg 5 ¶01).
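
For illustration only, the kind of sparsity term discussed above (a KL divergence between a fixed sparsity target ρ and the average activation ρ̂ of each hidden unit) can be sketched as follows. Equation 21 of Chao is not reproduced in this action, so the sketch assumes the standard Bernoulli form of the KL penalty commonly used in sparse autoencoders; the function name, the default target of 0.05, and the clipping constant are hypothetical and are not taken from Chao or from the claims.

```python
import math

def kl_sparsity_penalty(activations, rho=0.05, eps=1e-12):
    """Sum over hidden units j of KL(rho || rho_hat_j).

    activations: list of rows (one per document), each row a list of
    hidden-unit activations in (0, 1). Assumed layout, for illustration.
    rho: fixed sparsity target (hypothetical default).
    """
    m = len(activations)        # number of documents
    k = len(activations[0])     # number of hidden units (topics)
    penalty = 0.0
    for j in range(k):
        # rho_hat_j: average output of hidden unit j over the m documents
        rho_hat = sum(row[j] for row in activations) / m
        # clip to avoid log(0); an implementation convenience, not from the source
        rho_hat = min(max(rho_hat, eps), 1.0 - eps)
        penalty += (rho * math.log(rho / rho_hat)
                    + (1.0 - rho) * math.log((1.0 - rho) / (1.0 - rho_hat)))
    return penalty
```

Under this assumed form the penalty is zero when every unit's average activation equals the target and grows as the averages move away from it, which is the behavior the quoted passage describes as enforcing "the average of each embedding topic to approximate ρ".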

Regarding Claim 18
Li/Ruan/Amiri/Chen teaches claim 16
	Further Li teaches, topic guidance term comprises a Kullback-Leibler (KL) divergence (pg 8 “Ewt(∗;θ)  denotes the error functions of the word-topic semantic constraints for two languages” as stated previously the topic guidance term corresponds to the term in the final objective function depicted as Ewt. Ewt is the KL divergence between two terms. Pg 7 Section 3.4 ¶03 “Then, we choose the Kullback-Leibler divergence   Ewt(w;θ)  to encourage   p(∗|w)  to be close to   p^(∗|w) : 
[Equation image: media_image6.png]
”)  
Li/Ruan/Amiri/Chen do not explicitly teach that the KL divergence is between a topic sparsity parameter for a topic and an average activation of the embeddings for the topic.
However, Chao, when addressing topic embedding with an autoencoder-based topic modeling framework, teaches a KL divergence between a topic sparsity parameter for a topic and an average activation of the embeddings for the topic. (pg 4-5 Section 3.2 ¶04-06 and eq 21 “
[Equation image: media_image7.png]
Therefore, we consider a regularized AE with a fixed sparsity target ρ and enforce the average of each embedding topic to approximate ρ by minimizing the KL divergence. Formally, let  denote the average output of hidden unit j over m” The first term ρ is the sparsity parameter for the topic modeling autoencoder, and the second term ρ̂ is the average activation of a topic embedding, because it is the activation of a hidden unit of the topic autoencoder.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the topic modeling autoencoder of Li/Ruan/Amiri/Chen with the topic modeling sparse autoencoder of Chao.
	One of ordinary skill in the art would have been motivated to combine these references because both references disclose sparse autoencoders for topic embedding. Chao improves the embedding model of Li/Ruan/Amiri/Chen by providing “an easier interpretation about the meaning of embedding topics, and such sparse representation may help improve the discriminative performance” (Chao pg 5 ¶01).

Claims 11-13 are rejected under 35 U.S.C. 103 as being unpatentable over Li/Ruan/Amiri/Chen, and further in view of Xun et al. “Collaboratively Improving Topic Discovery and Word Embeddings by Coordinating Global and Local Contexts” hereinafter Xun.

Regarding Claim 11
	Li/Ruan/Amiri/Chen teach Claim 8
	Li/Ruan/Amiri/Chen do not explicitly teach, the word co-occurrence matrix is extracted from a sequence of words in each document of the document set within a text window.
	Xun, however, when addressing methods for embedding the local context of a document into a matrix to be processed by a topic discovery model, teaches, the word co-occurrence matrix is extracted from a sequence of words in each document of the document set within a text window. (pg 3 ¶02 “Given a text corpus, its document-level global context information is encoded in the document-word matrix D and its local context information is encoded in the word co-occurrence matrix W. The word co-occurrence matrix W is constructed from small fixed-sized text intervals [window representing sequence of words] in the documents [each document]”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a method for constructing and incorporating a co-occurrence matrix, as taught by Xun, into the disclosed invention of Li/Ruan/Amiri/Chen.
One of ordinary skill in the art would have been motivated to make this modification in order to implement a model whose reliance on “both the global and the local context enables it to make use of more sufficient information” and to jointly train the embedded space of both topics and words, where “experiments on the real-world datasets validate the effectiveness” (Xun, Conclusion).

Regarding Claim 12
	Li/Ruan/Amiri/Chen/Xun teach Claim 11
	Further Xun teaches, wherein the text window is fixed and remains the same across documents. (pg 3 ¶02 “The word co-occurrence matrix W is constructed from small fixed-sized text intervals [window representing sequence of words] in the documents [each document]”. The text interval being fixed in the documents means that it “remains” fixed across the documents.)

Regarding Claim 13
	Li/Ruan/Amiri/Chen/Xun teach Claim 11
	Further Xun teaches, each word sequence has a focus word and its neighboring context words within a text window centered at the focus word. (pg 3 ¶02 “Each text interval is composed of a focus word and its neighboring context words falling in a fixed-sized window centered at the focus word”)
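
For illustration only, the windowed construction quoted from Xun (a focus word plus its neighboring context words within a fixed-sized window centered on the focus word, applied to every document) can be sketched as below. The action does not state Xun's window size or any weighting scheme, so the half-window parameter, the raw pair counting, and the function name are assumptions made for this sketch.

```python
from collections import Counter

def cooccurrence_counts(documents, half_window=2):
    """Count (focus, context) pairs within a fixed window centered on each focus word.

    documents: list of token lists, one list per document.
    half_window: number of context words taken on each side of the focus
    word (hypothetical value; the same window is used for every document).
    """
    counts = Counter()
    for tokens in documents:
        for i, focus in enumerate(tokens):
            lo = max(0, i - half_window)
            hi = min(len(tokens), i + half_window + 1)
            for j in range(lo, hi):
                if j != i:  # skip the focus word itself
                    counts[(focus, tokens[j])] += 1
    return counts
```

The resulting counts can be read as the entries of a sparse word co-occurrence matrix indexed by (focus word, context word); because the window size is a single parameter, it stays the same across documents.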


Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHNATHAN R GERMICK whose telephone number is (571)272-8363. The examiner can normally be reached M-F 7:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.




/J.R.G./Examiner, Art Unit 2122
/NICHOLAS KLICOS/Primary Examiner, Art Unit 2145