DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This action is in response to the submission filed 16 December 2021 for application 16/458,230. Claims 1-4 and 6 have been amended. Claim 5 has been canceled. Currently claims 1-4, 6, and 7 are pending and have been examined.
The objection to claim 1 has been withdrawn in view of the amendments made.
The rejection of Claims 1-7 under 35 U.S.C. 112(b) has been withdrawn in view of the amendments made.
The rejection of Claim 6 under 35 U.S.C. 101 (because the claimed invention did not fall within one of the four statutory categories) has been withdrawn in view of the amendments made.

Response to Arguments
Applicant’s arguments, see pages 8-17, with respect to the 35 USC §101 rejection of claims 1-7 have been fully considered but are not persuasive.
Specifically, applicant argues, see page 13, that none of the limitations recite abstract ideas but simply involve them, so the result of Step 2A, Prong 1 is No and the claims are patent eligible. Examiner respectfully disagrees. The analysis under prong 1 is not whether the claims involve the use of an abstract idea; it is whether the claim recites an abstract idea. The office action identified the limitations that are directed to an abstract idea (as shown below), particularly mathematical concepts. The limitations of a topic Knowledge Base, KB, of latent topic features Zk ∈ R^(H×K), where k indicates a number of a source Sk, k > 1, of the latent topic feature, H indicates a dimension of the latent topic and K indicates a vocabulary size; transferring knowledge to the target T by GVT via learning meaningful latent topic features guided by relevant latent topic features Zk of the topic KB, comprising the sub-step: - extending a loss function L(v) of the probabilistic or neural autoregressive topic model for the document v of the target T, wherein the loss function L(v) is a negative log-likelihood of joint probabilities p(vi | v<i) of each word vi in the autoregressive NN, wherein probabilities p(vi | v<i) for each word vi are based on preceding words v<i, with a regularisation term comprising weighted relevant latent topic features Zk to form an extended loss function Lreg(v); and - minimising the extended loss function Lreg(v) to determine a minimal overall loss, determining, with the probabilistic or neural autoregressive topic model of the target T, a topic of the document v based on the determined minimal overall loss of the extended loss function, under the broadest reasonable interpretation, recite mathematical relationships and calculations of extending a loss function of the probabilistic or neural autoregressive topic model, wherein the loss function L(v) is a negative log-likelihood of joint probabilities. So, the claim recites judicial exceptions and it falls within the “Mathematical concepts” grouping of abstract ideas.
Further, applicant argues on pages 14-16 that the claims are still eligible under Step 2A Prong 2, because the claims include elements that integrate into a practical application. Examiner respectfully disagrees. In Step 2A, prong 2 of the analysis, the limitation, preparing a pre-trained Knowledge Base, KB, is considered to be an additional element and as recited represents insignificant extra-solution activity (data gathering) because it is a mere nominal or tangential addition to the claim. See MPEP 2106.05(g), discussing limitations that the Federal Circuit has considered to be insignificant extra-solution activity, for instance the step of printing a menu that was generated through an abstract process in Apple, Inc. v. Ameranth, Inc., 842 F.3d 1229, 1241-42 (Fed. Cir. 2016) and the mere generic presentation of collected and analyzed data in Electric Power Group, LLC v. Alstom S.A., 830 F.3d 1350, 1354 (Fed. Cir. 2016). In the same step, the limitation of, by [the] at least one central processing unit of a data processing system, is considered to be another additional element and it does not integrate the abstract idea into a practical application because the additional element is recited so generically (no details whatsoever are provided other than that it is a method using at least one central processing unit of a data processing system) that it represents no more than mere instructions to apply the judicial exception on a computer. As discussed in MPEP 2106.05(f), mere instructions to implement an abstract idea on a computer as a tool to perform an abstract idea is not indicative of integration into a practical application.
Specifically, Applicant argues, see page 14, that the limitation of “determining, by the at least one central processing unit of the data processing system with the probabilistic or neural autoregressive topic model of the target T, a topic of the document v based on the determined minimal overall loss of the extended loss function” as amended is an additional practical application. Examiner respectfully disagrees because “determining, with the probabilistic or neural autoregressive topic model of the target T, a topic of the document v based on the determined minimal overall loss of the extended loss function” is an abstract idea as shown above in Step 2A, prong 1 and “by the at least one central processing unit of the data processing system” is considered to be another additional element and it does not integrate the abstract idea into a practical application as shown above under Step 2A, prong 2.
Additionally, Applicant argues, see page 15, that the claimed methods recite improvements not only to the broad field of machine learning, but also the more specific technical field of autoregressive 
Furthermore, applicant argues on pages 16 and 17 that the claims contain something significantly more than the abstract idea as is required by part 2B, and that it has not been shown that the specific steps are well known in the art, and thus the output of Step 2B is Yes, and the claims are patent eligible. Examiner respectfully disagrees. Page 8 of the Non-Final rejection has provided Berkheimer analysis for claim 1. To elaborate the last step of the 101 analysis, in Step 2B, the recitation of the “preparing a pre-trained knowledge base…” limitation is recited at a high level of generality, and, as disclosed in Gupta et al [Page 6, Section: Experimental Setup], is also well-understood, routine and conventional. This limitation amounts to extra-solution activity of receiving data, i.e., pre-solution activity of gathering data for use in the claimed process. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory"). These limitations therefore remain insignificant extra-solution activity even upon reconsideration, and do not amount to significantly more. The additional element, by [the] at least one central processing unit of a data processing system, does not amount to significantly more than the judicial exceptions. As explained with respect to Step 2A Prong Two, the method using at least one central processing unit of a data processing system, is at best the equivalent of merely adding the words “apply it” to the judicial exception. See MPEP 2106.05(f). Mere instructions to apply an exception cannot provide an inventive concept and do not amount to significantly more than the judicial exception.
Even when considered in combination, the additional element represents judicial exceptions and an insignificant extra-solution activity, which cannot provide an inventive concept.
Applicant further argues on page 17 that the burden of showing the Berkheimer analysis was not met in the previous rejection. Examiner respectfully disagrees. Page 8 of the Non-Final rejection has provided Berkheimer analysis for claim 1. Please note that Berkheimer analysis applies only to limitations that have been identified as additional elements that are insignificant extra solution activities that are well-understood, routine, and conventional that are specified at a high level of generality. See MPEP 2106.05(d). Specifically, in step 2B of the 101 analysis, Berkheimer analysis has been provided for the limitations that were identified as additional elements and insignificant extra solution activities as explained above and also shown below in the detailed analysis.
Applicant argues, on page 18, that claims 1-7 are not obvious and unpatentable over the cited combination of references. Examiner respectfully disagrees.
Specifically, on page 19, applicant argues that nothing in Chen would suggest using multiple sources. Examiner respectfully disagrees. Chen teaches multiple sources at least on Page 2. Column 2, Paragraph 2 of Page 2 states that, in summary, this paper makes the following contributions: 1. It proposes a novel approach to exploit text collections from many domains to learn prior knowledge to guide model inference in order to generate more coherent topics. Under the broadest reasonable interpretation, examiner is interpreting this as multiple sources, noting that text collections from many domains correspond to more than one source.
Further, applicant argues on pages 19 and 20 that Gupta fails to teach extending a loss function with a regularization term. Examiner respectfully disagrees. Gupta teaches extending a loss function with a regularization term on Page 4. Section 2.1 on Page 4 states, DocNADE models the joint distribution p(v) of all words vi by decomposing it as p(v) = ∏_{i=1}^{D} p(vi|v<i), where each autoregressive conditional p(vi|v<i) for the word observation vi is computed using the preceding observations v<i ∈ {v1, ..., vi-1} in a feed-forward neural network for i ∈ {1, ..., D}. Equation (1). where, g( ) is an activation function, U ∈ R^(K×H) is a weight matrix connecting hidden to output, e ∈ R^H and b ∈ R^K are bias vectors, W ∈ R^(H×K) is a word representation matrix in which a column W:,vi is a vector representation of the word vi in the vocabulary, and H is the number of hidden units (topics). The log-likelihood of any document v of any arbitrary length is given by [equation image: media_image1.png], which under the broadest reasonable interpretation, examiner is interpreting as extending a loss function with a regularization term, noting that the log-likelihood corresponds to the loss function, Ldn(v) with the summation equation corresponds to the extended loss function, and the regularization term corresponds to the right side of equation (1), where U corresponds to the topic features weighted by h. Hence Gupta teaches extending a loss function with a regularization term.
Applicant argues on page 20, “a NADE approach from another source which does not relate to the methods taught by Gupta. The methods of Gupta itself do not relate to a negative log-likelihood of joint probabilities”. Examiner respectfully disagrees because the DocNADE methods of Gupta are built upon NADE of Larochelle, so they definitely relate. Further, minimizing the negative log-likelihood is equivalent to maximizing the log-likelihood. 
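The equivalence noted above can be illustrated with a minimal sketch. It assumes a document is represented only by its autoregressive conditionals p(vi|v<i); the function names, the candidate settings, and the numeric values are hypothetical and are not taken from Gupta or Larochelle.

```python
import math


def log_likelihood(conditionals):
    """Sum of log p(vi | v<i) over the words of one document."""
    return sum(math.log(p) for p in conditionals)


def negative_log_likelihood(conditionals):
    """Negative of the document log-likelihood."""
    return -log_likelihood(conditionals)


# Hypothetical conditionals produced by two candidate parameter settings
# for the same three-word document (illustrative values only).
candidates = {
    "setting_a": [0.5, 0.25, 0.5],
    "setting_b": [0.5, 0.5, 0.5],
}

# Selecting by maximum log-likelihood ...
best_by_max_ll = max(candidates, key=lambda k: log_likelihood(candidates[k]))
# ... picks the same setting as selecting by minimum negative log-likelihood.
best_by_min_nll = min(
    candidates, key=lambda k: negative_log_likelihood(candidates[k])
)
```

Because the two quantities differ only in sign, any setting that maximizes one necessarily minimizes the other.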
Still further, applicant argues on page 20 that Larochelle fails to teach minimizing a loss function across multiple sources to determine a minimal overall loss. Examiner respectfully disagrees because minimizing the loss function of Gupta/Larochelle, when the documents are from multiple sources (as taught by Chen), teaches this limitation.
Lastly, applicant argues that dependent claims are allowable. Examiner respectfully disagrees because the claims 2-4 and 6-7 depend from the independent claim 1 and the combination of cited references teach every element of the amended claims as shown below.

	

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 

Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
Claim 7, which uses the word “means”, is being interpreted under 35 U.S.C. 112(f); supporting structure was found in the specification at least on Page 13 (lines 30-35).

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 6 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim 6 recites the limitation “the computer program” in line 2. There is insufficient antecedent basis for this limitation in the claim.
Claim 6 also recites that it is dependent on canceled claim 5.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1 - 7 are rejected under 35 U.S.C. 101 because the claimed invention is directed towards abstract ideas without significantly more. 
Regarding claim 1, according to the first step (Step 1) of the 101 analysis, claim 1 is directed to a method (process) and falls within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter). 
In the next step (Step 2A, prong 1) of the analysis, the limitations of a computer-implemented method of Neural Topic Modelling, NTM, in an autoregressive Neural Network, NN, using Global-View Transfer, GVT, for a probabilistic or neural autoregressive topic model of a target T given a document v of words vi, i = 1.. D, comprising the steps: - topic Knowledge Base, KB, of latent topic features Zk ∈ R^(H×K), where k indicates a number of a source Sk, k > 1, of the latent topic feature, H indicates a dimension of the latent topic and K indicates a vocabulary size; transferring knowledge to the target T by GVT via learning meaningful latent topic features guided by relevant latent topic features Zk of the topic KB, comprising the sub-step: - extending a loss function L(v) of the probabilistic or neural autoregressive topic model for the document v of the target T, wherein the loss function L(v) is a negative log-likelihood of joint probabilities p(vi | v<i) of each word vi in the autoregressive NN, wherein probabilities p(vi | v<i) for each word vi are based on preceding words v<i, with a regularisation term comprising weighted relevant latent topic features Zk to form an extended loss function Lreg(v); and - minimising the extended loss function Lreg(v) to determine a minimal overall loss, determining, with the probabilistic or neural autoregressive topic model of the target T, a topic of the document v based on the determined minimal overall loss of the extended loss function, under the broadest reasonable interpretation, recite mathematical relationships and calculations. So, the claim recites judicial exceptions and it falls within the “Mathematical concepts” grouping of abstract ideas.
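As an illustrative sketch only, the recited calculation can be written as a negative log-likelihood plus a weighted penalty over the source features Zk. The squared-distance form of the penalty, the per-source weights, the target matrix W, and all function names are assumptions made for this example; the claim language quoted above does not specify them.

```python
import math


def nll(conditionals):
    """Negative log-likelihood: minus the sum of log p(vi | v<i)."""
    return -sum(math.log(p) for p in conditionals)


def squared_distance(a, b):
    """Sum of squared element-wise differences between two H x K matrices."""
    return sum(
        (x - y) ** 2
        for row_a, row_b in zip(a, b)
        for x, y in zip(row_a, row_b)
    )


def extended_loss(conditionals, W, sources, weights):
    """NLL plus a weighted penalty tying W to each source matrix Zk.

    The penalty form is an assumption of this sketch, not the claimed formula.
    """
    penalty = sum(g * squared_distance(W, Z) for g, Z in zip(weights, sources))
    return nll(conditionals) + penalty


# Hypothetical 2 x 2 example: one source equals W, the other is all zeros.
W = [[1.0, 0.0], [0.0, 1.0]]
Z1 = [[1.0, 0.0], [0.0, 1.0]]
Z2 = [[0.0, 0.0], [0.0, 0.0]]
loss = extended_loss([0.5, 0.5], W, [Z1, Z2], [0.1, 0.5])
```

In this sketch, setting every weight to zero recovers the unextended loss, which shows that the regularisation term is a purely additive extension.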
In the next step (Step 2A, prong 2) of the analysis, the limitation, preparing a pre-trained Knowledge Base, KB, is considered to be an additional element and as recited represents insignificant extra-solution activity (data gathering) because it is a mere nominal or tangential addition to the claim. See MPEP 2106.05(g), discussing limitations that the Federal Circuit has considered to be insignificant extra-solution activity, for instance the step of printing a menu that was generated through an abstract process in Apple, Inc. v. Ameranth, Inc., 842 F.3d 1229, 1241-42 (Fed. Cir. 2016) and the mere generic presentation of collected and analyzed data in Electric Power Group, LLC v. Alstom S.A., 830 F.3d 1350, 1354 (Fed. Cir. 2016). In the same step, the limitation of, by [the] at least one central processing unit of a data processing system, is considered to be another additional element and it does not integrate the abstract idea into a practical application because the additional element is recited so generically (no details whatsoever are provided other than that it is a method using at least one central processing unit of a data processing system) that it represents no more than mere instructions to apply the judicial exception on a computer. As discussed in MPEP 2106.05(f), mere instructions to implement an abstract idea on a computer as a tool to perform an abstract idea is not indicative of integration into a practical application.
In the last step (Step 2B) of the analysis, the recitation of the “preparing a pre-trained knowledge base…” limitation is recited at a high level of generality, and, as disclosed in Gupta et al [Page 6, Section: Experimental Setup], is also well-understood, routine and conventional. This limitation amounts to extra-solution activity of receiving data, i.e., pre-solution activity of gathering data for use in the claimed process. The courts have found limitations directed to obtaining information electronically, recited at a high level of generality, to be well-understood, routine, and conventional (see MPEP 2106.05(d)(II), “receiving or transmitting data over a network”, "electronic record keeping," and "storing and retrieving information in memory"). These limitations therefore remain insignificant extra-solution activity even upon reconsideration, and do not amount to significantly more. The additional element, by [the] at least one central processing unit of a data processing system, does not amount to significantly more than the judicial exceptions. As explained with respect to Step 2A Prong Two, the method using at least one central processing unit of a data processing system, is at best the equivalent of merely adding the words “apply it” to the judicial exception. See MPEP 2106.05(f). Mere instructions to apply an exception cannot provide an inventive concept and do not amount to significantly more than the judicial exception. Even when considered in combination, the additional element represents judicial exceptions and an insignificant extra-solution activity, which cannot provide an inventive concept. The claim is not patent eligible.

Regarding claim 2, according to Step 2A, prong 1 of the analysis, the limitation of wherein the probabilistic or neural autoregressive topic model is a Document Neural Autoregressive Distribution Estimator (DocNADE) architecture, under the broadest reasonable interpretation, recites mathematical relationships and calculations. So, the claim recites judicial exceptions and it falls within the “Mathematical concepts” grouping of abstract ideas.

Regarding claim 3, according to Step 2A, prong 1 of the analysis, the limitations of using Multi-View Transfer, MVT, by additionally using Local-View Transfer, LVT, further comprising: word embeddings KB of word embeddings Ek ∈ R^(E×K), where E indicates the dimension of the word embedding; - transferring knowledge to the target T by LVT via learning meaningful word embeddings guided by relevant word embeddings Ek of the word embeddings KB, comprising the sub-step: - extending a term for calculating pre-activations a of the probabilistic or neural autoregressive topic model of the target T, which pre-activations a control an activation of the autoregressive NN for the preceding words v<i in the probabilities p(vi | v<i) of each word vi, with weighted relevant latent word embeddings Ek to form an extended pre-activation aext, under the broadest reasonable interpretation, recite mathematical relationships and calculations. So, the claim recites judicial exceptions and it falls within the “Mathematical concepts” grouping of abstract ideas.
In the next step (Step 2A, prong 2) of the analysis, the limitation, preparing a pre-trained KB, is considered to be an additional element and as recited represents insignificant extra-solution activity (data gathering) because it is a mere nominal or tangential addition to the claim. See MPEP 2106.05(g), discussing limitations that the Federal Circuit has considered to be insignificant extra-solution activity, for instance the step of printing a menu that was generated through an abstract process in Apple, Inc. v. Ameranth, Inc., 842 F.3d 1229, 1241-42 (Fed. Cir. 2016) and the mere generic presentation of collected and analyzed data in Electric Power Group, LLC v. Alstom S.A., 830 F.3d 1350, 1354 (Fed. Cir. 2016).
In the last step (Step 2B) of the analysis, the recitation of the “preparing a pre-trained KB…” limitation is recited at a high level of generality, and, as disclosed in Gupta et al [Page 6, Section: Experimental Setup], is also well-understood, routine and conventional. This limitation therefore remains insignificant extra-solution activity even upon reconsideration, and does not amount to significantly more. Even when considered in combination, the additional element represents judicial exceptions and an insignificant extra-solution activity, which cannot provide an inventive concept. The claim is not patent eligible.

Regarding claim 4, according to Step 2A, prong 1 of the analysis, the limitation of using Multi-Source Transfer, MST, and Zk ∈ R^(H×K) of the topic KB and/or Ek ∈ R^(E×K) of the word embeddings KB, and Sk, k > 1, under the broadest reasonable interpretation, recites mathematical relationships and calculations. So, the claim recites judicial exceptions and it falls within the “Mathematical concepts” grouping of abstract ideas.
In the next step (Step 2A, prong 2) of the analysis, the limitation of, wherein the latent topic features and/or word embeddings stem from more than one source, is considered to be an additional element and as recited represents insignificant extra-solution activity (data gathering) because it is a mere nominal or tangential addition to the claim. See MPEP 2106.05(g), discussing limitations that the Federal Circuit has considered to be insignificant extra-solution activity, for instance the step of printing a menu that was generated through an abstract process in Apple, Inc. v. Ameranth, Inc., 842 F.3d 1229, 1241-42 (Fed. Cir. 2016) and the mere generic presentation of collected and analyzed data in Electric Power Group, LLC v. Alstom S.A., 830 F.3d 1350, 1354 (Fed. Cir. 2016). 


Regarding claim 6, according to the first step (Step 1) of the 101 analysis, claim 6 is directed to a non-transitory computer-readable medium system (manufacture) and falls within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter). 
In the next step (Step 2A, prong 2) of the analysis, the limitation of a non-transitory computer-readable medium having stored thereon the computer program according to claim 5, is considered to be an additional element and it does not integrate the abstract idea into a practical application because the additional element is recited so generically (no details whatsoever are provided other than that it is a non-transitory computer-readable medium having stored thereon the computer program according to claim 5) that it represents no more than mere instructions to apply the judicial exception on a computer. As discussed in MPEP 2106.05(f), mere instructions to implement an abstract idea on a computer as a tool to perform an abstract idea is not indicative of integration into a practical application.
In the last step (Step 2B) of the analysis, the additional element does not amount to significantly more than the judicial exceptions. As explained with respect to Step 2A Prong Two, a non-transitory computer-readable medium having stored thereon the computer program according to claim 5 is at best the equivalent of merely adding the words “apply it” to the judicial exception. Mere instructions to apply an exception cannot provide an inventive concept and do not amount to significantly more than the judicial exception. The claim is not patent eligible.

Regarding claim 7, according to Step 2A, prong 2 of the analysis, the limitation of a data processing system comprising means for carrying out the steps of the method according to claim 1, is considered to be an additional element and it does not integrate the abstract idea into a practical application because the additional element is recited so generically (no details whatsoever are provided other than that it is a data processing system comprising means for carrying out the steps of the method according to claim 1) that it represents no more than mere instructions to apply the judicial exception on a computer. As discussed in MPEP 2106.05(f), mere instructions to implement an abstract idea on a computer as a tool to perform an abstract idea is not indicative of integration into a practical application.
In the last step (Step 2B) of the analysis, the additional element does not amount to significantly more than the judicial exceptions. As explained with respect to Step 2A Prong Two, a data processing system comprising means for carrying out the steps of the method according to claim 1 is at best the equivalent of merely adding the words “apply it” to the judicial exception. Mere instructions to apply an exception cannot provide an inventive concept and do not amount to significantly more than the judicial exception. The claim is not patent eligible.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner 
Claims 1-7 are rejected under 35 U.S.C. 103 as being unpatentable over Gupta et al (textTOvec: DEEP CONTEXTUALIZED NEURAL AUTOREGRESSIVE MODELS OF LANGUAGE WITH DISTRIBUTED COMPOSITIONAL PRIOR, 2018) in view of Chen et al (Topic Modeling using Topics from Many Domains, Lifelong Learning and Big Data, 2014) and further in view of Larochelle et al (A Neural Autoregressive Topic Model, 2012).
Regarding claim 1
Gupta teaches: A computer-implemented method of Neural Topic Modelling, NTM, in an autoregressive Neural Network, NN, using Global-View Transfer, GVT, for a probabilistic or neural autoregressive topic model of a target T given a document v of words vi, i = 1.. D, comprising the steps ([Page 1, Abstract] In this work, we incorporate language structure by combining a neural autoregressive topic model (TM) with a LSTM based language model (LSTM-LM) in a single probabilistic framework. [Page 2] Figure 1 (left) shows Global view. [Page 4, Section 2.2] Similar to DocNADE, ctx-DocNADE models each document v as a sequence of multinomial observations. Let [x1; x2;...; xN] be a sequence of N words in a given document. [Page 3, Section: Contribution 2] short texts (corresponding to target T)): 
preparing, by at least one central processing unit of the data processing system, a pre-trained topic Knowledge Base, KB ([Page 6, Section: Experimental Setup] we perform a pre-training. [Page 3, Section: Contribution 2] we use pre-trained word embeddings via LSTM-LM to supplement the multinomial topic model (i.e., DocNADE) in learning latent topic and textual representations on a smaller corpus and/or short texts. [Page 8, Paragraph 2] the introduction of both pre-trained embeddings and language/contextual information. [Page 12, Section B.2 EXPERIMENTAL SETUP AND HYPERPARAMETERS FOR IR TASK] number of training passes. Note: Experimental setup section B mentioning training in several places corresponds to the fact that it was done on a computer with a central processing unit), of latent topic features Zk ∈ R^(H×K), ..., H indicates a dimension of the latent topic and K indicates a vocabulary size ([Page 4, Section 2.1, Paragraph 2] a vocabulary of size K. [Page 4, Section 2.1, Paragraph 3, below eq (1)] U ∈ R^(K×H) is a weight matrix connecting hidden to output, e ∈ R^H and b ∈ R^K are bias vectors, W ∈ R^(H×K) is a word representation matrix in which a column W:,vi is a vector representation of the word vi in the vocabulary, and H is the number of hidden units (topics));
transferring, by the at least one central processing unit of the data processing system, knowledge to the target T by GVT via learning meaningful latent topic features guided by relevant latent topic features Zk of the topic KB, comprising the sub-step ([Page 2] Figure 1 (left) shows Global view. [Page 2, Section: Contribution 1] learning complementary semantics by combining joint word and latent topic learning. [Page 3, Section: Contribution 2] Taken together, we combine the advantages of complementary learning and external knowledge, and couple topic- and language models with pre-trained word embeddings to): 
extending, by the at least one central processing unit of the data processing system, a loss function L(v) of the probabilistic or neural autoregressive topic model for the document v of the target T, wherein the loss function L(v) is a negative log-likelihood of joint probabilities p(vi | v<i) of each word vi in the autoregressive NN, wherein probabilities p(vi | v<i) for each word vi are based on preceding words v<i, with a regularisation term comprising weighted relevant latent topic features Zk to form an extended loss function L_reg(v) ([Page 1, Abstract] combining a neural autoregressive topic model (TM) with a LSTM based language model (LSTM-LM) in a single probabilistic framework. [Page 4, Paragraph 1] this leads to tractable gradients of the data negative log-likelihood. [Page 4, Section 2.1] DocNADE models the joint distribution p(v) of all words vi by decomposing it as p(v) = ∏_{i=1}^{D} p(vi|v<i), where each autoregressive conditional p(vi|v<i) for the word observation vi is computed using the preceding observations v<i ∈ {v1, ..., vi-1} in a feed-forward neural network for i ∈ {1, ..., D}. Equation (1). where, g( ) is an activation function, U ∈ R^{K×H} is a weight matrix connecting hidden to output, e ∈ R^H and b ∈ R^K are bias vectors, W ∈ R^{H×K} is a word representation matrix in which a column W_{:,vi} is a vector representation of the word vi in the vocabulary, and H is the number of hidden units (topics). The log-likelihood of any document v of any arbitrary length is given by 
    $\mathcal{L}^{DN}(v) = \sum_{i=1}^{D} \log p(v_i \mid v_{<i})$
. Note: The log-likelihood corresponds to the loss function. The summation L^DN(v) corresponds to the extended loss function, and the regularisation term corresponds to the right side of equation (1), where U corresponds to the topic features weighted by h);
determining, by the at least one central processing unit of the data processing system with the probabilistic or neural autoregressive topic model of the target T, a topic of the document v ([Page 8, section 3.4] Table 7 shows a topic extracted using 20NS dataset that could be interpreted as computers, which are (sub)categories in the data, confirming that meaningful topics are captured. [Abstract] In this work, we incorporate language structure by combining a neural autoregressive topic model (TM) with a LSTM based language model (LSTM-LM) in a single probabilistic framework).
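For illustration only, the autoregressive negative log-likelihood described in the cited Section 2.1 of Gupta can be sketched as follows. This is a minimal sketch under stated assumptions, not the reference's implementation: the function name `docnade_nll`, the choice of a sigmoid for the activation g, the use of a full softmax over the vocabulary, and the toy dimensions are all assumptions introduced here.

```python
import numpy as np

def docnade_nll(doc, W, U, e, b):
    """Negative log-likelihood of one document under a DocNADE-style model.

    doc : sequence of word indices v_1..v_D, each in [0, K)
    W   : (H, K) word representation matrix
    U   : (K, H) hidden-to-output weight matrix
    e   : (H,) hidden bias, b : (K,) output bias
    """
    acc = np.zeros(W.shape[0])  # running sum of W[:, v_k] over preceding words k < i
    nll = 0.0
    for v_i in doc:
        # Hidden state depends only on the preceding words v_<i.
        h = 1.0 / (1.0 + np.exp(-(e + acc)))  # sigmoid is an assumed choice for g
        logits = b + U @ h
        logits = logits - logits.max()        # numerical stability
        log_probs = logits - np.log(np.exp(logits).sum())
        nll -= log_probs[v_i]                 # accumulate -log p(v_i | v_<i)
        acc = acc + W[:, v_i]                 # word v_i becomes context for the next step
    return nll

# Toy usage with assumed sizes (H topics, K vocabulary entries).
rng = np.random.default_rng(0)
H, K = 4, 10
W = rng.normal(scale=0.1, size=(H, K))
U = rng.normal(scale=0.1, size=(K, H))
e, b = np.zeros(H), np.zeros(K)
loss = docnade_nll([3, 1, 4, 1, 5], W, U, e, b)
```

The loop makes the autoregressive structure explicit: each conditional is computed from the words before position i only, and the document loss is the sum of the per-word negative log-probabilities.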

Chen teaches, in an analogous system: where k indicates the number of a source Sk, k >= 1, of the latent topic feature ([Page 2, Column 2, Paragraph 2] In summary, this paper makes the following contributions: 1. It proposes a novel approach to exploit text collections from many domains to learn prior knowledge to guide model inference in order to generate more coherent topics. Note: Text collections from many domains corresponds to more than one source).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Gupta to incorporate the teachings of Chen to use text collections from many domains. One would have been motivated to make this modification because doing so would provide the benefit of a lifelong learning method and would also help deal with big data, as taught by Chen [Page 2, Column 2, Paragraph 2].
Larochelle teaches, in an analogous system: and minimizing, by the at least one central processing unit of the data processing system, the extended loss function L_reg(v) to determine a minimal overall loss; based on the determined minimal overall loss of the extended loss function ([Page 7, Section 6.1, Paragraph 3] Instead of minimizing the average document negative log-likelihood, we also considered minimizing a version normalized by each document’s size).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Gupta and 
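For illustration only, the claimed steps of extending the loss with a regularisation term and minimising the extended loss can be sketched as follows. The exact form of the regularisation term is not given in the passages quoted above, so the sketch assumes a weighted squared distance between the target matrix W and each source matrix Zk. The names `extended_loss`, `numeric_grad`, and `gammas`, the finite-difference gradient, and the learning rate are likewise assumptions made for a small runnable example.

```python
import numpy as np

def nll(doc, W, U, e, b):
    """DocNADE-style negative log-likelihood with a full softmax (sigmoid assumed for g)."""
    acc, total = np.zeros(W.shape[0]), 0.0
    for v_i in doc:
        h = 1.0 / (1.0 + np.exp(-(e + acc)))
        logits = b + U @ h
        logits = logits - logits.max()
        total -= logits[v_i] - np.log(np.exp(logits).sum())
        acc = acc + W[:, v_i]
    return total

def extended_loss(doc, W, U, e, b, Z_list, gammas):
    # ASSUMED regulariser: sum over sources k of gamma_k * ||W - Z_k||^2.
    reg = sum(g * np.sum((W - Z) ** 2) for g, Z in zip(gammas, Z_list))
    return nll(doc, W, U, e, b) + reg

def numeric_grad(f, W, eps=1e-5):
    """Central finite-difference gradient of f with respect to W (toy sizes only)."""
    grad = np.zeros_like(W)
    for idx in np.ndindex(*W.shape):
        old = W[idx]
        W[idx] = old + eps
        up = f(W)
        W[idx] = old - eps
        down = f(W)
        W[idx] = old
        grad[idx] = (up - down) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
H, K = 3, 6
W = rng.normal(scale=0.1, size=(H, K))
U = rng.normal(scale=0.1, size=(K, H))
e, b = np.zeros(H), np.zeros(K)
Z_list = [rng.normal(size=(H, K)) for _ in range(2)]  # two hypothetical sources
gammas = [0.5, 0.2]                                   # hypothetical source weights
doc = [2, 0, 5, 1]

f = lambda M: extended_loss(doc, M, U, e, b, Z_list, gammas)
initial_loss = f(W)
for _ in range(100):
    W -= 0.05 * numeric_grad(f, W)  # plain gradient descent on W only
final_loss = f(W)
```

Only W is updated here to keep the example short; a practical implementation would use analytic gradients and update all parameters.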

Regarding claim 2
Gupta teaches: The computer-implemented method according to claim 1, wherein the probabilistic or neural autoregressive topic model is a Document Neural Autoregressive Distribution Estimator (DocNADE) architecture ([Page 1, Abstract] In this work, we incorporate language structure by combining a neural autoregressive topic model (TM) with a LSTM based language model (LSTM-LM) in a single probabilistic framework. [Page 8, Section 3.5] To this end, we have combined a topic- (i.e., DocNADE) and a neural language (e.g.,LSTM) model in a single probabilistic framework).

Regarding claim 3
Gupta teaches: The computer-implemented method according to claim 1, using Multi-View Transfer, MVT, by additionally using Local-View Transfer, LVT, further comprising ([Page 2, Section: Contribution 1] This allows for the accurate prediction of words, where the probability of each word is a function of global and local contexts, modelled via DocNADE. [Page 5, Paragraph 3] expose W to both global and local influences by sharing W in the DocNADE. Note: Also Figure 2 (right) [Page 3] shows both Global and Local view thus corresponding to Multi-view transfer (MVT)): 
∈ R^{E×K}, where E indicates the dimension of the word embedding ([Page 3, Paragraph 3] we use pre-trained word embeddings. [Page 6, Section: Experimental Setup] we perform a pre-training); 
transferring knowledge to the target T by LVT via learning meaningful word embeddings guided by relevant word embeddings Ek of the word embeddings KB, comprising the sub-step ([Page 2] Figure 1 (center) shows Local view. [Page 2, Section: Contribution 1] learning complementary semantics by combining joint word and latent topic learning. [Page 3, Section: Contribution 2] Taken together, we combine the advantages of complementary learning and external knowledge, and couple topic- and language models with pre-trained word embeddings to): 
extending a term for calculating pre-activations a of the probabilistic or neural autoregressive topic model of the target T, which pre-activations a control an activation of the autoregressive NN for the preceding words v<i in the probabilities p(vi | v<i) of each word vi, with weighted relevant latent word embeddings Ek to form an extended pre-activation aext ([Page 1, Abstract] combining a neural autoregressive topic model (TM) with a LSTM based language model (LSTM-LM) in a single probabilistic framework. [Page 4, Paragraph 1] this leads to tractable gradients of the data negative log-likelihood. [Page 4, Section 2.1] DocNADE models the joint distribution p(v) of all words vi by decomposing it as p(v) = ∏_{i=1}^{D} p(vi|v<i), where each autoregressive conditional p(vi|v<i) for the word observation vi is computed using the preceding observations v<i ∈ {v1, ..., vi-1} in a feed-forward neural network for i ∈ {1, ..., D}. Equation (1). where, g( ) is an activation function, U ∈ R^{K×H} is a weight matrix connecting hidden to output, e ∈ R^H and b ∈ R^K are bias vectors, W ∈ R^{H×K} is a word representation matrix in which a column W_{:,vi} is a vector representation of the word vi in the vocabulary, and H is the number of hidden units (topics). [Page 4] Algorithm 1. Note: Computing the activation in the previous step of the for loop corresponds to pre-activation).
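For illustration only, an extended pre-activation of the kind recited in claim 3 can be sketched as follows. The sketch assumes that the embedding dimension E equals the number of hidden units H, so that columns of each Ek can be added directly to columns of W; if the dimensions differed, some projection would be needed, which is not shown. The function name `extended_preactivations` and the weights `lambdas` are hypothetical.

```python
import numpy as np

def extended_preactivations(doc, W, e, E_list, lambdas):
    """Return one pre-activation row per word position.

    Row i is e plus, over preceding words v_k (k < i), the column W[:, v_k]
    and the weighted columns lambda_s * E_s[:, v_k] of each source embedding.
    ASSUMPTION: every E_s has the same shape as W, i.e. E == H.
    """
    H = W.shape[0]
    a_ext = np.empty((len(doc), H))
    acc = np.zeros(H)
    for i, v_i in enumerate(doc):
        a_ext[i] = e + acc          # uses only words before position i
        acc = acc + W[:, v_i]       # target model's own word representation
        for lam, E in zip(lambdas, E_list):
            acc = acc + lam * E[:, v_i]  # weighted pre-trained embedding of v_i
    return a_ext

# Toy usage with assumed sizes and a single hypothetical embedding source.
rng = np.random.default_rng(1)
H, K = 4, 8
W = rng.normal(size=(H, K))
E1 = rng.normal(size=(H, K))
e = np.zeros(H)
a = extended_preactivations([5, 2, 7], W, e, [E1], [0.3])
```

Setting every weight in `lambdas` to zero recovers the unextended pre-activation, which makes the contribution of the pre-trained embeddings easy to isolate.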

Regarding claim 4
The system of Gupta, Chen, and Larochelle teaches: The computer-implemented method according to claim 1, wherein the latent topic features Zk ∈ R^{H×K} of the topic KB and/or word embeddings Ek ∈ R^{E×K} of the word embeddings KB (as shown above).
However, Gupta does not explicitly disclose: using Multi-Source Transfer, MST and stem from more than one source Sk, k > 1.
Chen teaches, in an analogous system: using Multi-Source Transfer, MST, and stem from more than one source Sk, k > 1 (In summary, this paper makes the following contributions: 1. It proposes a novel approach to exploit text collections from many domains to learn prior knowledge to guide model inference in order to generate more coherent topics [Page 2, Column 2, Paragraph 2]. Note: Many domains corresponds to more than one source, and text collections corresponds to the word embeddings and topic KB).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the system of Gupta to incorporate the teachings of Chen to use text collections from many domains. One would have been motivated to make this modification because doing so would provide the benefit of a lifelong learning method and would also help deal with big data, as taught by Chen [Page 2, Column 2, Paragraph 2].


Regarding claim 6
Gupta teaches: The computer-readable medium having stored thereon the computer program according to claim 5 ([Page 6] Experimental setup corresponds to computer-readable medium having stored thereon the computer program according to claim 5).

Regarding claim 7
Gupta teaches: A data processing system comprising means for carrying out the steps of the method according to claim 1 ([Page 6, Paragraph 4] Experimental Setup: DocNADE is often trained on a reduced vocabulary (RV) after pre-processing (e.g., ignoring functional words, etc.); however, we also investigate training it on full text/vocabulary).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Bei et al (2017) discloses Jointly Learning Word Embeddings and Latent Topics.
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHAITANYA RAMESH JAYAKUMAR whose telephone number is (571)272-3369. The examiner can normally be reached Mon-Fri 7am-1pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas can be reached on (571)272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
/CHAITANYA R JAYAKUMAR/ Examiner, Art Unit 2128
/OMAR F FERNANDEZ RIVAS/Supervisory Patent Examiner, Art Unit 2128