DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 10/06/2021 has been entered.
 
Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 4-8, and 11-15 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1, 4-6, 8, 11-13, and 15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more.
Step 2B (part 2 of the Mayo test) requires analyzing the claims to determine if they recite additional elements that amount to significantly more than the judicial exception. In this case, the claims do not include additional elements that are sufficient to amount to significantly more than the abstract idea itself.  

Regarding claims 1 and 8, generating an output sequence of words is a mental process, which is an abstract idea. The additional limitation of receiving an input sequence of words is insignificant extra-solution activity, while the remaining limitations are clarifications of the above abstract idea, without significantly more and without integration into a practical application.

Regarding claims 4 and 11, the limitations are further clarifications of the above abstract ideas.

Regarding claims 5-6 and 12-13, the limitations are further clarifications of the above abstract ideas, and incorporate mathematical calculations, which are also abstract.

The limitations of the claims, taken alone, do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements individually. Applicable case law cited in the Federal Register includes, but is not limited to: Alice Corp., 134 S. Ct. at 2355-56, Digitech Image Tech., LLC v. Electronics for Imaging, Inc., 758 F.3d 1344 (Fed. Cir. 2014), Benson, 409 U.S. at 63.

See "Preliminary Examination Instructions in view of the Supreme Court Decision in Alice Corporation Pty. Ltd. v. CLS Bank International, et al.," dated June 25, 2014, and the Federal Register notice titled "2014 Interim Guidance on Patent Subject Matter Eligibility" (79 FR 74618).

	
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
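The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless:
(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.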


Claim(s) 8, and 11-14 is/are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Gupta et al. (Gupta, P., Chaudhary, Y., Buettner, F., & Schütze, H. (2018). Texttovec: Deep contextualized neural arXiv preprint arXiv:1810.03947.), hereinafter referred to as Gupta.

Regarding claim 8, Gupta teaches:
A computer-implemented method for processing natural language, by receiving an input sequence $c_i$ of input words $(v_1, v_2, \ldots, v_N)$ representing a first sequence of words in a natural language of a first text and generating an output sequence of output words $(\hat{v}_1, \hat{v}_2, \ldots, \hat{v}_N)$ representing a second sequence of words in a natural language of a second text and modeled by a multinominal topic model (page 3, Fig. 2, where the input sequence of input words is processed by the multinominal topic models to produce an output sequence of output words), comprising the steps:
extending the multinominal topic model by an incorporation of language structures (wherein the multinominal topic model is extended by an incorporation of language structures using a deep contextualized Long-Short-Term Memory model), and
using a deep contextualized Long-Short-Term Memory model (wherein the multinominal topic model is extended by an incorporation of language structures using a deep contextualized Long-Short-Term Memory model)
wherein the multinominal topic model is a document neural autoregressive topic model, DocNADE (page 2, second to last full paragraph, where DocNADE is used), and the extended multinominal topic model is a contextualized document neural autoregressive topic model, ctx-DocNADE, incorporating context information for the first sequence of words (page 2, second to last full paragraph, where ctx-DocNADE is used), and
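wherein the ctx-DocNADE model is extended by incorporation of distributed compositional priors for generating a ctx-DocNADEe model, incorporating external knowledge for each word of the first sequence of words (page 3 second and third full paragraphs, where ctx-DocNADEe is used by incorporating distributed compositional priors).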

Regarding claim 11, Gupta teaches:
The method of claim 10, wherein the distributed composition priors are pre-trained word embeddings by LSTM-LM (Page 3 third full paragraph, where pre-trained word embeddings via LSTM-LM are used).

Regarding claim 12, Gupta teaches:
The method of claim 8, wherein a conditional probability of the word $v_i$ in ctx-DocNADE or ctx-DocNADEe is a function of two hidden vectors: $h_i^{DN}(v_{<i})$ and $h_i^{LM}(c_i)$, stemming from the DocNADE-based and LSTM-based components of ctx-DocNADE, respectively:
$h_i(v_{<i}) = h_i^{DN}(v_{<i}) + \lambda\, h_i^{LM}(c_i)$ (pages 4-5, paragraph spanning pages, as well as equation 2)
where $h_i^{DN}(v_{<i})$ is computed as:
$h_i^{DN}(v_{<i}) = g\left(e + \sum_{k<i} W_{:,v_k}\right)$ (page 4, equation 1)
and $\lambda$ is the mixture weight of the LM component, which can be optimized during training and based on the validation set, and the second term $h_i^{LM}$ is a context-dependent representation and output of an LSTM layer at position $i-1$ over input sequence $c_i$, trained to predict the next word $v_i$ (page 5 first paragraph, where the variables are defined).
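For illustration only, and not as part of the claim mapping, the two relations recited in claim 12 can be sketched in a few lines of Python. The function names, the list-of-rows layout of W, and the choice of a sigmoid for the activation g are assumptions made for this sketch; the LSTM output is represented by a placeholder vector rather than an actual LSTM layer.

```python
import math

def sigmoid(x):
    # Assumed activation g; the claim only recites a function g.
    return 1.0 / (1.0 + math.exp(-x))

def docnade_hidden(W, e, preceding_words, g=sigmoid):
    """h_i^DN(v_<i) = g(e + sum over k<i of column v_k of W).

    W is an H x V matrix stored as a list of H rows (hypothetical layout),
    e is a bias vector of length H, and preceding_words holds the word
    indices v_1 .. v_{i-1}.
    """
    pre_activation = list(e)
    for v_k in preceding_words:
        for j in range(len(e)):
            pre_activation[j] += W[j][v_k]
    return [g(x) for x in pre_activation]

def combined_hidden(h_dn, h_lm, lam):
    """h_i(v_<i) = h_i^DN(v_<i) + lambda * h_i^LM(c_i)."""
    return [a + lam * b for a, b in zip(h_dn, h_lm)]

# Toy values: 2 hidden units, vocabulary of 3 words, preceding words 0 and 2.
W = [[0.1, 0.2, 0.3],
     [0.0, -0.1, 0.4]]
e = [0.0, 0.0]
h_dn = docnade_hidden(W, e, [0, 2])
h_lm = [1.0, -1.0]  # placeholder standing in for the LSTM output at position i-1
h = combined_hidden(h_dn, h_lm, lam=0.5)
```

Setting lam to zero in this sketch reduces the combined vector to the DocNADE-only hidden vector, which is one way to see the role of the mixture weight.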

Regarding claim 13, Gupta teaches:
The method of claim 8, wherein the conditional distribution for each word $v_i$ is estimated by:
$p(v_i = w \mid v_{<i}) = \dfrac{\exp\left(b_w + U_w h_i(v_{<i})\right)}{\sum_{w} \exp\left(b_w + U_w h_i(v_{<i})\right)}$ (page 5, equation 2, where the same equation is used).
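As an illustrative sketch only, the conditional distribution recited in claim 13 has the form of a softmax over the vocabulary, with the denominator summing over every vocabulary word. The function name and the list-based layout of b and U below are assumptions for this sketch, and subtracting the maximum score is a standard numerical-stability step that is not recited in the claim and does not change the resulting probabilities.

```python
import math

def conditional_distribution(b, U, h):
    """Return p(v_i = w | v_<i) for every vocabulary word w.

    b is a bias vector of length V, U is a V x H matrix stored as a list of
    V rows (hypothetical layout), and h is the hidden vector h_i(v_<i).
    """
    scores = [b_w + sum(u * x for u, x in zip(U_w, h)) for b_w, U_w in zip(b, U)]
    m = max(scores)  # stability shift; cancels in the ratio
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [x / total for x in exps]

# Toy values: vocabulary of 2 words, 1 hidden unit.
probs = conditional_distribution([0.0, 1.0], [[1.0], [0.0]], [2.0])
```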

Regarding claim 14, Gupta teaches:
The method of claim 10 wherein the ctx-DocNADE model and the ctx-DocNADEe model are optimized to maximize the pseudo log likelihood, 
$\log p(v) \approx \sum_{i=1}^{D} \log p(v_i \mid v_{<i})$ (page 5 first paragraph, where the same equation is used).
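For illustration only, the objective recited in claim 14 is a sum of per-position log conditional probabilities over the D words of a document. The sketch below assumes the conditional probabilities of the observed words have already been computed; the function name and the toy values are hypothetical.

```python
import math

def log_likelihood(cond_probs):
    """Sum of log p(v_i | v_<i) over positions i = 1 .. D.

    cond_probs[i] is the model's probability of the word actually observed
    at position i given the preceding words.
    """
    return sum(math.log(p) for p in cond_probs)

# Toy document of D = 2 words with conditional probabilities 0.5 and 0.25.
ll = log_likelihood([0.5, 0.25])
```

Maximizing this quantity during training is equivalent to minimizing the negative log likelihood summed over positions.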

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4-7, and 15 is/are rejected under 35 U.S.C. 103 as being unpatentable over Gupta, in view of Li et al. (US 2020/0184339 A1), hereinafter referred to as Li.

Regarding claim 1, Gupta teaches:
receiving an input sequence $c_i$ of input words $(v_1, v_2, \ldots, v_N)$ representing a first sequence of words in a natural language of a first text and generating an output sequence of output words $(\hat{v}_1, \hat{v}_2, \ldots, \hat{v}_N)$ representing a second sequence of words in a natural language of a second text and modeled by a multinominal topic model (page 3, Fig. 2, where the input sequence of input words is processed by the multinominal topic models to produce an output sequence of output words)
wherein the multinominal topic model is extended by an incorporation of language structures using a deep contextualized Long-Short-Term Memory model (wherein the multinominal topic model is extended by an incorporation of language structures using a deep contextualized Long-Short-Term Memory model),

wherein the ctx-DocNADE model is extended by incorporation of distributed compositional priors for generating a ctx-DocNADEe model, incorporating external knowledge for each word of the first sequence of words (page 3 second and third full paragraphs, where ctx-DocNADEe is used by incorporating distributed compositional priors).
Gupta does not teach:
A natural language processing system comprising a processor, the processor configured for:
Li teaches:
A natural language processing system comprising a processor (Fig. 9 element 901, para [0093], where a CPU is used), the processor configured for:
The prior art, as embodied in the teachings of Gupta and Li, included each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference.  One of ordinary skill in the art could have combined the elements as claimed by known methods and in that combination each element merely performs the same function as it does separately.  One of ordinary skill in the art would have recognized that the results of the combination were predictable.

Regarding claim 4, Gupta in view of Li teaches:


Regarding claim 5, Gupta in view of Li teaches:
The natural language processing system of claim 1, wherein a conditional probability of the word $v_i$ in ctx-DocNADE or ctx-DocNADEe is a function of two hidden vectors: $h_i^{DN}(v_{<i})$ and $h_i^{LM}(c_i)$, stemming from the DocNADE-based and LSTM-based components of ctx-DocNADE, respectively:
$h_i(v_{<i}) = h_i^{DN}(v_{<i}) + \lambda\, h_i^{LM}(c_i)$ (Gupta pages 4-5, paragraph spanning pages, as well as equation 2)
where $h_i^{DN}(v_{<i})$ is computed as:
$h_i^{DN}(v_{<i}) = g\left(e + \sum_{k<i} W_{:,v_k}\right)$ (Gupta page 4, equation 1)
and $\lambda$ is the mixture weight of the LM component, which can be optimized during training and based on the validation set, and the second term $h_i^{LM}$ is a context-dependent representation and output of an LSTM layer at position $i-1$ over input sequence $c_i$, trained to predict the next word $v_i$ (Gupta page 5 first paragraph, where the variables are defined).

Regarding claim 6, Gupta in view of Li teaches:
The natural language processing system of claim 1, wherein the conditional distribution for each word $v_i$ is estimated by:
$p(v_i = w \mid v_{<i}) = \dfrac{\exp\left(b_w + U_w h_i(v_{<i})\right)}{\sum_{w} \exp\left(b_w + U_w h_i(v_{<i})\right)}$ (Gupta page 5, equation 2, where the same equation is used).

Regarding claim 7, Gupta in view of Li teaches:
, wherein the ctx-DocNADE model and the ctx-DocNADEe model are optimized to maximize the pseudo log likelihood, 
$\log p(v) \approx \sum_{i=1}^{D} \log p(v_i \mid v_{<i})$ (Gupta page 5 first paragraph, where the same equation is used).

Regarding claim 15, Gupta teaches:
perform the method according to claim 8.
Gupta does not teach:
A non-transitory computer-readable data storage medium comprising executable program code
Li teaches:
A non-transitory computer-readable data storage medium comprising executable program code (Fig. 9 element 908, para [0094], where a storage medium stores instructions)
The prior art, as embodied in the teachings of Gupta and Li, included each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference.  One of ordinary skill in the art could have combined the elements as claimed by known methods and in that combination each element merely performs the same function as it does separately.  One of ordinary skill in the art would have recognized that the results of the combination were predictable.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Holmer et al. (Holmer, E., & Marfurt, A. (2018). Explaining away syntactic structure in semantic document representations. arXiv preprint arXiv:1806.01620.) section 2 second column teaches different DocNADE variations.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRYAN S BLANKENAGEL whose telephone number is (571)270-0685. The examiner can normally be reached 8:00am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Richemond Dorvil can be reached on 571-272-7602. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/BRYAN S BLANKENAGEL/Primary Examiner, Art Unit 2658