Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The specification is objected to as failing to provide proper antecedent basis for the claimed subject matter.  See 37 CFR 1.75(d)(1) and MPEP § 608.01(o).  Correction of the following is required:
In claim 1, and similarly in claim 13, the encoder/encoding step is claimed to “either” “pushes a non-terminal token”, “pushes a terminal token”, “or” “pops tokens off”. This could be interpreted as performing either the first two “pushes” or the “pops”. Further, the specification discloses that the RNNG uses all three actions for encoding, whereas the claims appear to require performing either one (or two) of the pushes or the pops. Clarification is required, either to provide proper support for the claims or to amend the claims in compliance with the applicant’s disclosure.







Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-21 are rejected under 35 U.S.C. 103 as being unpatentable over Eriguchi et al. (“Learning to Parse and Translate Improves Neural Machine Translation”, 12 February 2017) in view of Chris Dyer et al. (“Recurrent Neural Network Grammars”, 12 October 2016).
Claim 1
Eriguchi teaches a machine translation system comprising: 
an encoder (page 2, column 1, lines 1-9, “attention-based encoder-decoder network with two recurrent networks-encoder and decoder- and an attention model”) that encodes tokens of a source sequence in a first language over a plurality of encoding time steps (See Col. 2 under “Recurrent Neural Network Grammars”, implemented as a stack with time step), 
wherein each of the tokens of the source sequence corresponds to a character from the source sequence (Eriguchi suggests that working with characters/subword units in neural machine translation was known in the art. See Introduction: “Some of the most recent studies have for instance demonstrated that NMT systems work comparably to other systems even when the source and target sentences are given simply as flat sequences of characters (Lee et al., 2016; Chung et al., 2016) or statistically, not linguistically, motivated subword units (Sennrich et al., 2016; Wu et al., 2016). Shi et al. (2016) recently made an observation that the encoder of NMT captures syntactic properties of a source sentence automatically, indirectly suggesting that explicit linguistic prior may not be necessary.”),

pushes a non-terminal token to an encoder stack (Eriguchi, while referring to the secondary reference Dyer, discloses “Additionally, the action may be one of many possible non-terminal symbols, in which case the predicted non-terminal symbol is pushed to the stack.”), the non-terminal token corresponding to a phrase type (Applicant’s disclosure, at least at [0051], provides the following: “At each encoding timestep, the hidden state of the RNNG encoder 112 (encoder state) is fed to an encoder softmax layer which determines whether the next action is a RED, GEN or NT.” The “phrase type” (such as “VP”), for example, can be associated with the non-terminal (NT) operations taught by Eriguchi at page 2, column 2, lines 18-20); 
pushes a terminal token to the encoder stack (See Col. 2 under “Recurrent Neural Network Grammars”: the word at the beginning of the buffer is moved to the stack. Examiner notes that the claim is constructed in a contingent limitation format, and hence prior art teaching one of the limitations of the “either” clause is sufficient; in this case, the “pushes” steps.); or 
pops tokens off the encoder stack to a last pushed non-terminal token (See Col. 2 under “Recurrent Neural Network Grammars”, When the reduce action is selected, the top-two words in the stack are reduced to build a partial tree.),
Eriguchi may not clearly detail the following limitations: generates a new token using a composition function applied to the popped tokens, and pushes the new token onto the encoder stack; and an attention-based decoder that generates non-terminal tokens based on attention to the tokens on the encoder stack, and outputs tokens of a target sequence in a second language based on the attention; wherein the encoder lacks separate non-terminal tokens for different phrase types across encoding time steps, such that values corresponding to the non-terminal tokens are fixed according to a constant vector.

an attention-based decoder that generates non-terminal tokens based on attention to the tokens on the encoder stack, and outputs tokens of a target sequence in a second language based on the attention (See equation 3 for attention-based neural machine translation decoder; cj is a time-dependent context vector that is computed by the attention model using the sequence h of hidden states from the encoder; under section 4.1 NMT+RG, Construction: Second, we let the next word prediction of the translation decoder as a generator of RNNG. In other words, the generator of RNNG will output a word); 
wherein the encoder lacks separate non-terminal tokens for different phrase types across encoding time steps, such that values corresponding to the non-terminal tokens are fixed according to a constant vector (Table 5 of Dyer: Parsing results with fixed composition function.  Examiner notes that a more descriptive, differentiating limitation would be helpful to overcome the prior art of Dyer. See also footnote under section 3.1 Parser transitions regarding preterminal 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to incorporate the encoder as taught by Dyer into the machine translation system of Eriguchi, because doing so would have provided better parsing in English than any single previously published supervised generative model and better language modeling than state-of-the-art sequential RNNs in English (abstract of Dyer).
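For illustration only, the three stack actions discussed in the claim 1 mapping above (push a non-terminal, push a terminal, and pop back to the last pushed non-terminal followed by composition) can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from Eriguchi, Dyer, or the applicant’s disclosure: the names StackItem, NT_CONSTANT, compose, and reduce_stack are hypothetical, the composition function is a simple element-wise mean rather than a recurrent network, and the non-terminal marker itself is assumed to be discarded on reduction.

```python
from dataclasses import dataclass
from typing import List

Vector = List[float]

# Assumption: a single constant vector represents every non-terminal,
# regardless of phrase type (NP, VP, ...).
NT_CONSTANT: Vector = [1.0, 0.0, 0.0]


@dataclass
class StackItem:
    vector: Vector
    is_open_nt: bool = False  # True only for a non-terminal not yet reduced


def compose(vectors: List[Vector]) -> Vector:
    """Hypothetical composition function: element-wise mean of the inputs."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]


def push_nonterminal(stack: List[StackItem]) -> None:
    """NT action: push the shared constant non-terminal vector."""
    stack.append(StackItem(list(NT_CONSTANT), is_open_nt=True))


def push_terminal(stack: List[StackItem], vector: Vector) -> None:
    """Terminal action: push the vector of one source token."""
    stack.append(StackItem(list(vector)))


def reduce_stack(stack: List[StackItem]) -> None:
    """Reduce action: pop back to the last pushed open non-terminal,
    compose the popped tokens, and push the resulting new token."""
    open_index = None
    for i in range(len(stack) - 1, -1, -1):
        if stack[i].is_open_nt:
            open_index = i
            break
    if open_index is None or open_index == len(stack) - 1:
        raise ValueError("reduce needs an open non-terminal with children")
    children = [item.vector for item in stack[open_index + 1:]]
    del stack[open_index:]  # drops the children and the non-terminal marker
    stack.append(StackItem(compose(children)))
```

In this sketch, an encoder that performs one non-terminal push, two terminal pushes, and one reduce ends with a single composed token on the stack, which shows why all three actions are needed to build a constituent.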
Claim 2
Eriguchi in view of Dyer further teaches the system of claim 1, wherein each of the tokens of the source sequence corresponds to a character from the source sequence (Eriguchi suggests that working with characters/subword units in neural machine translation was known in the art. See Introduction: “Some of the most recent studies have for instance demonstrated that NMT systems work comparably to other systems even when the source and target sentences are given simply as flat sequences of characters (Lee et al., 2016; Chung et al., 2016) or statistically, not linguistically, motivated subword units (Sennrich et al., 2016; Wu et al., 2016). Shi et al. (2016) recently made an observation that the encoder of NMT captures syntactic properties of a source sentence automatically, indirectly suggesting that explicit linguistic prior may not be necessary.”).
Claim 3
Eriguchi in view of Dyer further teaches the system of claim 6, wherein the character-based tokens are densely encoded using real-valued vectors (Page 2, Column 1, lines 16-20 of Eriguchi, h with Vx(xi) referring to the word vector of the i-th source word).
Claim 4
Eriguchi in view of Dyer further suggests the system of claim 6, wherein the character-based tokens are sparsely encoded using one-hot vectors (Section 5.2 of Eriguchi, regarding dimensions of the vectors and its attempt to reduce computational overhead; Examiner notes that no specifics regarding the one-hot vectors are found in the Applicant’s disclosure. Further, it would have been obvious to replace one type of vector with another in the encoder to provide a better data representation).
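For illustration only, the difference between the sparse one-hot encoding of claim 4 and the dense real-valued encoding of claim 3 can be sketched as below. This is a minimal sketch under stated assumptions: the function names, the three-character vocabulary, and the embedding values are all hypothetical and are not taken from Eriguchi, Dyer, or the applicant’s disclosure.

```python
from typing import Dict, List


def build_vocab(text: str) -> Dict[str, int]:
    """Assign an index to each distinct character, in sorted order."""
    return {ch: i for i, ch in enumerate(sorted(set(text)))}


def one_hot(ch: str, vocab: Dict[str, int]) -> List[float]:
    """Sparse encoding: all zeros except a 1.0 at the character's index."""
    vec = [0.0] * len(vocab)
    vec[vocab[ch]] = 1.0
    return vec


def dense(ch: str, vocab: Dict[str, int], table: List[List[float]]) -> List[float]:
    """Dense encoding: look up the character's row in an embedding table."""
    return table[vocab[ch]]


def one_hot_times_table(hot: List[float], table: List[List[float]]) -> List[float]:
    """Multiply a one-hot row vector by the embedding table."""
    width = len(table[0])
    return [sum(h * row[j] for h, row in zip(hot, table)) for j in range(width)]


# Hypothetical data: a three-character vocabulary and 2-dimensional embeddings.
vocab = build_vocab("cab")
table = [[0.1, -0.3], [0.7, 0.2], [-0.5, 0.9]]
```

The dense lookup returns the same vector as multiplying the one-hot vector by the embedding table, which is why the two encodings are commonly treated as interchangeable front ends to an encoder.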
Claim 5
Eriguchi in view of Dyer further teaches the system of claim 6, wherein a phrase tree structure of both the source and target sequences includes one or more character-based token constituents (page 2, column 2, lines 36-40 of Eriguchi: when the RNNG is used as a generator (see the GEN action of Applicant’s disclosure), the buffer further generates the next word when the selected action is shift) and a phrase type constituent (page 2, column 2, lines 14-20 of Eriguchi: the word at the beginning of the buffer is moved to the stack. When the reduce action is selected, the top-two words in the stack are reduced to build a partial tree. Additionally, the action may be one of many possible non-terminal symbols, in which case the predicted non-terminal symbol is pushed to the stack. Examiner notes that non-terminal symbols provide information regarding phrase type, e.g., NT(NP), NT(VP), etc.).
Claim 6
Eriguchi in view of Dyer further teaches the system of claim 5, wherein a constant vector is used as a common embedding for different phrase types (Table 5 of Dyer: Parsing results with fixed composition function.  See also footnote under section 3.1 Parser transitions regarding preterminal symbols: Preterminal symbols are, from the parsing algorithm's point of view, just another kind of nonterminal symbol that requires no special handling. However, leaving them out reduces the number of transitions by O(n) and also reduces the number of action types, both of which reduce the runtime. Furthermore, standard parsing evaluation scores do not depend on preterminal prediction accuracy… our parser allows unary nonterminal productions).
Claim 7
Eriguchi in view of Dyer further teaches the system of claim 9, wherein an encoder compositional embedding encodes one or more character-based token constituents, without encoding a phrase type constituent (page 2, column 2, lines 20-23 of Eriguchi, Additionally, the action may be one of many possible non-terminal symbols, in which case the predicted non-terminal symbol is pushed to the stack. This indicates the phrase type constituent associated with “NT” is optional.).
Claim 8
Eriguchi in view of Dyer further suggests the system of claim 1, further configured to: use an ultimate encoder compositional embedding of the source sequence as a decoder embedding for an initially predicted phrase type of the target sequence (page 7, lines 6-16 of Eriguchi, The RNNG’s stack computes the vector of a dependency parse tree which consists of the generated target words by the buffer. Since the complete parse tree has a “ROOT” node, the special token of the end of a sentence (“EOS”) is considered as the ROOT. Each weight is 
Claim 9
Eriguchi in view of Dyer further teaches the system of claim 1, further configured to: use policy gradient reinforcement learning to induce unsupervised phrase tree structures of both the source and target sequences (Page 4, Col. 1, Section 5.2 of Eriguchi, Models, Learning and Inference: “We use stochastic gradient descent with minibatches of 128 examples.”).
Claim 10
Eriguchi in view of Dyer further teaches the system of claim 1, wherein the encoder and the attention-based decoder are long short-term memory (abbreviated LSTM) networks (Page 4, Col. 1, Section 5.2 of Eriguchi: “In all our experiments, each recurrent network has a single layer of LSTM units of 256 dimensions”).
Claim 11
Eriguchi in view of Dyer further teaches the system of claim 14, wherein the encoder and the attention-based decoder each include a bi-directional LSTM (abbreviated Bi-LSTM) that calculates encoder and decoder compositional embeddings (Page 2, lines 5-14 of Eriguchi: The encoder, which is often implemented as a bidirectional recurrent network with long short-term memory units (LSTM, Hochreiter and Schmidhuber, 1997) or gated recurrent units (GRU, Cho et al., 2014), first reads a source sentence represented as a sequence of words x = (x1, x2, ..., xN). The encoder returns a sequence of hidden states h = (h1, h2, ..., hN). Each hidden state hi is a concatenation of those from the forward and backward recurrent network).
Claim 12
Eriguchi in view of Dyer further teaches the system of claim 1, wherein the encoder and the attention-based decoder are stack-only recurrent neural network grammar (abbreviated s-RNNG) networks (Page 2, Col. 2, lines 1-10 of Eriguchi, Unlike a usual recurrent language model (see, e.g., Mikolov et al., 2010), an RNNG simultaneously models both tokens and their tree-based composition. This is done by having a (output) buffer, stack and action history, each of which is implemented as a stack LSTM (sLSTM, Dyer et al., 2015). At each time step, the action sLSTM predicts the next action based on the (current) hidden states of the buffer, stack and action sLSTM.).
Claims 13-21
These claims include substantially the same limitations as those provided in claims 1-9 above, and therefore they are rejected for the same reasons.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-21 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-27 of U.S. Patent No. 10565318 in view of Chris Dyer et al. (“Recurrent Neural Network Grammars”, 12 October 2016). Although the claims at issue are not identical, they are not patentably distinct from each other because the instant set of claims is an obvious variation of the patented claims. 
Claim 1 is in some respects broader than patented claim 1, in that it excludes limitations regarding tree structures, the corresponding compositional embedding, and the decoder’s use of attention weights to scale the encoder’s compositional embeddings.  The patented claim does not include certain limitations, including those regarding the specific actions at the encoding time steps, but as provided in the rejection of claim 1 above, Dyer teaches these limitations. The motivation to combine would have been to provide better parsing in English than any single previously published supervised generative model and better language modeling than state-of-the-art sequential RNNs in English (abstract of Dyer).
The limitation of dependent claim 2 is part of patented claim 1.  Claims 3-5 are the same as patented claims 7-9. The limitation of dependent claim 6 is part of patented claim 1. Claims 7-12 are the same as patented claims 11-16. The other set of claims, 13-21, is rejected for similar reasons.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to THOMAS H MAUNG whose telephone number is (571)270-5690. The examiner can normally be reached Monday-Friday, 9am-6pm, EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vivian Chin, can be reached at (571) 272-7848. The fax number for the organization where this application or proceeding is assigned is 571-273-8300.






/THOMAS H MAUNG/
Primary Examiner, Art Unit 2654