DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.

Information Disclosure Statement
	Specification paragraph [0044], lines 2-4 refers to literature entitled “Keyan Zhou, Chengqing Zong. Method for handling unknown words in a Chinese-English statistical translation system.” This reference appears to be missing from the information disclosure statements. Applicant should submit this reference along with an information disclosure statement.

Specification
The disclosure is objected to because of the following informalities: In para. [0058], line 18, the word “mat” should read “beat” as seen below by the annotations to line 18 in para. [0058]. Lines 11-15 discuss the segment “cat sat on the mat” and lines 15-20 discuss the segment “cat sat on the beat.” Since the specification appears to contrast these two segments, it appears a typo was made.
Appropriate correction is required.

Claim Objections
Claims 1, 3, 4, 6, 9, and 11 are objected to because of the following informalities. 
Remove the commas in
claim 1, 2nd-to-last paragraph, line 2 before “to obtain”
claim 3, ¶ 1, line 3 before “to obtain” and ¶ 3 line 2 before “to obtain”
claim 4, ¶ 2, line 1 before “to determine”
claim 6, 2nd-to-last paragraph, line 3 before “to obtain”
claim 9, ¶ 2, line 2 before “to determine”
claim 11, line 3 before “to perform” 
	Additionally in claim 3, the third paragraph should recite “compression encoding”.  Appropriate correction is required.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:

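(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 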
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.

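This application includes claim limitations that do not use the word “means” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because they use a generic placeholder coupled with functional language without reciting sufficient structure to perform the recited function. Such claim limitations are:
“obtaining module” in claim 6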
“first processing module” in claim 6
“second processing module” in claim 6 
“third processing module” in claim 6 and in claim 8
“fourth processing module” in claim 6  and in claim 9 

Because these claim limitations are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, they are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have these limitations interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 6-10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim 6 limitations “obtaining module,” “first processing module,” “second processing module,” “third processing module,” and “fourth processing module” invoke 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function. These modules are depicted in Fig. 5 and described in specification ¶ [0071] without sufficient structure. Although specification ¶ [0080] discloses processor 61, it is not part of the embodiment described in ¶ [0071]. Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph.
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).

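If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either:
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 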
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.

Claims 7-10 are rejected for failing to cure the deficiencies of claim 6 upon which they depend.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-11 are rejected under 35 U.S.C. 103 as being unpatentable over Ling et al. (“Character-based Neural Machine Translation”) in view of Le et al. (U.S. Patent 10,133,739).

Regarding CLAIM 1, Ling teaches: A neural network-based translation method, comprising:
obtaining an initial translation of a to-be-translated sentence, (Figure 1 on p. 3 shows a joint alignment and translation model, which is described in § 2.1 on p. 2. In Figure 1, the source sentence in box 1 is being translated into a target sentence as seen in box 3. The limitation “an initial translation of a to-be-translated sentence” is taught by the partially-completed sentence in box 3, “Donde esta la”. Note: In boxes 1 and 3 of Figure 1, each asterisk denotes a character-to-word (C2W) compositional model.
Table 2 on p. 9 (explained on p. 8, bottom paragraph) shows English-to-Portuguese translations as generated by the joint alignment and translation model of Fig. 1, with the original English in row 1 and a Portuguese translation in row 3. The “initial translation” in Portuguese is output one word at a time, in the same manner as in box 3 of the joint alignment and translation model.)
splitting the unknown word… into one or more characters, and (The splitting is taught by page 4, last paragraph, lines 1-3: “The illustration of the model is shown in [Figure] 2. Essentially, the model builds a representation of the word using characters, by reading characters from left to right and vice-versa. More formally, given an input word $s_j = s_{j,0}, \ldots, s_{j,x}$”. The input word $s_j$ is split into its characters $s_{j,0}, \ldots, s_{j,x}$. This process is illustrated in Ling Fig. 2 from page 4, which is reproduced below with annotations. Fig. 2 depicts the C2W compositional model. The example input source word “Where” is split into its five characters “W”, “h”, “e”, “r”, “e”.

[Image media_image1.png: Ling, Fig. 2 with annotations]
subsidisation to subsidade.” 
Although Ling does not teach that “library” (Fig. 1) is an unknown word, Ling teaches in Table 2 (p. 9) a translation from English (row 1) to Portuguese (row 3) in which the last words of the sentences in both languages are unknown. The caption, lines 2-3 state: “The unknown word in the translation as well as their aligned words are marked in bold”.)
inputting, into a first multi-layer neural network, a character sequence constituted by the one or more characters that is obtained by splitting the unknown word; (Page 4, last paragraph, lines 3-4 states: “the model projects each character into a continuous $d_{s,c}$-dimensional vectors $s_{j,0}, \ldots, s_{j,x}$ using a character lookup table.” Since the caption for Fig. 2 states that “Square boxes represent vectors of neuron activations,” this indicates a neural network generates character projections. A first multi-layer neural network includes at least the input layer and output layer of each character projection model.)
obtaining a character vector of each character in the character sequence by using the first multi-layer neural network, and (Character vectors are vectors $s_{j,0}, \ldots, s_{j,x}$ disclosed in the last paragraph on page 4. The caption for Fig. 2 states that “Square boxes represent vectors of neuron activations,” which indicates that the first multi-layer neural network outputs a character vector.)
inputting all character vectors in the character sequence into a second multi-layer neural network; (A second multi-layer neural network is the bidirectional long short-term memory (BLSTM) in $s_{j,0}, \ldots, s_{j,x}$. Another backward LSTM reads character vectors in the reverse order”. This is shown in Ling Fig. 2, where the straight vertical arrow exiting from a character vector enters the forward LSTM, and the bent arrow enters the backward LSTM.)
encoding all the character vectors by using the second multi-layer neural network and a preset common word database, to obtain a semantic vector corresponding to the character sequence; and (Encoding all the character vectors to obtain a semantic vector is illustrated at the bottom of the C2W compositional model of Fig. 2 showing the example word vector for “Where”.  Explained by Ling from page 4, last paragraph, line 4 to page 5, line 3:

[Images media_image2.png and media_image3.png: excerpt reproduced from Ling, page 4, last paragraph, line 4 to page 5, line 3]

On p. 7, the last paragraph of § 2.5 Ling teaches training the C2W model to generate word embeddings from the word lookup table. 
The limitation “a preset common word database” is taught by the English-Portuguese language pair from Europarl. On p. 7, § 3.1, lines 1-3 state: “We test our model in two datasets. First, we 600k sentence pairs for training from Europarl (Koehn, 2005), in the English-Portuguese language pair. Then, we define another 500 sentence pairs for development and 500 sentence pairs for testing.”)
inputting the semantic vector into a third multi-layer neural network, (A third multi-layer neural network is interpreted collectively as the nodes in boxes 5 and 6 in the joint alignment and translation model seen in Fig. 1. Box 5 is an attention model (see p. 3). Box 6 is the target word generation (see p. 3, bottom to p. 4), which uses the V2C generation model seen in Fig. 3 on p. 5. The last paragraph on p. 3 explains that the most likely source word/words to generate the predicted word $w_p$ is contained in vector a (the aligned source word vector according to the paragraph below Fig. 3).)
decoding the semantic vector by using the third multi-layer neural network, and (A decoder is the V2C generation model as seen in Ling, Fig. 3 on p. 5. An aligned source word vector a is input into the third network. The paragraph beneath Fig. 3 states, “An illustration of the V2C (vector to characters) is shown in Figure 3” and the next paragraph, lines 2-3, states “Each prediction is dependent on the input of the model (aligned source words a and target word context $l^{f}_{p-1}$).” Note that the example vectors in Fig. 3 show a snapshot right before “esta” is added to the output in Fig. 1. 
Decoding is further taught by Ling from p. 5 (beneath Fig. 3) to page 6 before §2.3, and on p. 6, § 2.4.)
determining a final translation of the to-be-translated sentence based on the initial translation of the to-be-translated sentence, (Under the broadest reasonable interpretation of “final translation,” Figure 1 shows a final translation is determined after “biblioteca” is appended to the initial translation in box 3. The final translation is determined once the model produces an end of sequence token as disclosed by p. 6, last paragraph above § 2.3: “Finally, the model is also required to produce the end of sentence token EOS, similarly to a word softmax. In our model, we simply consider the EOS token as a word whose only character is EOS. In another words, it must generate the sequence EOS,EOS.” Ling also states on p. 6, § 2.4, line 8: “An hypothesis is final once it generates the end of sentence token EOS.”)
wherein the final translation carries a translation of the unknown word. (Although Ling does not teach that “library” (Fig. 1) is an unknown word, Ling teaches in Table 2 (p. 9) a translation from English (row 1) to Portuguese (row 3) in which the last words of the sentences in both languages are unknown. The caption, lines 2-3 state: “The unknown word in the translation as well as their aligned words are marked in bold”. Table 2 is explained on p. 8, bottom paragraph, lines 1-3.)
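For illustration only, the splitting, character lookup, bidirectional read, and combination steps attributed above to Ling's C2W model may be sketched as follows. This is a minimal sketch under stated assumptions, not Ling's implementation: the lookup table is filled with random values, a plain tanh recurrent cell stands in for the forward and backward LSTMs, the two final states are simply added, and every function name is hypothetical.

```python
import math
import random


def split_into_characters(word):
    """Split a word into its character sequence."""
    return list(word)


def build_char_table(alphabet, dim, seed=0):
    """Hypothetical character lookup table: one random dim-sized vector per character."""
    rng = random.Random(seed)
    return {ch: [rng.uniform(-0.1, 0.1) for _ in range(dim)] for ch in alphabet}


def run_recurrent(vectors, weight):
    """Toy recurrent read (assumed stand-in for an LSTM), applied elementwise:
    h_t = tanh(weight * (h_{t-1} + x_t))."""
    state = [0.0] * len(vectors[0])
    for vec in vectors:
        state = [math.tanh(weight * (h + x)) for h, x in zip(state, vec)]
    return state


def compose_word_vector(word, table):
    """Build one word vector from the characters of the word."""
    chars = split_into_characters(word)            # "Where" -> W, h, e, r, e
    char_vectors = [table[ch] for ch in chars]     # character lookup
    forward = run_recurrent(char_vectors, weight=0.5)         # left-to-right read
    backward = run_recurrent(char_vectors[::-1], weight=0.5)  # right-to-left read
    # Assumption: the two final states are added; a trained model would
    # combine them with learned parameters instead.
    return [f + b for f, b in zip(forward, backward)]


table = build_char_table("Wher", dim=4)
word_vector = compose_word_vector("Where", table)
```

Because the word vector is computed from characters rather than looked up as a whole word, the same routine produces a vector for any word whose characters appear in the table, including a word absent from a word-level vocabulary.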
While Ling generally teaches “a preset common word database” by the English-Portuguese language pair from Europarl, Le specifically teaches using a dictionary (C. 5, L. 23-35) for mapping unknown/pointer tokens to a respective source word in the source sentence corresponding to the unknown token (C. 5, L. 17-19).
	However, Ling does not explicitly teach: wherein the initial translation carries an unknown word; unknown word in the initial translation
	But Le teaches: wherein the initial translation carries an unknown word; unknown word in the initial translation (Both of these limitations are taught by Le from C. 2, L. 61-67 to C. 3, L. 1-5: “Additionally, in some cases, the neural network translation model 120 may determine that certain target words in target sentences are not words from the target language vocabulary. That is, for a given position in the target language sentence, the neural network translation model 120 may determine that the word at that position should not be any of the words in the target language vocabulary and should instead be an unknown word. Source words that are not in the source language vocabulary and target words that are not in the target language vocabulary will be referred to in this specification as out-of-vocabulary (OOV) words or unknown words.” 
Le further teaches at C. 3, L. 6-14: “In order to account for OOV words that appear in target language sentences, the translation system 100 trains the neural network translation model 120 to track the origin in source sentences of unknown words in target sentences. In particular, the translation system 100 trains the neural network translation model 120 to be operable to emit (i) pointer tokens, pointer tokens being unknown tokens that identify a respective source word in the source sentence corresponding to the unknown token”)
Both Ling and Le are in the same field of endeavor as the claimed invention, namely, neural machine translation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have generated Ling’s initial translation containing Le’s pointer tokens and their associated dictionary entry. A motivation for the combination is that the pointer token identifies a respective source word in the source sentence corresponding to the unknown token. (Le, C. 3, L. 9-14: “In particular, the translation system 100 trains the neural network translation model 120 to be operable to emit (i) pointer tokens, pointer tokens being unknown tokens that identify a respective source word in the source sentence corresponding to the unknown token”)
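For illustration only, the role of a pointer token and its associated dictionary entry in the proposed combination may be sketched as follows. This is a minimal sketch under stated assumptions, not Le's implementation: a pointer token is represented here as a hypothetical ("UNK", source_index) pair, the word dictionary is an ordinary Python dict, and copying the source word when no dictionary entry exists is an assumed fallback.

```python
def resolve_pointer_tokens(source_words, target_tokens, word_dictionary):
    """Replace each pointer token in the target sentence with a translation
    of the source word that the token identifies."""
    resolved = []
    for token in target_tokens:
        if isinstance(token, tuple) and token[0] == "UNK":
            # The pointer token identifies the corresponding source word.
            source_word = source_words[token[1]]
            # Assumed fallback: copy the source word if it has no entry.
            resolved.append(word_dictionary.get(source_word, source_word))
        else:
            resolved.append(token)
    return resolved


# Example based on the Fig. 1 sentence discussed above; the pointer token
# and dictionary contents are hypothetical.
source = ["Where", "is", "the", "library"]
initial_translation = ["Donde", "esta", "la", ("UNK", 3)]
dictionary = {"library": "biblioteca"}
final_translation = resolve_pointer_tokens(source, initial_translation, dictionary)
```

In this sketch the initial translation carries the unknown word as a pointer token, and the final translation carries its translation once the token is resolved.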

	Regarding CLAIM 2, the combination of Ling and Le teaches: The translation method according to claim 1, 
Ling teaches: wherein the preset common word database comprises at least one of a dictionary, a linguistics rule, and a cyberword database. (Ling teaches a “cyberword database” on p. 7, § 3.1, lines 1-3: “We test our model in two datasets. First, we 600k sentence pairs for training from Europarl (Koehn, 2005), in the English-Portuguese language pair. Then, we define another 500 sentence pairs for development and 500 sentence pairs for testing.” Europarl is a cyberword database. Examiner is not required to cite prior art for the limitations “a dictionary” and “a linguistics rule” because they are listed as alternatives to “a cyberword database.”)
Additionally, Le teaches a dictionary at C. 3, L. 27-34.

Regarding CLAIM 3, the combination of Ling and Le teaches: The translation method according to claim 1, 
Ling teaches: wherein the encoding all the character vectors by using the second multi-layer neural network… , to obtain the semantic vector corresponding to the character sequence comprises: (Encoding all the character vectors to obtain a semantic vector is illustrated at the bottom of the C2W compositional model of Fig. 2 showing the example word vector for “Where”.  Explained by Ling from page 4, last paragraph, line 4 to page 5, line 3:

[Images media_image2.png and media_image3.png: excerpt reproduced from Ling, page 4, last paragraph, line 4 to page 5, line 3]

On p. 7, the last paragraph of § 2.5 Ling teaches training the C2W model to generate word embeddings from the word lookup table.)
Ling teaches: determining at least one combination manner of the character vectors in the character sequence by using the second multi-layer neural network (Ling, at the bottom of page 4 and the top of page 5, states: “Finally, the

[Image media_image4.png: continuation of the quoted passage from Ling, bottom of page 4 to top of page 5]
”)
wherein a character vector combination determined by each combination manner corresponds to one meaning; and (The broadest reasonable interpretation of “one meaning” is the entry in the word lookup table for a character vector combination.  Ling states that the C2W model is 
compression ecoding at least one meaning of at least one character vector combination determined by the at least one combination manner, to obtain the semantic vector. (On p. 5, top, the word $s_j$ is a combination of the character vectors in the character sequence.)
	Although Ling discloses a preset common word database Europarl, Ling does not explicitly teach that Europarl is used to generate the encoding as recited by claim 3. Ling does not explicitly teach: … and the preset common word database
determining combination(s) based on vocabulary information provided by the common word database, 
	But Le teaches: … and the preset common word database (Le teaches a common word database being a dictionary at C. 3, L. 27-29: “The word dictionary 140 is a dictionary that maps words in the source language to translations of the words into the target language.” C. 3, L. 32-34 state: “In some other implementations, the system uses a conventional word dictionary as the word dictionary 140.”)
determining combination(s) based on vocabulary information provided by the common word database, (The broadest reasonable interpretation of “vocabulary information” includes the spellings and definitions of words. A conventional word dictionary inherently contains spellings and definitions of words. Le, C. 3, L. 32-34 states: “In some other implementations, the system uses a conventional word dictionary as the word dictionary 140.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Le’s word dictionary 140 into Ling’s model, with a motivation to map words in the source language to translations of the words into the target language. (Le, C. 3, L. 27-29)

Regarding CLAIM 4, the combination of Ling and Le teaches: The translation method according to claim 3, 
Ling teaches: wherein the decoding the semantic vector by using the third multi-layer neural network, (from p. 5 (beneath Fig. 3) to page 6 before §2.3) and determining a final translation of the to-be-translated sentence based on the initial translation of the to-be-translated sentence (Under the broadest reasonable interpretation of “final translation,” Figure 1 shows a final translation is determined after “biblioteca” is appended to the initial translation in box 3. The final translation is determined once the model produces an end of sequence token as disclosed by p. 6, last paragraph above § 2.3: “Finally, the model is also required to produce the end of sentence token EOS, similarly to a word softmax. In our model, we simply consider the EOS token as a word whose only character is EOS. In another words, it must generate the sequence EOS,EOS.” Ling also states on p. 6, § 2.4, line 8: “An hypothesis is final once it generates the end of sentence token EOS.”) comprises: 
decoding the semantic vector by using the third multi-layer neural network, to determine at least one meaning comprised in the semantic vector, and (The limitation “at least one meaning comprised in the semantic vector” is a hypothesis of the source word generated by a beam search. Ling teaches a word-based beam search on p. 6, § 2.4, ¶ 1 and a character-based beam search in ¶ 2, lines 3-6. The two beam searches execute simultaneously, as indicated by the last sentence of ¶ 2.)
selecting, based on a context meaning of the unknown word… , (The limitation “a context meaning of the unknown word” is the decoder LSTM’s current hidden state, represented in the V2C model of Fig. 3 by a current hidden state of the forward LSTM. For example, in Fig. 3, before outputting the character “s”, the context meaning is the vector directly above “s”.) 
selecting a target meaning from the at least one meaning comprised in the semantic vector; and (The limitation “a target meaning” is the translation of the source word. Ling, p. 6, § 2.4, ¶ 1, lines 6-8 states: “We set a beam k_w, which defines the number of hypothesis to be expanded prioritizing hypothesis with the highest sentence probability. An hypothesis is final once it generates the end of sentence token EOS.” The final hypothesis is the selection. In ¶ 2, the last sentence states, regarding the character-level beam search: “In this case, the beam search is run until k_w final hypothesis are found (generation of EOW), as it must return at least k_w new hypothesis to ensure that the word level search is complete.”)
determining the final translation of the to-be-translated sentence based on the target meaning and the context meaning of the unknown word… (Fig. 1 shows that the final translation of the example source sentence “Where is the library” is the target sentence “Donde esta la biblioteca”. The final translation is based on both the target meaning “biblioteca” and the context meaning from a hidden state of the LSTM in the V2C model (Fig. 3). Although Ling does not teach that “library” (Fig. 1) is an unknown word, Ling teaches in Table 2 (p. 9) a translation from English (row 1) to Portuguese (row 3) in which the last words of the sentences in both languages are unknown. The caption, lines 2-3 state: “The unknown word in the translation as well as their aligned words are marked in bold”.)
	However, Ling does not explicitly teach: unknown word in the initial translation
	But Le teaches unknown word in the initial translation (C. 3, L. 6-14)
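
For illustration only, the beam search relied upon above for claim 4 can be sketched as follows. This is a minimal sketch under stated assumptions and is not Ling's implementation: the callback `next_token_log_probs`, the toy distribution in `toy_model`, and the `max_len` cutoff are hypothetical. The parameter `beam_width` plays the role of Ling's beam k_w, and a hypothesis is treated as final once it generates EOS.

```python
import math

EOS = "EOS"

def beam_search(next_token_log_probs, beam_width, max_len=10):
    """Return up to beam_width finished hypotheses, most probable first.

    next_token_log_probs is a hypothetical callback: given the tokens
    generated so far, it returns a dict mapping each candidate next
    token to its log-probability.
    """
    beam = [((), 0.0)]  # (tokens, cumulative log-probability)
    finished = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beam:
            for token, logp in next_token_log_probs(tokens).items():
                candidates.append((tokens + (token,), score + logp))
        # Expand only the beam_width most probable hypotheses.
        candidates.sort(key=lambda h: h[1], reverse=True)
        beam = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == EOS:
                finished.append((tokens, score))  # hypothesis is final
            else:
                beam.append((tokens, score))
        if len(finished) >= beam_width or not beam:
            break
    finished.sort(key=lambda h: h[1], reverse=True)
    return finished[:beam_width]

def toy_model(tokens):
    # Toy distribution, invented for illustration only.
    table = {
        (): {"la": math.log(0.6), "el": math.log(0.4)},
        ("la",): {"biblioteca": math.log(0.7), "libreria": math.log(0.3)},
        ("el",): {"libro": math.log(0.9), EOS: math.log(0.1)},
    }
    return table.get(tokens, {EOS: 0.0})

best = beam_search(toy_model, beam_width=2)
```

In this sketch the highest-scoring final hypothesis corresponds to the selected “target meaning”, and the remaining final hypotheses correspond to the other candidate meanings.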

Regarding CLAIM 5, the combination of Ling and Le teaches: The translation method according to claim 1, 
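Ling teaches: wherein the unknown word comprises at least one of an abbreviation, a proper noun, a derivative, and a compound word. (The broadest reasonable interpretation of a “derivative” in light of the instant specification ¶ [0041], type (3), is an English suffix “-ation.” The unknown word in Table 2, row 1 contains this suffix. Ling, p. 8, last paragraph, lines 5-6 state “Firstly, the English suffix -ation and the Portuguese suffix -dade are common endings for nouns.” Examiner is not required to cite prior art for “abbreviation,” “proper noun,” and “compound word” because they are listed as alternatives for “derivative.”)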


	Regarding CLAIM 6, Ling teaches: A neural network-based translation apparatus, comprising: (The experimental section, § 3 on pp. 7-9, is evidence of a computer device generating a neural network-based translation.)
an obtaining module, configured to obtain an initial translation of a to-be-translated sentence, (Figure 1 on p. 3 shows a joint alignment and translation model, which is described in § 2.1 on p. 2. In Figure 1, the source sentence in box 1 is being translated into a target sentence as seen in box 3. The limitation “an initial translation of a to-be-translated sentence” is taught by the partially-completed sentence in box 3, “Donde esta la”. Note: In boxes 1 and 3 of Figure 1, each asterisk denotes a character-to-word (C2W) compositional model as seen in Figure 2. In box 6, the double asterisk denotes a vector-to-character (V2C) generation model as seen in Figure 3. The leftmost vectors in box 6 of Fig. 1 and in Fig. 3 are both color-coded yellow in the Ling publication.
Table 2 on p. 9 (explained on p. 8, bottom paragraph) shows English-to-Portuguese translations as generated by the joint alignment and translation model of Fig. 1, with the original English in row 1 and a Portuguese translation in row 3. The “initial translation” in Portuguese is output one word at a time, in the same manner as in box 3 of the joint alignment and translation model.)
a first processing module, configured to: split the unknown word… obtained by the obtaining module into one or more characters, and (The splitting is taught by page 4, last paragraph, lines 1-3: “The illustration of the model is shown in [Figure] 2. Essentially, the model builds a representation of the word using characters, by reading characters from left to right and vice-versa. More formally, given an input word s_j = s_{j,0}, …, s_{j,x}”. The input word s_j is split into its characters s_{j,0}, …, s_{j,x}. This process is shown in the annotated Fig. 2 below.

[Image media_image1.png: Ling, Fig. 2 with annotations]
Regarding the limitation “unknown word,” Ling teaches that unknown or unseen source words are input into and processed by the joint alignment and translation model of Fig. 1, of which the C2W compositional model is the first step. Ling provides evidence for this in the following sections: On page 9, Table 2 caption, lines 1-2: “The unknown word in the translation as well as their aligned words are marked in bold”. On p. 1, Abstract, lines 9-10: “our model is capable of interpreting and generating unseen word forms.” On p. 8, bottom paragraph, lines 1-3: “A strong aspect in the V2C model is that the model can generate unseen words. In Table 2, we provide three examples of unknown words that have been generated in Portuguese. The first is the translation of unknown word subsidisation to subsidade.” 
Although Ling does not teach that “library” (Fig. 1) is an unknown word, Ling teaches in Table 2 (p. 9) a translation from English (row 1) to Portuguese (row 3) in which the last words of the sentences in both languages are unknown.)
input, into a first multi-layer neural network, a character sequence constituted by the one or more characters that is obtained by splitting the unknown word; (Page 4, last paragraph, lines 3-4 states: “the model projects each character into a continuous d_{s,c}-dimensional vectors s_{j,0}, …, s_{j,x} using a character lookup table.” Since the caption for Fig. 2 states that “Square boxes represent vectors of neuron activations,” this indicates a neural network generates character projections. A first multi-layer neural network includes at least the input layer and output layer of each character projection model.)
a second processing module, configured to: obtain, by using the first multi-layer neural network, a character vector of each character in the character sequence input by the first processing module, and (Character vectors are vectors s_{j,0}, …, s_{j,x} disclosed in the last paragraph on page 4. The caption for Fig. 2 states that “Square boxes represent vectors of neuron activations,” which indicates that the first multi-layer neural network outputs a character vector.)
input all character vectors in the character sequence into a second multi-layer neural network; (A second multi-layer neural network is the bidirectional long short-term memory (BLSTM) in the C2W compositional model as seen in Ling Fig. 2. Ling teaches inputting the character vectors in forward order into a forward LSTM and in reverse order into a backward LSTM, as disclosed by page 4, last paragraph, lines 4-5: “Then it builds a forward LSTM state sequence… by reading the character vectors s_{j,0}, …, s_{j,x}. Another backward LSTM reads character vectors in the reverse order”. This is shown in Ling Fig. 2, where the straight vertical arrow exiting from a character vector enters the forward LSTM, and the bent arrow enters the backward LSTM.)
a third processing module, configured to: encode, by using the second multi-layer neural network and a preset common word database, all the character vectors input by the second processing module, to obtain a semantic vector corresponding to the character sequence; and 

[Image: media_image2.png]

[Image: media_image3.png]

(On p. 7, in the last paragraph of § 2.5, Ling teaches training the C2W model to generate word embeddings from the word lookup table. 
The limitation “a preset common word database” is taught by the English-Portuguese language pair from Europarl. On p. 7, § 3.1, lines 1-3 state: “We test our model in two datasets. First, we 600k sentence pairs for training from Europarl (Koehn, 2005), in the English-Portuguese language pair. Then, we define another 500 sentence pairs for development and 500 sentence pairs for testing.”)
a fourth processing module, configured to: input the semantic vector obtained by the third processing module into a third multi-layer neural network, (A third multi-layer neural network is interpreted collectively as the nodes in boxes 5 and 6 in the joint alignment and translation model seen in Fig. 1. Box 5 is an attention model (see p. 3). Box 6 is the target word generation (see p. 3, bottom to p. 4), which uses the V2C generation model seen in Fig. 3 on p. 5. The last paragraph on p. 3 explains that the most likely source word/words to generate the predicted word w_p is contained in vector a (the aligned source word vector according to the paragraph below Fig. 3).)
decode the semantic vector by using the third multi-layer neural network, and (A decoder is the V2C generation model as seen in Ling, Fig. 3 on p. 5. An aligned source word vector a is input into a and and target word context l_{p-1}^f).” Note that the example vectors in Fig. 3 show a snapshot right before “esta” is added to the output in Fig. 1. 
Decoding is further taught by Ling from p. 5 (beneath Fig. 3) to page 6 before §2.3, and on p. 6, § 2.4.)
determine a final translation of the to-be-translated sentence based on the initial translation of the to-be-translated sentence, (Under the broadest reasonable interpretation of “final translation,” Figure 1 shows a final translation is determined after “biblioteca” is appended to the initial translation in box 3. The final translation is determined once the model produces an end of sequence token as disclosed by p. 6, last paragraph above § 2.3: “Finally, the model is also required to produce the end of sentence token EOS, similarly to a word softmax. In our model, we simply consider the EOS token as a word whose only character is EOS. In another words, it must generate the sequence EOS,EOS.” Ling also states on p. 6, § 2.4, line 8: “An hypothesis is final once it generates the end of sentence token EOS.”)
wherein the final translation carries a translation of the unknown word. (Although Ling does not teach that “library” (Fig. 1) is an unknown word, Ling teaches in Table 2 (p. 9) a translation from English (row 1) to Portuguese (row 3) in which the last words of the sentences in both languages are unknown. The caption, lines 2-3 state: “The unknown word in the translation as well as their aligned words are marked in bold”. Table 2 is explained on p. 8, bottom paragraph, lines 1-3.)
While Ling generally teaches “a preset common word database” by the English-Portuguese language pair from Europarl, Le specifically teaches using a dictionary (C. 5, L. 23-35) for mapping unknown/pointer tokens to a respective source word in the source sentence corresponding to the unknown token (C. 5, L. 17-19).
However, Ling does not explicitly teach: wherein the initial translation carries an unknown word; unknown word in the initial translation
But Le teaches: wherein the initial translation carries an unknown word; unknown word in the initial translation (Both of these limitations are taught by Le from C. 2, L. 61-67 to C. 3, L. 1-5: “Additionally, in some cases, the neural network translation model 120 may determine that certain target words in target sentences are not words from the target language vocabulary. That is, for a given position in the target language sentence, the neural network translation model 120 may determine that the word at that position should not be any of the words in the target language vocabulary and should instead be an unknown word. Source words that are not in the source language vocabulary and target words that are not in the target language vocabulary will be referred to in this specification as out-of-vocabulary (OOV) words or unknown words.” 
Le further teaches at C. 3, L. 6-14: “In order to account for OOV words that appear in target language sentences, the translation system 100 trains the neural network translation model 120 to track the origin in source sentences of unknown words in target sentences. In particular, the translation system 100 trains the neural network translation model 120 to be operable to emit (i) pointer tokens, pointer tokens being unknown tokens that identify a respective source word in the source sentence corresponding to the unknown token”. Le teaches, at C. 5, L. 32-35, using “a conventional word dictionary that includes translations from the source language to the target language in the post-processing step.”)
Both Ling and Le are in the same field of endeavor as the claimed invention, namely, neural machine translation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have generated Ling’s initial translation containing Le’s pointer tokens and their associated dictionary entry. A motivation for the combination is that the pointer token identifies a respective source word in the source sentence corresponding to the unknown token. (Le, C. 3, L. 9-14: “In particular, the translation system 100 trains the neural network translation model 120 to be operable to emit (i) pointer tokens, pointer tokens being unknown tokens that identify a respective source word in the source sentence corresponding to the unknown token”)
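
For illustration only, the post-processing step cited from Le (pointer tokens resolved through a word dictionary) can be sketched as follows. This is a minimal sketch under stated assumptions and does not reproduce Le's implementation: the token format `<unk:i>`, the function name `replace_pointer_tokens`, and the fallback of copying the source word when the dictionary has no entry are all hypothetical.

```python
import re

# Hypothetical pointer-token format: "<unk:i>" points at source position i.
POINTER = re.compile(r"^<unk:(\d+)>$")

def replace_pointer_tokens(source_words, target_tokens, dictionary):
    """Replace each pointer token with a translation of its source word."""
    output = []
    for token in target_tokens:
        match = POINTER.match(token)
        if match is None:
            output.append(token)  # ordinary in-vocabulary target word
            continue
        source_word = source_words[int(match.group(1))]
        # Assumption: copy the source word when the dictionary has no entry.
        output.append(dictionary.get(source_word, source_word))
    return output

source = ["Where", "is", "the", "library"]
initial = ["Donde", "esta", "la", "<unk:3>"]   # initial translation with a pointer token
word_dictionary = {"library": "biblioteca"}    # toy stand-in for a word dictionary
final = replace_pointer_tokens(source, initial, word_dictionary)
```

In this sketch, `initial` stands in for an initial translation that carries an unknown word, and `final` for a final translation that carries a translation of that word.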

	Regarding CLAIM 7, the combination of Ling and Le teaches: The translation apparatus according to claim 6, 
Ling teaches: wherein the preset common word database comprises at least one of a dictionary, a linguistics rule, and a cyberword database. (Ling teaches a “cyberword database” on p. 7, § 3.1, lines 1-3: “We test our model in two datasets. First, we 600k sentence pairs for training from Europarl (Koehn, 2005), in the English-Portuguese language pair. Then, we define another 500 sentence pairs for development and 500 sentence pairs for testing.” Europarl is a cyberword database. Examiner is not required to cite prior art for the limitations “a dictionary” and “a linguistics rule” because they are listed as alternatives to “a cyberword database.”)
Additionally, Le teaches a dictionary at C. 3, L. 27-34.

Regarding CLAIM 8, the combination of Ling and Le teaches: The translation apparatus according to claim 6, 
Ling teaches: wherein the third processing module is configured to: determine at least one combination manner of the character vectors in the character sequence by using the second multi-layer neural network… (Ling, at the bottom of page 4 and the top of page 5, states: “Finally, the

[Image: media_image4.png]”)
wherein a character vector combination determined by each combination manner corresponds to one meaning; and (The broadest reasonable interpretation of “one meaning” is the entry in the word lookup table for a character vector combination. Ling states that the C2W model is trained to produce the same word vectors as the word lookup table for all training word types. Ling, p. 7, last paragraph of § 2.5, lines 1-2: “The C2W model is introduced afterwards by first training the C2W model to produce the same word vectors as the word lookup tables for all training word types.”)
compression encode at least one meaning of at least one character vector combination determined by the at least one combination manner, to obtain the semantic vector. (On p. 5, top, the word s_j is a combination of the character vectors in the character sequence.)
However, Ling does not explicitly teach: determining combination(s) based on vocabulary information provided by the common word database, 
	But Le teaches: determining combination(s) based on vocabulary information provided by the common word database, (Le teaches a common word database being a dictionary at C. 3, L. 27-29: “The word dictionary 140 is a dictionary that maps words in the source language to translations of the words into the target language.” C. 3, L. 32-34 state: “In some other implementations, the system uses a conventional word dictionary as the word dictionary 140.”
The broadest reasonable interpretation of “vocabulary information” includes the spellings and definitions of words. A conventional word dictionary inherently contains spellings and definitions of words. Le, C. 3, L. 32-34 states: “In some other implementations, the system uses a conventional word dictionary as the word dictionary 140.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Le’s word dictionary 140 into Ling’s model, with a motivation to map words in the source language to translations of the words into the target language. (Le, C. 3, L. 27-29)
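
For illustration only, the character-to-word composition relied upon above for claims 6 and 8 (splitting a word into characters, looking up a vector for each character, and reading the vectors in forward and reverse order) can be sketched as follows. This is a minimal sketch under stated assumptions and is not Ling's C2W model: the random `char_table`, the dimension `DIM`, the simplified tanh update in `read_sequence` (standing in for an LSTM), and the concatenation of the two final states are all hypothetical choices, and only lowercase ASCII letters are supported.

```python
import math
import random

DIM = 4
rng = random.Random(0)  # fixed seed so the toy vectors are reproducible

def random_vector():
    return [rng.uniform(-1.0, 1.0) for _ in range(DIM)]

# Character lookup table: one vector per character (random toy values).
char_table = {ch: random_vector() for ch in "abcdefghijklmnopqrstuvwxyz"}

def read_sequence(vectors):
    """Simplified recurrent reader (a tanh update, not a full LSTM)."""
    state = [0.0] * DIM
    for vec in vectors:
        state = [math.tanh(0.5 * s + v) for s, v in zip(state, vec)]
    return state

def compose_word(word):
    chars = list(word)                                  # split into characters
    vectors = [char_table[ch] for ch in chars]          # look up character vectors
    forward = read_sequence(vectors)                    # left-to-right pass
    backward = read_sequence(list(reversed(vectors)))   # right-to-left pass
    # Assumption: combine the two final states by concatenation.
    return forward + backward

vector = compose_word("subsidisation")
```

Because the representation is built from characters rather than retrieved from a word lookup table, the sketch produces a vector for any lowercase word, including one absent from a word vocabulary.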

	Regarding CLAIM 9, the combination of Ling and Le teaches: The translation apparatus according to claim 8, 
Ling teaches: wherein the fourth processing module is configured to: decode, by using the third multi-layer neural network, the semantic vector obtained by the third processing module, (Decoding is taught from p. 5 (beneath Fig. 3) to page 6 before §2.3) to determine at least one meaning comprised in the semantic vector, and (The limitation “at least one meaning comprised in the semantic vector” is a hypothesis of the source word generated by a beam search. Ling teaches a word-based beam search on p. 6, § 2.4, ¶ 1 and a character-based beam search in ¶ 2, lines 3-6. The two beam searches execute simultaneously, as indicated by the last sentence of ¶ 2.)
select, based on a context meaning of the unknown word… , (The limitation “a context meaning of the unknown word” is the decoder LSTM’s current hidden state, represented in the V2C model of Fig. 3 by a current hidden state of the forward LSTM. For example, in Fig. 3, before outputting the character “s”, the context meaning is the vector directly above “s”.)
select a target meaning from the at least one meaning comprised in the semantic vector; (The limitation “a target meaning” is the translation of the source word. Ling, p. 6, § 2.4, ¶ 1, lines 6-8 states: “We set a beam k_w, which defines the number of hypothesis to be expanded prioritizing hypothesis with the highest sentence probability. An hypothesis is final once it generates the end of sentence token EOS.” The final hypothesis is the selection. In ¶ 2, the last sentence states, regarding the character-level beam search: “In this case, the beam search is run until k_w final hypothesis are found (generation of EOW), as it must return at least k_w new hypothesis to ensure that the word level search is complete.”)
and determine the final translation of the to-be-translated sentence based on the target meaning and the context meaning of the unknown word… (Fig. 1 shows that the final translation of the example source sentence “Where is the library” is the target sentence “Donde esta la biblioteca”.)
However, Ling does not explicitly teach: unknown word in the initial translation
	But Le teaches unknown word in the initial translation (C. 3, L. 6-14)

	Regarding CLAIM 10, the combination of Ling and Le teaches: The translation apparatus according to claim 6, 
Ling teaches: wherein the unknown word comprises at least one of an abbreviation, a proper noun, a derivative, and a compound word. (The broadest reasonable interpretation of a “derivative” in light of the instant specification ¶ [0041], type (3), is an English suffix “-ation.” The unknown word in Table 2, row 1 contains this suffix. Ling, p. 8, last paragraph, lines 5-6 state “Firstly, the English suffix -ation and the Portuguese suffix -dade are common endings for nouns.” Examiner is not required to cite prior art for “abbreviation,” “proper noun,” and “compound word” because they are listed as alternatives for “derivative.”)

	Regarding CLAIM 11, the combination of Ling and Le teaches: to perform the method according to claim 1
	Ling teaches: A neural network-based translation apparatus (The experimental section, § 3 on pp. 7-9, is evidence of a computer device generating a neural network-based translation.)
However, Ling does not explicitly teach: comprising: a memory and a processor, wherein the memory is configured to store program code and the processor is configured to invoke the program code stored in the memory, 
But Le teaches: comprising: a memory and a processor, wherein the memory is configured to store program code and the processor is configured to invoke the program code stored in the memory, (Le C. 7, L. 28-33 states: “Generally, a central processing unit will receive instructions and data from a read only memory or a random access memory or both. The essential elements of a computer are a central processing unit for performing or executing instructions and one or more memory devices for storing instructions and data.”)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Le’s central processing unit, memory, instructions, and data, as disclosed at C. 7, L. 28-33, into Ling’s computer with a motivation to execute the method of claim 1 on Le’s computer device.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
“Neural Machine Translation of Rare Words with Subword Units” to Sennrich et al. teaches subword translation (§ 3 on p. 2) and compression encoding subword units using byte pair encoding (§ 3.2 on p. 3).
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Asher H. Jablon whose telephone number is (571)270-7648. The examiner can normally be reached Monday - Friday, 9:00 am - 6:00 pm.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Al Kawsar, can be reached on (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ASHER H. JABLON/Examiner, Art Unit 2127                                                                                                                                                                                                        

/ABDULLAH AL KAWSAR/Supervisory Patent Examiner, Art Unit 2127