Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Amendments
Claims 1, 3-4, 6, 8-9 and 11 are amended. Claims 12-15 are new. Claims 1-15 are pending and have been considered.

Information Disclosure Statement
	Specification paragraph [0044], lines 2-4 refers to literature entitled “Keyan Zhou, Chengqing Zong. Method for handling unknown words in a Chinese-English statistical translation system.” This reference appears to be missing from the information disclosure statements. Applicant should submit this reference along with an information disclosure statement.

Specification
The specification amendment to paragraph [0058] was received on 03/30/2022 and has been entered.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 6-10 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 6, line 8 recites "the obtaining module".  There is insufficient antecedent basis for this limitation in the claim. Examiner interprets the limitation as “an obtaining module”.
Claim 6, line 13 recites "the first processing module".  There is insufficient antecedent basis for this limitation in the claim. Examiner interprets the limitation as “a first processing module”.
Claim 6, line 17 recites "the second processing module".  There is insufficient antecedent basis for this limitation in the claim. Examiner interprets the limitation as “a second processing module”.
Claim 6, line 20 recites "the third processing module".  There is insufficient antecedent basis for this limitation in the claim. Examiner interprets the limitation as “a third processing module”.
Claim 9, line 4 recites "the third processing module".  There is insufficient antecedent basis for this limitation in the claim. Examiner interprets the limitation as “a third processing module”.
Claims 7-10 are rejected for failing to cure the deficiencies of claim 6 upon which they depend.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

Claims 1-15 are rejected under 35 U.S.C. 103 as being unpatentable over Ling et al. (“Character-based Neural Machine Translation”) in view of Le et al. (U.S. Patent 10,133,739). Both references were cited in the PTO-892 dated 01/05/2022.
	
Regarding CLAIM 1, Ling teaches: A neural network-based translation method, comprising:
obtaining an initial translation of a to-be-translated sentence, wherein the initial translation carries an unknown word; (Figure 1 on p. 3 shows a joint alignment and translation model, which is described on p.2, § 2.1. In Figure 1, the source sentence in box 1 is being translated into a target sentence as seen in box 3. The limitation “an initial translation of a to-be-translated sentence” is taught by the partially-completed sentence in box 3, “Donde esta la”. Note: In boxes 1 and 3 of Figure 1, each asterisk denotes a character-to-word (C2W) compositional model as seen in Figure 2 on p. 4. In box 6, the double asterisk denotes a vector-to-character (V2C) generation model as seen in Figure 3 on p. 5. The leftmost vectors in box 6 of Fig. 1 and in Fig. 3 are both color-coded yellow in the Ling publication available online.
Table 2 on p. 9 (explained on p. 8, bottom paragraph) shows English-to-Portuguese translations as generated by the joint alignment and translation model of Fig. 1, with the original English in row 1 and a Portuguese translation in row 3. The “initial translation” in Portuguese is output one word at a time, in the same manner as in box 3 of the joint alignment and translation model.)
splitting the unknown word in the initial translation into one or more characters, and (The splitting is taught by page 4, last paragraph, lines 1-3: “The illustration of the model is shown in [Figure] 2. Essentially, the model builds a representation of the word using characters, by reading characters from left to right and vice-versa. More formally, given an input word $s_j = s_{j,0}, \ldots, s_{j,x}$”. The input word $s_j$ is split into its characters $s_{j,0}, \ldots, s_{j,x}$. This process is illustrated in Fig. 2 on p. 4, which is reproduced below with annotations. Fig. 2 depicts the C2W compositional model. The example input source word “Where” is split into its five characters “W”, “h”, “e”, “r”, “e”.

[Figure: Ling, Fig. 2 with annotations (media_image1.png)]
Regarding the limitation “unknown word,” Ling teaches that unknown or unseen source words are input into and processed by the joint alignment and translation model of Fig. 1, of which the C2W compositional model is the first step. Ling provides evidence for this in the following sections: P. 1, Abstract, lines 9-10: “our model….”; p. 8, bottom paragraph, lines 1-3; p.  9, Table 2 caption, lines 1-2.
Although Ling does not teach that “library” (in Fig. 1) is an unknown word, Ling teaches in Table 2 (p. 9) a translation from English (row 1) to Portuguese (row 3) in which the last words of the sentences in both languages are unknown, according to the caption.)
inputting, into a first multi-layer neural network, a character sequence constituted by the one or more characters that is obtained by splitting the unknown word; (Page 4, last paragraph, lines 3-4, starting with “the model”. Since the caption for Fig. 2 states “Square boxes represent vectors of neuron activations,” this indicates a neural network generates character projections. A first multi-layer neural network includes at least the input layer and output layer of each character projection model. See the annotations above.)
obtaining a character vector of each character in the character sequence by using the first multi-layer neural network, and (Character vectors are vectors $s_{j,0}, \ldots, s_{j,x}$ disclosed in the last paragraph on page 4. The caption for Fig. 2 states “Square boxes represent vectors of neuron activations,” which indicates that the first multi-layer neural network outputs a character vector.)
inputting character vectors in the character sequence into a second multi-layer neural network; (A second multi-layer neural network is the bidirectional long short-term memory (BLSTM) in the C2W compositional model as seen in Fig. 2. Ling teaches inputting the character vectors in forward order into a forward LSTM and in reverse order into a backward LSTM, as disclosed by page 4, last paragraph, lines 4-5: “Then it builds a forward LSTM state sequence… by reading the character vectors $s_{j,0}, \ldots, s_{j,x}$. Another backward LSTM reads character vectors in the reverse order”. This is shown in Ling Fig. 2, where the straight vertical arrow exiting from a character vector enters the forward LSTM, and the bent arrow enters the backward LSTM. See the annotations above.)
encoding the character vectors by using the second multi-layer neural network and a preset common word database to obtain a semantic vector corresponding to the character sequence; and (Encoding the character vectors to obtain a semantic vector is illustrated at the bottom of the C2W compositional model of Fig. 2 showing the example word vector for “Where”. This is explained from p. 4, third-to-last line to p. 5, line 3. On p. 7, the last paragraph of § 2.5 teaches training the C2W model to generate word embeddings from the word lookup table. The limitation “a preset common word database” is taught by the English-Portuguese language pair from Europarl for training the model; see p. 7, § 3.1, lines 1-3.)
inputting the semantic vector into a third multi-layer neural network, (A third multi-layer neural network is interpreted collectively as the nodes in boxes 5 and 6 in the joint alignment and translation model seen in Fig. 1. Box 5 is explained in “5-Alignment via Attention” on p. 3, and Box 6 is explained in “6-Target Word Generation” on p. 3. Box 6 uses the V2C generation model seen in Fig. 3 on p. 5. The last paragraph on p. 3 explains that the most likely source word/words to generate the predicted word $w_p$ is contained in vector a. This vector is the aligned source word vector according to the paragraph below Fig. 3.)
decoding the semantic vector by using the third multi-layer neural network, and (A decoder is the V2C generation model as seen in Fig. 3 on p. 5. An aligned source word vector a is input into the third network. The paragraph beneath Fig. 3 states, “An illustration of the V2C (vector to characters) is shown in Figure 3” and the next paragraph, lines 2-3, states “Each prediction is dependent on the input of the model (aligned source words a and target word context $l^{f}_{p-1}$).” Note that the example vectors in Fig. 3 show a snapshot right before “esta” is added to the output in Fig. 1.
Decoding is further taught by Ling from p. 5 (beneath Fig. 3) to page 6 before §2.3, and on p. 6, § 2.4.)
determining a final translation of the to-be-translated sentence based on the initial translation of the to-be-translated sentence, (Under the broadest reasonable interpretation of “final translation,” Figure 1 shows a final translation is determined after “biblioteca” is appended to the initial translation in box 3. The final translation is determined once the model produces an end of sequence token as disclosed by p. 6, last paragraph of § 2.2; and § 2.4, ¶ 1, last line.)
wherein the final translation carries a translation of the unknown word. (Although Ling does not teach that “library” (Fig. 1) is an unknown word, Ling teaches in Table 2 (p. 9) a translation from English (row 1) to Portuguese (row 3) in which the last words of the sentences in both languages are unknown. See the Table’s caption, lines 2-3. Table 2 is explained on p. 8, bottom paragraph, lines 1-3.)
	However, Ling does not explicitly teach: obtaining an initial translation wherein the initial translation carries an unknown word; splitting the unknown word in the initial translation
But Le teaches: obtaining an initial translation wherein the initial translation carries an unknown word; (C. 2, L. 61-67 to C. 3, L. 1-14 disclose OOV words in an initial translation. Le teaches using a dictionary (C. 5, L. 23-35) for mapping unknown/pointer tokens to a respective source word in the source sentence corresponding to the unknown token (C. 5, L. 17-19).)
splitting the unknown word in the initial translation (C. 2, L. 61-67 to C. 3, L. 1-14 disclose OOV words in an initial translation.)
Both Ling and Le are in the same field of endeavor as the claimed invention, namely, neural machine translation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have generated Ling’s initial translation containing Le’s pointer tokens and their associated dictionary entry. A motivation for the combination is that the pointer token identifies a respective source word in the source sentence corresponding to the unknown token. (Le, C. 3, L. 9-14)
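
For illustration only, the character-level processing that the above mapping of claim 1 attributes to Ling's C2W model (splitting a word into characters, looking up one vector per character, and reading the vectors in forward and reverse order) can be sketched as follows. This is a minimal sketch under stated assumptions, not Ling's implementation: the lookup table `make_char_table`, the toy recurrent reader `read_sequence` (a simple tanh update standing in for an LSTM), the vector dimension, and the final concatenation are all hypothetical simplifications introduced here.

```python
import math
import random

def split_word(word):
    """Split a word into its character sequence s_{j,0}, ..., s_{j,x}."""
    return list(word)

def make_char_table(alphabet, dim, seed=0):
    """Hypothetical character lookup table: one small random vector per character."""
    rng = random.Random(seed)
    return {ch: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for ch in alphabet}

def read_sequence(vectors, dim):
    """Toy recurrent reader (NOT an LSTM): state = tanh(0.5*state + 0.5*input)."""
    state = [0.0] * dim
    for vec in vectors:
        state = [math.tanh(0.5 * h + 0.5 * x) for h, x in zip(state, vec)]
    return state

def compose_word_vector(word, table, dim):
    """Build one word vector from the characters of `word`."""
    char_vectors = [table[ch] for ch in split_word(word)]
    forward = read_sequence(char_vectors, dim)                   # left to right
    backward = read_sequence(list(reversed(char_vectors)), dim)  # right to left
    # Assumption: the two final states are simply concatenated here.
    return forward + backward

DIM = 4
table = make_char_table("Wher", DIM)
chars = split_word("Where")
word_vector = compose_word_vector("Where", table, DIM)
print(chars)             # ['W', 'h', 'e', 'r', 'e']
print(len(word_vector))  # 8
```

Because the word vector is computed from characters rather than looked up by whole word, the same procedure applies to a word that never appeared in a word-level vocabulary, which is the property relied on for the "unknown word" limitation.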

	Regarding CLAIM 2, the combination of Ling and Le teaches: The translation method according to claim 1, 
Ling teaches: wherein the preset common word database comprises at least one of a dictionary, a linguistics rule, and a cyberword database. (Ling teaches a “cyberword database” on p. 7, § 3.1, lines 1-3. Europarl is a cyberword database.)
	Additionally, Le teaches a dictionary at C. 3, L. 32-34.

Regarding CLAIM 3, the combination of Ling and Le teaches: The translation method according to claim 1, 
Ling teaches: wherein the encoding the character vectors by using the second multi-layer neural network and the preset common word database to obtain the semantic vector corresponding to the character sequence comprises: (Encoding the character vectors to obtain a semantic vector is illustrated at the bottom of the C2W compositional model of Fig. 2 showing the example word vector for “Where”. This is explained from p. 4, third-to-last line to p. 5, line 3. On p. 7, the last paragraph of § 2.5 teaches training the C2W model to generate word embeddings from the word lookup table. The limitation “a preset common word database” is taught by the English-Portuguese language pair from Europarl for training the model; see p. 7, § 3.1, lines 1-3.)
determining at least one combination manner of the character vectors in the character sequence by using the second multi-layer neural network based on vocabulary information provided by the preset common word database, (P. 4, last line “Finally” to p. 5, line 3)
wherein a character vector combination determined by each combination manner corresponds to one meaning; and (The broadest reasonable interpretation of “one meaning” is the entry in the word lookup table for a character vector combination.  P. 7, last paragraph of § 2.5, lines 1-2 states that the C2W model is trained to produce the same word vectors as the word lookup table for all training word types.)
compression encoding at least one meaning of at least one character vector combination determined by the at least one combination manner to obtain the semantic vector. (On p. 5, top, the word $s_j$ is a combination of the character vectors in the character sequence.)
	Although Ling discloses a preset common word database Europarl, Ling does not explicitly teach that Europarl is used to generate the encoding as recited by claim 3. Ling does not explicitly teach: determining at least one combination manner… by using the second multi-layer neural network based on vocabulary information provided by the preset common word database,
	But Le teaches: determining at least one combination manner… by using the second multi-layer neural network based on vocabulary information provided by the preset common word database, (The broadest reasonable interpretation of “vocabulary information” includes the spellings and definitions of words. The conventional word dictionary 140 disclosed at C. 3, L. 32-34 inherently contains spellings and definitions of words.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Le’s word dictionary 140 into Ling’s model, with a motivation to map words in the source language to translations of the words into the target language. (Le, C. 3, L. 27-29)

Regarding CLAIM 4, the combination of Ling and Le teaches: The translation method according to claim 3, 
Ling teaches: wherein the decoding the semantic vector by using the third multi-layer neural network, (V2C model, starting on p. 5 beneath Fig. 3 through page 6, §2.2) and determining a final translation of the to-be-translated sentence based on the initial translation of the to-be-translated sentence comprises: (Under the broadest reasonable interpretation of “final translation,” Figure 1 shows a final translation is determined after “biblioteca” is appended to the initial translation in box 3. The final translation is determined once the model produces an end of sequence token as disclosed by p. 6, last paragraph of § 2.2 and p. 6, § 2.4, ¶ 1, last sentence.) 
decoding the semantic vector by using the third multi-layer neural network to determine at least one meaning comprised in the semantic vector, and (The limitation “at least one meaning comprised in the semantic vector” is a hypothesis of the source word generated by a beam search. Ling teaches a word-based beam search on p. 6, § 2.4, ¶ 1 and a character-based beam search in ¶ 2, lines 3-6. The two beam searches execute simultaneously, as indicated by the last sentence of ¶ 2.)
selecting, based on a context meaning of the unknown word in the initial translation, (The limitation “a context meaning of the unknown word” is the decoder LSTM’s current hidden state, represented in the V2C model of Fig. 3 by a current hidden state of the forward LSTM. For example, in Fig. 3, before outputting the character “s”, the context meaning is the vector directly above “s”.) 
a target meaning from the at least one meaning comprised in the semantic vector; and (The limitation “a target meaning” is the translation of the source word. Ling, p. 6, § 2.4, ¶ 1, lines 6-8 states: “We set a beam $k_w$, which defines the number of hypothesis to be expanded prioritizing hypothesis with the highest sentence probability. An hypothesis is final once it generates the end of sentence token EOS.” The final hypothesis is the selection. In ¶ 2, the last sentence states, regarding the character-level beam search: “In this case, the beam search is run until $k_w$ final hypothesis are found (generation of EOW), as it must return at least $k_w$ new hypothesis to ensure that the word level search is complete.”)
determining the final translation of the to-be-translated sentence based on the target meaning and the context meaning of the unknown word in the initial translation. (Fig. 1 shows that the final translation of the example source sentence “Where is the library” is the target sentence “Donde esta la biblioteca”. The final translation is based on both the target meaning “biblioteca” and the context meaning from a hidden state of the LSTM in the V2C model (Fig. 3). Although Ling does not teach that “library” (Fig. 1) is an unknown word, Ling teaches in Table 2 (p. 9) a translation from English (row 1) to Portuguese (row 3) in which the last words of the sentences in both languages are unknown. The caption, lines 2-3 state: “The unknown word in the translation as well as their aligned words are marked in bold”.)
	However, Ling does not explicitly teach: the unknown word in the initial translation
	But Le teaches: the unknown word in the initial translation (C. 3, L. 6-14)
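
For illustration only, the beam search relied on above for the “target meaning” limitation (a beam width limits how many hypotheses are expanded, and a hypothesis is final once it generates the end-of-sentence token) can be sketched as follows. This is a minimal sketch under stated assumptions, not Ling's decoder: the function names, the `TOY_MODEL` table, and every probability in it are hypothetical values invented for this example, and a real model would compute the next-token probabilities from the decoder state.

```python
import math

EOS = "EOS"

# Hypothetical next-token probabilities, keyed by the tokens generated so far.
TOY_MODEL = {
    (): {"Donde": 0.9, "Cuando": 0.1},
    ("Donde",): {"esta": 0.8, "es": 0.2},
    ("Donde", "esta"): {"la": 0.7, "el": 0.3},
    ("Donde", "esta", "la"): {"biblioteca": 0.6, "libreria": 0.4},
}

def toy_next_token_probs(tokens):
    """Return the toy distribution; any prefix not in the table ends the sentence."""
    return TOY_MODEL.get(tokens, {EOS: 1.0})

def beam_search(next_token_probs, beam_width, max_len=10):
    """Expand at most beam_width hypotheses per step; collect those that emit EOS."""
    beam = [((), 0.0)]  # each entry is (tokens, log-probability)
    finals = []
    for _ in range(max_len):
        candidates = []
        for tokens, score in beam:
            for token, prob in next_token_probs(tokens).items():
                candidates.append((tokens + (token,), score + math.log(prob)))
        # Prioritize the hypotheses with the highest sentence probability.
        candidates.sort(key=lambda item: item[1], reverse=True)
        beam = []
        for tokens, score in candidates[:beam_width]:
            if tokens[-1] == EOS:
                finals.append((tokens, score))  # hypothesis is final
            else:
                beam.append((tokens, score))
        if len(finals) >= beam_width or not beam:
            break
    return sorted(finals, key=lambda item: item[1], reverse=True)

results = beam_search(toy_next_token_probs, beam_width=2)
best = results[0]
print(" ".join(best[0][:-1]))  # Donde esta la biblioteca
```

In this toy run, the highest-probability final hypothesis is the one selected, which corresponds to the selection step described in the mapping above.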

Regarding CLAIM 5, the combination of Ling and Le teaches: The translation method according to claim 1, 
Ling teaches: wherein the unknown word comprises at least one of an abbreviation, a proper noun, a derivative, and a compound word. (The broadest reasonable interpretation of a “derivative” in light of the instant specification ¶ [0041], type (3), is an English suffix “-ation.” The unknown word in Ling’s Table 2, row 1 on p. 9 contains this suffix. Ling, p. 8, last paragraph, lines 5-6 state “Firstly, the English suffix -ation and the Portuguese suffix -dade are common endings for nouns.”)

	Regarding CLAIM 6, Ling teaches: A neural network-based translation apparatus, comprising:
obtain an initial translation of a to-be-translated sentence, wherein the initial translation carries an unknown word; (Figure 1 on p. 3 shows a joint alignment and translation model, which is described on p.2, § 2.1. In Figure 1, the source sentence in box 1 is being translated into a target sentence as seen in box 3. The limitation “an initial translation of a to-be-translated sentence” is taught by the partially-completed sentence in box 3, “Donde esta la”. Note: In boxes 1 and 3 of Figure 1, each asterisk denotes a character-to-word (C2W) compositional model as seen in Figure 2 on p. 4. In box 6, the double asterisk denotes a vector-to-character (V2C) generation model as seen in Figure 3 on p. 5. The leftmost vectors in box 6 of Fig. 1 and in Fig. 3 are both color-coded yellow in the Ling publication available online.
Table 2 on p. 9 (explained on p. 8, bottom paragraph) shows English-to-Portuguese translations as generated by the joint alignment and translation model of Fig. 1, with the original English in row 1 and a Portuguese translation in row 3. The “initial translation” in Portuguese is output one word at a time, in the same manner as in box 3 of the joint alignment and translation model.)
split the unknown word in the initial translation obtained by the obtaining module into one or more characters, and (The splitting is taught by page 4, last paragraph, lines 1-3: “The illustration of the model is shown in [Figure] 2. Essentially, the model builds a representation of the word using characters, by reading characters from left to right and vice-versa. More formally, given an input word $s_j = s_{j,0}, \ldots, s_{j,x}$”. The input word $s_j$ is split into its characters $s_{j,0}, \ldots, s_{j,x}$. This process is illustrated in Fig. 2 on p. 4, which is reproduced above with annotations. Fig. 2 depicts the C2W compositional model. The example input source word “Where” is split into its five characters “W”, “h”, “e”, “r”, “e”.
Regarding the limitation “unknown word,” Ling teaches that unknown or unseen source words are input into and processed by the joint alignment and translation model of Fig. 1, of which the C2W compositional model is the first step. Ling provides evidence for this in the following sections: P. 1, Abstract, lines 9-10: “our model….”; p. 8, bottom paragraph, lines 1-3; p.  9, Table 2 caption, lines 1-2.
Although Ling does not teach that “library” (in Fig. 1) is an unknown word, Ling teaches in Table 2 (p. 9) a translation from English (row 1) to Portuguese (row 3) in which the last words of the sentences in both languages are unknown, according to the caption.
An “obtaining module” is interpreted as the system shown in Ling’s Fig. 1.)
input, into a first multi-layer neural network, a character sequence constituted by the one or more characters that is obtained by splitting the unknown word; (Page 4, last paragraph, lines 3-4, starting with “the model”. Since the caption for Fig. 2 states “Square boxes represent vectors of neuron activations,” this indicates a neural network generates character projections. A first multi-layer neural network includes at least the input layer and output layer of each character projection model. See the annotations above.)
obtain, by using the first multi-layer neural network, a character vector of each character in the character sequence input by the first processing module, and (Character vectors are vectors $s_{j,0}, \ldots, s_{j,x}$ disclosed in the last paragraph on page 4. The caption for Fig. 2 states “Square boxes represent vectors of neuron activations,” which indicates that the first multi-layer neural network outputs a character vector.
A “first processing module” is interpreted as the C2W compositional model shown in Fig. 2.)
input character vectors in the character sequence into a second multi-layer neural network; (A second multi-layer neural network is the bidirectional long short-term memory (BLSTM) in the C2W compositional model as seen in Fig. 2. Ling teaches inputting the character vectors in forward order into a forward LSTM and in reverse order into a backward LSTM, as disclosed by page 4, last paragraph, lines 4-5: “Then it builds a forward LSTM state sequence… by reading the character vectors $s_{j,0}, \ldots, s_{j,x}$. Another backward LSTM reads character vectors in the reverse order”. This is shown in Ling Fig. 2, where the straight vertical arrow exiting from a character vector enters the forward LSTM, and the bent arrow enters the backward LSTM. See the annotations above.)
encode, by using the second multi-layer neural network and a preset common word database, the character vectors input by the second processing module to obtain a semantic vector corresponding to the character sequence; and (Encoding the character vectors to obtain a semantic vector is illustrated at the bottom of the C2W compositional model of Fig. 2 showing the example word vector for “Where”. This is explained from p. 4, third-to-last line to p. 5, line 3. On p. 7, § 2.5, last paragraph teaches training the C2W model to generate word embeddings from the word lookup table. The limitation “a preset common word database” is taught by the English-Portuguese language pair from Europarl for training the model; see p. 7, § 3.1, lines 1-3.
A “second processing module” is interpreted as the C2W compositional model shown in Fig. 2)
input the semantic vector obtained by the third processing module into a third multi-layer neural network, (A third multi-layer neural network is interpreted collectively as the nodes in boxes 5 and 6 in the joint alignment and translation model seen in Fig. 1. Box 5 is explained in “5-Alignment via Attention” on p. 3, and Box 6 is explained in “6-Target Word Generation” on p. 3. Box 6 uses the V2C generation model seen in Fig. 3 on p. 5. The last paragraph on p. 3 explains that the most likely source word/words to generate the predicted word $w_p$ is contained in vector a. This vector is the aligned source word vector according to the paragraph below Fig. 3.
A “third processing module” is interpreted as the C2W compositional model shown in Fig. 2.)
decode the semantic vector by using the third multi-layer neural network, and (A decoder is the V2C generation model as seen in Fig. 3 on p. 5. An aligned source word vector a is input into the third network. The paragraph beneath Fig. 3 states, “An illustration of the V2C (vector to characters) is shown in Figure 3” and the next paragraph, lines 2-3, states “Each prediction is dependent on the input of the model (aligned source words a and target word context $l_{p-1}^{f}$).” Note that the example vectors in Fig. 3 show a snapshot right before “esta” is added to the output in Fig. 1.
Decoding is further taught by Ling from p. 5 (beneath Fig. 3) to page 6 before §2.3, and on p. 6, § 2.4.)
determine a final translation of the to-be-translated sentence based on the initial translation of the to-be-translated sentence, (Under the broadest reasonable interpretation of “final translation,” Figure 1 shows a final translation is determined after “biblioteca” is appended to the initial translation in box 3. The final translation is determined once the model produces an end of sequence token as disclosed by p. 6, last paragraph of § 2.2; and § 2.4, ¶ 1, last line.)
wherein the final translation carries a translation of the unknown word. (Although Ling does not teach that “library” (Fig. 1) is an unknown word, Ling teaches in Table 2 (p. 9) a translation from English (row 1) to Portuguese (row 3) in which the last words of the sentences in both languages are unknown. See the Table’s caption, lines 2-3. Table 2 is explained on p. 8, bottom paragraph, lines 1-3.)
	However, Ling does not explicitly teach: one or more processors configured to invoke a program code stored in a memory 
	obtain an initial translation wherein the initial translation carries an unknown word;
split the unknown word in the initial translation
But Le teaches: one or more processors configured to invoke a program code stored in a memory (C. 6, L. 37-42 and C. 7, L. 28-33)
	obtain an initial translation wherein the initial translation carries an unknown word; (C. 2, L. 61-67 to C. 3, L. 1-14 disclose OOV words in an initial translation. Le teaches using a dictionary (C. 5, L. 23-35) for mapping unknown/pointer tokens to a respective source word in the source sentence corresponding to the unknown token (C. 5, L. 17-19).)
split the unknown word in the initial translation (C. 2, L. 61-67 to C. 3, L. 1-14 disclose OOV words in an initial translation.)
Both Ling and Le are in the same field of endeavor as the claimed invention, namely, neural machine translation. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have generated Ling’s initial translation containing Le’s pointer tokens and their associated dictionary entry, and to have incorporated Le’s processors and memory storing instructions. A motivation for the combination is that the pointer token identifies a respective source word in the source sentence corresponding to the unknown token. (Le, C. 3, L. 9-14)

	Regarding CLAIM 7, the combination of Ling and Le teaches: The translation apparatus according to claim 6, 
Ling teaches: wherein the preset common word database comprises at least one of a dictionary, a linguistics rule, and a cyberword database. (Ling teaches a “cyberword database” on p. 7, § 3.1, lines 1-3. Europarl is a cyberword database.)
	Additionally, Le teaches a dictionary at C. 3, L. 32-34.

Regarding CLAIM 8, the combination of Ling and Le teaches: The translation apparatus according to claim 6, wherein the one or more processors is configured to: 
determine at least one combination manner of the character vectors in the character sequence by using the second multi-layer neural network based on vocabulary information provided by the preset common word database, (P. 4, last line “Finally” to p. 5, line 3)
wherein a character vector combination determined by each combination manner corresponds to one meaning; and (The broadest reasonable interpretation of “one meaning” is the entry in the word lookup table for a character vector combination.  P. 7, last paragraph of § 2.5, lines 1-2 states that the C2W model is trained to produce the same word vectors as the word lookup table for all training word types.)
compression encode at least one meaning of at least one character vector combination determined by the at least one combination manner, to obtain the semantic vector. (On p. 5, top, the word $s_j$ is a combination of the character vectors in the character sequence.)
	Although Ling discloses a preset common word database Europarl, Ling does not explicitly teach that Europarl is used to generate the encoding as recited by claim 8. Ling does not explicitly teach: determining at least one combination manner… by using the second multi-layer neural network based on vocabulary information provided by the preset common word database,
	But Le teaches: determining at least one combination manner… by using the second multi-layer neural network based on vocabulary information provided by the preset common word database, (The broadest reasonable interpretation of “vocabulary information” includes the spellings and definitions of words. The conventional word dictionary 140 disclosed at C. 3, L. 32-34 inherently contains spellings and definitions of words.)
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated Le’s word dictionary 140 into Ling’s model, with a motivation to map words in the source language to translations of the words into the target language. (Le, C. 3, L. 27-29)

Regarding CLAIM 9, the combination of Ling and Le teaches: The translation apparatus according to claim 8, wherein the one or more processors is configured to: 
Ling teaches: decode, by using the third multi-layer neural network, the semantic vector obtained by the third processing module to determine at least one meaning comprised in the semantic vector, and (The limitation “at least one meaning comprised in the semantic vector” is a hypothesis of the source word generated by a beam search. Ling teaches a word-based beam search on p. 6, § 2.4, ¶ 1, and a character-based beam search in ¶ 2, lines 3-6. The two beam searches execute simultaneously, as indicated by the last sentence of ¶ 2.
A “third processing module” is interpreted as the C2W compositional model in Fig. 2.)
select, based on a context meaning of the unknown word in the initial translation, (The limitation “a context meaning of the unknown word” is the decoder LSTM’s current hidden state, represented in the V2C model of Fig. 3 by a current hidden state of the forward LSTM. For example, in Fig. 3, before outputting the character “s”, the context meaning is the vector directly above “s”.)
a target meaning from the at least one meaning comprised in the semantic vector; and (The limitation “a target meaning” is the translation of the source word. Ling, p. 6, § 2.4, ¶ 1, lines 6-8 states: “We set a beam $k_w$, which defines the number of hypothesis to be expanded prioritizing hypothesis with the highest sentence probability. An hypothesis is final once it generates the end of sentence token EOS.” The final hypothesis is the selection. In ¶ 2, the last sentence states, regarding the character-level beam search: “In this case, the beam search is run until $k_w$ final hypothesis are found (generation of EOW), as it must return at least $k_w$ new hypothesis to ensure that the word level search is complete.”)
determine the final translation of the to-be-translated sentence based on the target meaning and the context meaning of the unknown word in the initial translation. (Fig. 1 shows that the final translation of the example source sentence “Where is the library” is the target sentence “Donde esta la biblioteca”. The final translation is based on both the target meaning “biblioteca” and the context meaning from a hidden state of the LSTM in the V2C model (Fig. 3). Although Ling does not teach that “library” (Fig. 1) is an unknown word, Ling teaches in Table 2 (p. 9) a translation from English (row 1) to Portuguese (row 3) in which the last words of the sentences in both languages are unknown. The caption, lines 2-3, states: “The unknown word in the translation as well as their aligned words are marked in bold”.)
However, Ling does not explicitly teach: the unknown word in the initial translation
	But Le teaches: the unknown word in the initial translation (C. 3, L. 6-14)

Regarding CLAIM 10, the combination of Ling and Le teaches: The translation apparatus according to claim 6, 
Ling teaches: wherein the unknown word comprises at least one of an abbreviation, a proper noun, a derivative, and a compound word. (The broadest reasonable interpretation of a “derivative” in light of the instant specification ¶ [0041], type (3), is an English suffix “-ation.” The unknown word in Ling’s Table 2, row 1 on p. 9 contains this suffix. Ling, p. 8, last paragraph, lines 5-6 state “Firstly, the English suffix -ation and the Portuguese suffix -dade are common endings for nouns.”)

Claims 11-15 are product claims which recite the same features as method claims 1-5, respectively. Claim 11 additionally recites a non-transitory computer-readable storage medium storing instructions, which when executed by one or more processors, cause the one or more processors to perform operations comprising the method of claim 1. A storage medium is taught by Le, C. 6, L. 37-42 and C. 7, L. 28-33. Claims 11-15 are rejected for the reasons set forth in the rejections of claims 1-5, respectively.

Response to Arguments
	Examiner herein responds to the Applicant’s remarks, claim amendments, and specification amendments received on 03/30/2022.

Information Disclosure Statement: Applicant should submit an argument under the heading “Remarks” pointing out disagreements with the examiner’s contentions.

Objections to the Specification: The previous objections to the specification have been withdrawn due to the replacement paragraph [0058]. 

Objections to the Claims: The previous objections to claims 1, 3, 4, 6, 9, and 11 are withdrawn due to the claim amendments.

Claim Interpretations and Claim Rejections Under 35 U.S.C. 112: Claims 6, 8, and 9 are no longer being interpreted under 35 U.S.C. 112(f) due to the claim amendments. The previous rejections of claims 6-10 under 35 U.S.C. 112(b) are withdrawn. 

Claim Rejections Under 35 U.S.C. 103:  Applicant's arguments with respect to the rejection of claim 1 have been fully considered but they are not persuasive. Applicant makes specific arguments on p. 12 of the Remarks. The Claim 1 limitation “encoding the character vectors by using… and a preset common word database to obtain a semantic vector” is broad and does not specifically disclose what is a common word database and how the common word database is being used, other than a broad recitation that it is being used. The cited reference Ling discloses that the AI model is trained using a word translation database consisting of English-Portuguese sentence pairs from Europarl (p. 7, section 3.1, first paragraph) which is interpreted as the common word database, and this is being used in word translation. Experimental results are shown in section 3.2 (Word Representation and Word Generation), Table 1 and Table 2 which disclose the trained model with the database being used to encode words that are similar to other words or unknown words based on the word pairs in the database and rules identified from the word pairs. Also, the secondary reference Le discloses using a dictionary for word translation at C. 3, L. 32-34. The rejection of Claim 1 is maintained.

Conclusion
 THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Asher H. Jablon whose telephone number is (571)270-7648. The examiner can normally be reached Monday - Friday, 9:00 am - 6:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Abdullah Al Kawsar, can be reached at (571)270-3169. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ASHER H. JABLON/Examiner, Art Unit 2127

/ABDULLAH AL KAWSAR/Supervisory Patent Examiner, Art Unit 2127