DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 12/04/2020 is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function, and the generic placeholder is not preceded by a structural modifier. Such claim limitation is: “an inputter configured to receive a source language, which is a low-resource language, and a third language abundant in resources compared to the low-resource language;” in claim 1.
Because this claim limitation is being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it is being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this limitation interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation to avoid it being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation recites sufficient structure to perform the claimed function so as to avoid it being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1-13 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. 
Independent claim 1 recites:
an inputter configured to receive a source language, which is a low-resource language, and a third language abundant in resources compared to the low-resource language;
a memory configured to store a program which performs automatic translation between the source language, which is the low-resource language, and a target language using the third language; and
a processor configured to execute the program, wherein the processor performs the automatic translation using a third language vocabulary embedding vector. 

The claim as drafted relates to a human organizing of ideas. More specifically, this reads on a human:
receiving language (i.e., written text or utterances) from two different humans in two different languages (e.g., one being a translation of the other, where one language is known to the receiving human and the other is unknown or barely known to the receiving human);
writing down a set of rules or criteria (i.e., program) to perform a translation of one of the received languages (i.e., written texts or utterances) to a target language;
translating the received language into the desired target language based on those rules.


This judicial exception is not integrated into a practical application. For example, the claim recites “automatic”, “an inputter”, “a memory”, and “a processor”. Paragraph [0087] of the as-filed specification states: “Processors suitable for execution of a computer program include, by way of example, both general and special purpose microprocessors, and any one or more processors of any kind of digital computer.” Therefore, the specification describes a general-purpose computer or computing device, which is used merely as a tool to apply the abstract idea. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements amount to no more than a generic computing device. The claim is not patent eligible.
With respect to claim 2, the claim recites:
wherein the processor expresses each token of a source language token sequence and a third language token sequence input to the inputter into an embedding vector through an embedding layer and models a meaning and a structure of a sentence.

The claim relates to a human organizing of ideas.
This reads on a human expressing each word of the written texts or utterances received from other humans, organizing those words according to a predetermined set of rules or criteria into a list of fixed, predefined length [i.e., embedding layer], and determining the meaning/context and structure of the received sentences.
The additional limitation of “the processor” has been described above. Please see claim 1, above.
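For illustration only, the embedding-layer limitation of claim 2 (and step (b) of claim 6) can be sketched as a table lookup that maps each token of a token sequence to a fixed-length vector. This is a minimal sketch under stated assumptions: the toy vocabulary, the 2-dimensional vectors, the unknown-token fallback, and the function name `embed_tokens` are hypothetical and are not taken from the specification or the cited references.

```python
from typing import Dict, List

Vector = List[float]

def embed_tokens(tokens: List[str],
                 vocab: Dict[str, int],
                 embedding_matrix: List[Vector],
                 unk_id: int = 0) -> List[Vector]:
    """Map each token to a copy of the embedding-matrix row for its vocabulary id.

    Tokens missing from the vocabulary fall back to `unk_id` (an assumed convention).
    """
    return [list(embedding_matrix[vocab.get(tok, unk_id)]) for tok in tokens]

# Hypothetical toy vocabulary with 2-dimensional embeddings.
vocab = {"<unk>": 0, "hola": 1, "mundo": 2}
embedding_matrix = [
    [0.0, 0.0],  # <unk>
    [0.5, 0.1],  # hola
    [0.2, 0.9],  # mundo
]

vectors = embed_tokens(["hola", "mundo", "adios"], vocab, embedding_matrix)
print(vectors)  # [[0.5, 0.1], [0.2, 0.9], [0.0, 0.0]]
```

Every token, regardless of language, yields a vector of the same fixed length, which is the sense in which the analysis above treats the embedding layer as a list of predefined length.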

With respect to claim 3, the claim recites:
wherein the processor calculates a weight vector by measuring a distance between an input embedding vector and the third language vocabulary embedding vector and 
generates a third language weight embedding vector using the weight vector and an embedding matrix of third language vocabularies.

The claim relates to a human organizing of ideas.
This reads on a human comparing, with pen and paper, how similar two received sentences (i.e., from written text or utterances) are, and writing down a new list of words [i.e., third language weight embedding vector] in one of the received languages using the similarity calculated above [i.e., weight vector] and a table of words (e.g., a dictionary) [i.e., embedding matrix].
The additional limitation of “the processor” has been described above. Please see claim 1, above.

With respect to claim 4, the claim recites:
wherein the processor generates a final embedding vector using the input embedding vector and the third language weight embedding vector.

The claim relates to a human organizing of ideas.
This reads on a human writing down a new list of words [i.e., final embedding vector] in one of the received languages using the received sentence(s) and the list of words described for claim 3 [i.e., third language weight embedding vector].
The additional limitation of “the processor” has been described above. Please see claim 1, above.
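As a non-authoritative illustration of the computation recited in claims 3 and 4 (and mirrored in claims 9 and 11), one possible reading is sketched below. The use of Euclidean distance, the softmax over negative distances that turns distances into a weight vector, and the element-wise sum that forms the final embedding vector are all assumptions made for illustration; the claims do not specify these operations, and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]

def weight_vector(input_vec: Vector, third_vocab: List[Vector]) -> Vector:
    """Weights over third-language vocabulary entries from distances to the input.

    Smaller distance gives a larger weight (softmax over negative Euclidean distance).
    """
    scores = [-math.dist(input_vec, row) for row in third_vocab]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def third_language_weight_embedding(weights: Vector,
                                    third_vocab: List[Vector]) -> Vector:
    """Weighted sum of the rows of the third-language embedding matrix."""
    dim = len(third_vocab[0])
    return [sum(w * row[j] for w, row in zip(weights, third_vocab))
            for j in range(dim)]

def final_embedding(input_vec: Vector, weighted_vec: Vector) -> Vector:
    """Combine input and weighted embeddings (element-wise sum is an assumption)."""
    return [a + b for a, b in zip(input_vec, weighted_vec)]

# Hypothetical 2-dimensional example with a two-word third-language vocabulary.
third_vocab = [[1.0, 0.0], [0.0, 1.0]]
x = [1.0, 0.0]
w = weight_vector(x, third_vocab)
t = third_language_weight_embedding(w, third_vocab)
f = final_embedding(x, t)
print([round(v, 3) for v in w])  # [0.804, 0.196]
print([round(v, 3) for v in f])  # [1.804, 0.196]
```

Under these assumptions, the third-language vocabulary entry closest to the input receives the larger weight, so the final embedding vector is pulled toward that entry.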
With respect to claim 5, the claim recites:
wherein the processor allows parameters of a lower layer of a source language encoder to be tied to parameters of a lower layer of a third language encoder, and parameters of a lower layer of a target language decoder to be tied to parameters of a lower layer of a third language decoder.

The claim relates to an operation.
This reads on sharing data between a portion of the encoder/decoder for one language and a portion of the encoder/decoder for another language, where these relationships between the encoders/decoders can be predetermined by a human. The additional limitation of “the processor” has been described above. Please see claim 1, above.
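Parameter tying of the kind recited in claim 5 (and in claims 8 and 12) is commonly realized by having two networks reference the same parameter objects for their lower layers, so that an update made through either network changes both. The minimal sketch below assumes a layer is simply a list of weights and that the number of tied lower layers is given in advance; the `Layer` class and the helper functions are hypothetical stand-ins, not the applicant's or any library's implementation.

```python
from typing import List

class Layer:
    """A stand-in for one network layer holding a list of weights."""
    def __init__(self, weights: List[float]):
        self.weights = weights

def build_stack(num_layers: int, width: int, init: float) -> List[Layer]:
    """Create independent layers, each with its own weight list."""
    return [Layer([init] * width) for _ in range(num_layers)]

def tie_lower_layers(a: List[Layer], b: List[Layer], num_tied: int) -> None:
    """Make the lowest `num_tied` layers of stack `b` the same objects as in `a`."""
    for i in range(num_tied):
        b[i] = a[i]

source_encoder = build_stack(num_layers=4, width=3, init=0.1)
third_encoder = build_stack(num_layers=4, width=3, init=0.2)
tie_lower_layers(source_encoder, third_encoder, num_tied=2)

# An update made through one encoder is visible through the other for tied layers only.
source_encoder[0].weights[0] = 0.9
print(third_encoder[0].weights[0], third_encoder[2].weights[0])  # 0.9 0.2
```

The same helper could be applied to a target language decoder and a third language decoder; the upper (untied) layers remain language-specific.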

Independent claim 6 recites:
(a) receiving a token sequence of a source language and a token sequence of a third language;
(b) expressing each token of the token sequences into an embedding vector; and
(c) generating a target language token sequence using the embedding vector and outputting the generated target language token sequence.

The claim as drafted relates to a human organizing of ideas. More specifically, this reads on a human:
receiving language (i.e., written text or utterances, each composed of a sequence of words (i.e., tokens)) from two different humans (i.e., in two different languages);
writing down the individual words of each of the received languages (i.e., written texts or utterances) as a list [i.e., embedding vector];
writing down a list of words in a target language corresponding to the received language (i.e., translation). No additional limitations are present.


With respect to claim 7, the claim recites:
wherein the step (a) includes receiving the token sequence of the source language, which is a low-resource language, and the token sequence of the third language abundant in resources compared to the low-resource language.

The claim relates to a human organizing of ideas.
This reads on a human receiving language (i.e., written text or utterances) from two different humans in two different languages (e.g., one being a translation of the other, where one language is known to the receiving human and the other is unknown or barely known to the receiving human), wherein the language is received in the form of sentences (i.e., sequences of words/tokens). No additional limitations are present.

With respect to claim 8, the claim recites:
wherein the step (b) includes allowing parameters of lower layers of encoder and decoder networks for modeling the source language to be tied to parameters of lower layers of encoder and decoder networks for modeling the third language.

The claim relates to an operation.
This reads on sharing data between a portion of the encoder/decoder for one language and a portion of the encoder/decoder for another language, where these relationships between the encoders/decoders can be predetermined by a human. No additional limitations are present.

With respect to claim 9, the claim recites:
calculating a weight vector for a similarity between an input vocabulary and a third language vocabulary;
generating a third language weight embedding vector using the weight vector and an embedding matrix of the third language; and
generating a final embedding vector using an embedding vector of the input vocabulary and the third language weight embedding vector.

The claim relates to a human organizing of ideas.
This reads on a human:
comparing, with pen and paper, how similar the words of two received sentences (i.e., from written text or utterances) are (e.g., a similarity measure [i.e., weight vector]);
writing down a new list of words [i.e., third language weight embedding vector] in one of the received languages using the similarity calculated above [i.e., weight vector] and a table of words (e.g., a dictionary) [i.e., embedding matrix];
writing down a new list of words [i.e., final embedding vector] in one of the received languages using the received sentence(s) and the list of words described above. No additional limitations are present.

Independent claim 10 recites:
a source language encoder configured to model a sentence from a source language embedding vector that is an expression of each token of a token sequence of a source language through an embedding layer;
a third language encoder configured to model a sentence from a third language embedding vector that is an expression of each token of a token sequence of a third language through an embedding layer;
a target language decoder configured to generate a target language token sequence corresponding to sentence information received from the source language encoder or the third language encoder; and
a third language decoder configured to generate a third language token sequence according to sentence information received from the source language encoder.

The claim as drafted relates to a human organizing of ideas. More specifically, this reads on a human using a predetermined set of rules or criteria (i.e., encoder/decoder model(s)) for:
interpreting or organizing a sentence from a list of words [i.e., embedding vector] of fixed, predefined length [i.e., embedding layer] in a language received from another human;
interpreting or organizing a sentence from a list of words [i.e., embedding vector] of fixed, predefined length [i.e., embedding layer] in a language received from a second human;
writing down a list of words in a target language corresponding to one of the received languages (i.e., translation);
writing down a list of words in the second received language corresponding to the first received language (i.e., translation). No additional limitations are present.
 
With respect to claim 11, the claim recites:
generate a weight vector for a similarity between an input vocabulary of the source language and a vocabulary word of the third language;
generate a third language weight embedding vector using the weight vector and an embedding matrix of the third language vocabularies; and
generate a final embedding vector using an embedding vector of the input vocabulary and the third language weight embedding vector.

The claim relates to a human organizing of ideas.
This reads on a human:
determining, with pen and paper, how similar the words of two received sentences (i.e., from written text or utterances) are (e.g., a similarity measure [i.e., weight vector]);
writing down a new list of words [i.e., third language weight embedding vector] in one of the received languages using the similarity calculated above;
writing down a new list of words [i.e., final embedding vector] in one of the received languages using the received sentence(s) and the list of words described above. No additional limitations are present.

With respect to claim 12, the claim recites:
wherein parameters of a lower layer of the source language encoder are tied to parameters of a lower layer of the third language encoder, and
parameters of a lower layer of the target language decoder are tied to parameters of a lower layer of the third language decoder.

The claim relates to an operation.
This reads on sharing data between a portion of the encoder/decoder for one language and a portion of the encoder/decoder for another language, where these relationships between the encoders/decoders can be predetermined by a human. No additional limitations are present.

With respect to claim 13, the claim recites:
wherein the lower layer is distinguished by a boundary set according to a result of monitoring a trend of an automatic translation performance change.

The claim relates to an operation.
This reads on a human limiting the portion (i.e., the lower layer) of an encoder/decoder for one language based on an observed change in translation performance. No additional limitations are present.
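Claim 13 does not specify how the boundary is chosen. Purely as a hypothetical illustration of "monitoring a trend of an automatic translation performance change", one could record a translation score for each candidate number of tied lower layers and place the boundary where the score stops improving. The scores, the stopping rule, and the function name below are invented for illustration and are not taken from the application or the cited art.

```python
from typing import Sequence

def choose_tying_boundary(scores: Sequence[float]) -> int:
    """Return the number of tied lower layers at which the score stops improving.

    scores[k] is a hypothetical translation score measured with the lowest k
    layers tied. The boundary is the last k before the first non-improvement.
    """
    if not scores:
        raise ValueError("scores must be non-empty")
    for k in range(1, len(scores)):
        if scores[k] <= scores[k - 1]:
            return k - 1
    return len(scores) - 1

# Made-up scores for 0..4 tied layers (not experimental data).
print(choose_tying_boundary([20.1, 21.0, 21.4, 21.2, 20.5]))  # 2
```

A stricter or smoother rule (for example, tolerating one small dip before stopping) would be an equally plausible reading; the claim language does not choose between them.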
Claims 10-13 are also rejected under 35 U.S.C. 101 because the claims appear to be directed to a software embodiment rather than a hardware embodiment, whereas a machine claim is directed to a system, apparatus, or arrangement.
The claims appear to be directed to a software embodiment. Paragraph [0086] of the specification describes the elements of the system as being implementable in software alone to carry out the embodiments of the invention: “Various techniques described herein may be implemented as digital electronic circuitry, or as computer hardware, firmware, software, or combinations thereof. The techniques may be implemented as a computer program product, i.e., a computer program tangibly embodied in an information carrier, e.g., in a machine-readable storage device (for example, a computer-readable medium) or in a propagated signal for processing by, or to control an operation of a data processing apparatus, e.g., a programmable processor, a computer, or multiple computers. A computer program(s) may be written in any form of a programming language, including compiled or interpreted languages and may be deployed in any form including a stand-alone program or a module, a component, a subroutine, or other units suitable for use in a computing environment. A computer program may be deployed to be executed on one computer or on multiple computers at one site or distributed across multiple sites and interconnected by a communication network.”
The claimed limitations are capable of being performed by software alone, as described in the paragraph quoted above, since no hardware component is claimed. Software alone is not a physical component and is therefore not statutory, because software alone does not define any structural and functional interrelationships between the computer program and other claimed elements of a computer that permit the computer program's functionality to be realized. Hence, the recited functions are embodied in software, and the claims are not directed to a hardware embodiment. Data structures not claimed as embodied in computer-readable media are descriptive material per se and are not statutory because they are not capable of causing functional change in the computer. See, e.g., Warmerdam, 33 F.3d at 1361, 31 USPQ2d at 1760 (claim to a data structure per se held nonstatutory). Such claimed data structures do not define any structural and functional interrelationships between data and other claimed aspects of the invention that permit the data structure's functionality to be realized.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


Claim 1 is rejected under 35 U.S.C. 102(a)(1) as being anticipated by Firat et al. (Firat, Orhan, et al. "Zero-resource translation with multi-lingual neural machine translation." arXiv preprint arXiv:1606.04164 (2016); https://aclanthology.org/D16-1026.pdf).
As to independent claim 1, Firat et al. teaches:
An apparatus for automatic translation (see 5.1 Pivot-based Translation (par. 1): “we translate a source sentence (Es) into a pivot language (En) and then translate the English translation into a target language (Fr), all within the same multi-way, multilingual model”) comprising:
an inputter configured to receive a source language, which is a low-resource language, and a third language abundant in resources compared to the low-resource language (see 5.1 Pivot-based Translation (par. 1): “The first set of approaches exploits the fact that the target zero-resource translation path can be decomposed into a sequence of high-resource translation paths (Wu and Wang, 2007; Utiyama and Isahara, 2007; Habash and Hu, 2009). For instance, in our case, Es→Fr can be decomposed into a sequence of Es→En and En→Fr. In other words, we translate a source sentence (Es) [i.e., source/low-resource language] into a pivot language (En) [i.e., third/high-resource language] and then translate the English translation into a target language (Fr) [i.e., target language], all within the same multi-way, multilingual model trained by using bilingual corpora.”);
a memory configured to store a program which performs automatic translation between the source language, which is the low-resource language, and a target language using the third language (see 4.2 Models and Training (par.1): “We start from the code made publicly available as a part of (Firat et al., 2016)1 . We made two changes to the original code [i.e., program stored in memory]. First, we replaced the decoder with the conditional gated recurrent network with the attention mechanism as outlines in (Firat and Cho, 2016). Second, we feed a binary indicator vector of which encoder(s) the source sentence was processed by to the output layer of each decoder (gmw in Eq. (4)). Each dimension of the indicator vector corresponds to one source language, and in the case of multi-source translation, there may be more than one dimensions set to 1. We train the following models: four single-pair models (Es↔En and Fr↔En) and one multi-way, multilingual model (Es,Fr,En↔Es,Fr,En) [i.e., automatic translations].” and 5.1 Pivot-based Translation (par. 2):“…Both approaches described and proposed above do not require any additional action on an already trained multilingual model. They are simply different translation strategies specifically aimed at zero resource translation.” ); and
a processor configured to execute the program, wherein the processor performs the automatic translation using a third language vocabulary embedding vector (see the 4.2 Models and Training (par. 1) and 5.1 Pivot-based Translation (par. 2) citations in the limitation above; Table 5: “Zero-resource translation from Spanish (Es) to French (Fr) with finetuning. When pivot is √, English is used as a pivot language.” [i.e., experimental results (inherent use of a processing device)]; 2.1 Model Description (par. 2): “Encoder: An encoder for the n-th source language reads a source sentence X = (x1, …, xTx) as a sequence of linguistic symbols and returns a set of context vectors Cn = {h1n, …, hTn}” [i.e., C, the encoded/embedded vector of the source sequence X]; and 3.2 Many-to-One Translation (par. 5): “In this section, we consider a case where a source sentence is given in two languages, X1 and X2 […] At each time t, each translation path computes the distribution over the target vocabulary, i.e., p(yt = w|y<t, X1) and p(yt = w|y<t, X2).” [i.e., X1 (source language sentence) and X2 (third language sentence), interpreted to be in the same format as the already defined X = (x1, …, xTx); X2 corresponds to the third language vocabulary embedding vector]).

Claims 10 and 12 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Zhang et al. (Zhang, Jinchao, Qun Liu, and Jie Zhou. "ME-MD: An effective framework for neural machine translation with multiple encoders and decoders." Proceedings of the 26th International Joint Conference on Artificial Intelligence. 2017; https://www.ijcai.org/proceedings/2017/0474.pdf).

As to independent claim 10, Zhang et al. teaches:
An apparatus for automatic translation, comprising:
a source language encoder configured to model a sentence from a source language embedding vector that is an expression of each token of a token sequence of a source language through an embedding layer (see 3.2 M-Encoder (par. 1-3): “Three encoders produce three source representations as: {h11; … ; h1m} [e.g., source language]; {h21;…; h2m}; {h31; …; h3m}:” and Figure 2: encoder 1 (i.e., source language) Length m: implies embedding layer (i.e., fixed length).);
a third language encoder configured to model a sentence from a third language embedding vector that is an expression of each token of a token sequence of a third language through an embedding layer (see 3.2 M-Encoder (par. 1-3): “Three encoders produce three source representations as: {h11; … ; h1m}; {h21;…; h2m} [e.g., third language]; {h31; …; h3m}:” and Figure 2: encoder 2 (i.e., third language) Length m: implies embedding layer (i.e., fixed length).);
a target language decoder configured to generate a target language token sequence corresponding to sentence information received from the source language encoder or the third language encoder (see section 2, Neural Machine Translation: “We briefly introduce the NMT architecture [Bahdanau et al., 2015] that our systems build on. Formally, given a source sentence x = x1, …, xm and a target sentence y = y1, …, yn, […]” [i.e., e(yt-1): associated with target word embedding]; section 3.3, M-Decoder: “The M-Decoder aims to enhance the generation ability of the decoder through integrating multiple decoders. Similar to the M-Encoder, the M-Decoder can also have multi-depth and multi-type. […] Wu et al., 2016] and multiple recurrent networks. Figure 3 presents a multi-depth M-Decoder that contains three decoders with different depths.”; and Figure 3: the decoders’ outputs qn,t, which depend on the vector yt-1 as shown in Eq. 11 [equation image not reproduced], are associated with the target sentence (i.e., associated with the token sequence/embedding). From Figure 3, decoder 1 is interpreted to be associated with the target language and decoder 2 with the third language.); and
a third language decoder configured to generate a third language token sequence according to sentence information received from the source language encoder (see the section 2 (Neural Machine Translation), section 3.3 (M-Decoder), and Figure 3 citations in the limitation above. From Figure 3, decoder 1 is interpreted to be associated with the target language and decoder 2 with the third language.).

Regarding claim 12, Zhang et al. teaches all of the limitations of claim 10, as discussed above.
Zhang et al. further teaches:
wherein parameters of a lower layer of the source language encoder are tied to parameters of a lower layer of the third language encoder (see Figure 2: the word embeddings (e(x1), …, e(xm)) are associated with the source/third language encoder parameters and with the lower layers of encoders 1-3: LGRU11, …, LGRU11 (i.e., source language encoder lower layer) and LGRU31, …, LGRU31 (i.e., third language encoder lower layer).), and
parameters of a lower layer of the target language decoder are tied to parameters of a lower layer of the third language decoder (see Figure 3: e(yt-1) is associated with the target/third language decoder parameters, where decoder 1 is associated with the target language and decoder 2 with the third language, and with the lower layers of decoders 1-3: LGRU11 (target language decoder lower layer) and LGRU2 (third language decoder lower layer).).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claim 2 is rejected under 35 U.S.C. 103 as being unpatentable over Firat et al. (Firat, Orhan, et al. "Zero-resource translation with multi-lingual neural machine translation." arXiv preprint arXiv:1606.04164 (2016); https://aclanthology.org/D16-1026.pdf) as applied to claim 1 above, and further in view of Feng et al. (Feng, Xiaocheng, et al. "Improving Low Resource Named Entity Recognition using Cross-lingual Knowledge Transfer." IJCAI. Vol. 1. 2018; https://www.ijcai.org/proceedings/2018/566).

Regarding claim 2, Firat et al. teaches all of the limitations as in claim 1, above.
However, Firat et al. does not explicitly teach, but Feng et al. does teach:
wherein the processor expresses each token of a source language token sequence and a third language token sequence input to the inputter into an embedding vector through an embedding layer and models a meaning and a structure of a sentence (see Figure 2 and ¶ 1-3 of 2.2 Improved with Bilingual Lexicon: “We present an overview of the developed networks for modeling bilingual lexicons, as illustrated in the right of Figure 2. Following the same setting in Section 2.1, given a low resource language sentence X = {x1,x2,…,xi,…,xn}, we assume that each word xi has a corresponding high resource language translation Ti based on the bilingual lexicon. […] We also map each high resource language translation word into its embedding vector. Therefore, translation word vectors {t1,…,ti,…,tl} are stacked and regarded as the translation memory unit T ∈ R^(d×l), where l is the number of all translation words. […] To better encode the structural information [i.e., structure of a sentence] of different translation items [i.e., third language token sequence], we incorporate the POS-tag information of each translation item into their corresponding translation words. […] In detail, taking an external translation unit T ∈ R^(d×l) [i.e., third language token sequence] and a low resource word vector xi ∈ R^d [i.e., source language token sequence] as input, the attention model outputs a continuous vector vec ∈ R^d, […] For each piece of translation memory tj, we use a feed forward neural network to compute its semantic relatedness [i.e., meaning] with the low resource word.”).
Firat et al. and Feng et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in language processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Firat et al. to incorporate the teachings of Feng et al. of wherein the processor expresses each token of a source language token sequence and a third language token sequence input to the inputter into an embedding vector through an embedding layer and models a meaning and a structure of a sentence which provides the benefit of improving low resource word representations via knowledge transfer from high resource language using bilingual lexicons (abstract of Feng et al.).

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Firat et al. (Firat, Orhan, et al. "Zero-resource translation with multi-lingual neural machine translation." arXiv preprint arXiv:1606.04164 (2016); https://aclanthology.org/D16-1026.pdf) as applied to claim 1 above, and further in view of Zhang et al. (Zhang, Jinchao, Qun Liu, and Jie Zhou. "ME-MD: An effective framework for neural machine translation with multiple encoders and decoders." Proceedings of the 26th International Joint Conference on Artificial Intelligence. 2017; https://www.ijcai.org/proceedings/2017/0474.pdf).

Regarding claim 5, Firat et al. teaches all of the limitations as in claim 1, above.
However, Firat et al. does not explicitly teach, but Zhang et al. does teach:
wherein the processor allows parameters of a lower layer of a source language encoder to be tied to parameters of a lower layer of a third language encoder (see Figure 2: here, the word embeddings (e(x1), …, e(xm)) are associated with the source/third language encoder parameters, and the lower layers of encoders 1-3 include LGRU11, …, LGRU11 (i.e., source language encoder lower layer) and LGRU31, …, LGRU31 (i.e., third language encoder lower layer); see also 2 Neural Machine Translation: “We briefly introduce the NMT architecture [Bahdanau et al., 2015] that our systems build on. Formally, given a source sentence x = x1; …; xm and a target sentence y = y1; … ; yn, […]”; and 3.2 M-Encoder (par. 1-3): “We consider that multi-depth encoders can provide multiple level abstraction of the source sentence. Figure 2 (a) shows a multi-depth M-Encoder with three encoders, which depths are 2, 4 and 6, respectively. […] We exploit the left-to-right gated recurrent unit (LGRU) [Cho et al., 2014] to forwardly compress the source sequence […] Layers with different directions are alternately stacked with direct connections. After the input sequence is compressed by stacked GRU layers to the vector o2 = {o21; …; o2m}, a gated unit is employed to combine original word embedding e(xi) [i.e., input/third languages] […] Three encoders produce three source representations as: {h11; … ; h1m}; {h21;…; h2m}; {h31; …; h3m}:”), and
parameters of a lower layer of a target language decoder to be tied to parameters of a lower layer of a third language decoder (see Figure 3: here, e(yt-1) [i.e., associated with the target word embedding] is associated with the target/third language decoder parameters, where decoder 1 is associated with the target language and decoder 2 with the third language, and the lower layers of decoders 1-3 include LGRU11 (target language decoder lower layer) and LGRU2 (third language decoder lower layer); see also 2 Neural Machine Translation: “We briefly introduce the NMT architecture [Bahdanau et al., 2015] that our systems build on. Formally, given a source sentence x = x1; …; xm and a target sentence y = y1; … ; yn, […]”; and 3.3 M-Decoder: “The M-Decoder aims to enhance the generation ability of the decoder through integrating multiple decoders. Similar to the M-Encoder, the M-Decoder can also have multidepth and multi-type. […] Wu et al., 2016] and multiple recurrent networks. Figure 3 presents a multi-depth M-Decoder that contains three decoders with different depths. We take the “decoder2” for detailed description without loss of generality. We adopt the varietal decoder implementation in our NMT systems. Formally, the output q2;t of the “decoder2” at time t is computed as following [equation image not reproduced] […]”).
Firat et al. and Zhang et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in language processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Firat et al. to incorporate the teachings of Zhang et al. of wherein the processor allows parameters of a lower layer of a source language encoder to be tied to parameters of a lower layer of a third language encoder and parameters of a lower layer of a target language decoder to be tied to parameters of a lower layer of a third language decoder, which provides the benefit of larger improvements with fewer parameters and significant savings in computation overhead (4.5 Comparison with Deeper and Wider Networks (par. 2) of Zhang et al.).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Feng et al. (Feng, Xiaocheng, et al. "Improving Low Resource Named Entity Recognition using Cross-lingual Knowledge Transfer." IJCAI. Vol. 1. 2018; https://www.ijcai.org/proceedings/2018/566) in view of Mohamed et al. (US 10417350 B1).
As to independent claim 6, Feng et al. teaches:
A method of automatic translation (see 2. Methodology (par. 1): “Afterwards, we present two neural networks to learn the crosslingual semantic representation of each low resource language word based on high resource language translations.”), comprising the steps of:
(a) receiving a token sequence of a source language and a token sequence of a third language (see ¶ 2-3 of 2.2 Improved with Bilingual Lexicon: “[…] To better encode the structural information [i.e., structure of a sentence] of different translation items [i.e., third language token sequence], we incorporate the POS-tag information of each translation item into their corresponding translation words. […] In detail, taking an external translation unit $T \in \mathbb{R}^{d \times l}$ [i.e., third language token sequence] and a low resource word vector $x_i \in \mathbb{R}^{d}$ [i.e., source language token sequence] as input”);
(b) expressing each token of the token sequences into an embedding vector (see ¶ 1 of 2.2 Improved with Bilingual Lexicon: “given a low resource language sentence X = {x1, x2, …, xi, …, xn} [i.e., source language embedding vector], we assume that each word xi has a corresponding high resource language translation Ti based on the bilingual lexicon 2. […] Therefore, translation word vectors {t1, …, ti, …, tl} [i.e., third language embedding vector] are stacked and regarded as the translation memory unit T...”); and

However, Feng et al. does not explicitly teach, but Mohamed et al. does teach:
(c) generating a target language token sequence using the embedding vector and outputting the generated target language token sequence (see Col. 2, line 65 - Col. 3, line 21: “(18) As mentioned above, in some embodiments, embedding vectors for words or tokens of different languages may be obtained and used for similarity analysis. Individual words or groups of words in a given language may be mapped to data structures which represent the corresponding semantics numerically […] In one embodiment, data structures other than vectors may be used for representing the words, and distance metrics may be computed for such other data structures to indicate similarity. In some embodiments, machine translation algorithms, which may also employ neural networks in some cases, may be used to translate the tokens of the ITCs into tokens in the target language, and such machine-translated tokens may be used for the similarity analysis.”).
Feng et al. and Mohamed et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in language processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Feng et al. to incorporate the teachings of Mohamed et al. of (c) generating a target language token sequence using the embedding vector and outputting the generated target language token sequence which provides the benefit of helping to increase client confidence in the accuracy of the analysis (Col. 19, lines 1-11 of Mohamed et al.).

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Feng et al. (Feng, Xiaocheng, et al. "Improving Low Resource Named Entity Recognition using Cross-lingual Knowledge Transfer." IJCAI. Vol. 1. 2018; https://www.ijcai.org/proceedings/2018/566) in view of Mohamed et al. (US 10417350 B1) as applied to claim 6 above, and further in view of Firat et al. (Firat, Orhan, et al. "Zero-resource translation with multi-lingual neural machine translation." arXiv preprint arXiv:1606.04164 (2016)).

Regarding claim 7, Feng et al. in combination with Mohamed et al. teach all of the limitations as in claim 6, above.

However, Feng et al. in combination with Mohamed et al. does not explicitly teach, but Firat et al. does teach:
wherein the step (a) includes receiving the token sequence of the source language, which is a low-resource language, and the token sequence of the third language abundant in resources compared to the low-resource language (see 5.1 Pivot-based Translation (par. 1): “The first set of approaches exploits the fact that the target zero-resource translation path can be decomposed into a sequence of high-resource translation paths (Wu and Wang, 2007; Utiyama and Isahara, 2007; Habash and Hu, 2009). For instance, in our case, Es→Fr can be decomposed into a sequence of Es→En and En→Fr. In other words, we translate a source sentence (Es) [i.e., source/low-resource language] into a pivot language (En) [i.e., third/high-resource language] and then translate the English translation into a target language (Fr) [i.e., target language], all within the same multi-way, multilingual model trained by using bilingual corpora.”; and see 3.2 Many-to-One Translation (par. 5): “In this section, we consider a case where a source sentence is given in two languages, X1 and X2” [i.e., X1 and X2 interpreted to be in the same format as the already defined: X = (x1, . . . , xTx)]).
Feng et al. in combination with Mohamed et al. and Firat et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in language processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Feng et al. in combination with Mohamed et al. to incorporate the teachings of Firat et al. of wherein the step (a) includes receiving the token sequence of the source language, which is a low-resource language, and the token sequence of the third language abundant in resources compared to the low-resource language, which provides the benefit that translation quality can be improved even without any direct parallel corpus available and, if a small amount of direct parallel pairs is available, may improve even further (6.2.2 Results and Analysis (par. 4) of Firat et al.).

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Feng et al. (Feng, Xiaocheng, et al. "Improving Low Resource Named Entity Recognition using Cross-lingual Knowledge Transfer." IJCAI. Vol. 1. 2018; https://www.ijcai.org/proceedings/2018/566) in view of Mohamed et al. (US 10417350 B1) and Firat et al. (Firat, Orhan, et al. "Zero-resource translation with multi-lingual neural machine translation." arXiv preprint arXiv:1606.04164 (2016)) as applied to claim 7 above, and further in view of Zhang et al. (Zhang, Jinchao, Qun Liu, and Jie Zhou. "ME-MD: An effective framework for neural machine translation with multiple encoders and decoders." Proceedings of the 26th International Joint Conference on Artificial Intelligence. 2017; https://www.ijcai.org/proceedings/2017/0474.pdf).

Regarding claim 8, Feng et al. in combination with Mohamed et al. (US 10417350 B1) and Firat et al. teach all of the limitations as in claim 7, above.
However, Feng et al. in combination with Mohamed et al. and Firat et al. does not explicitly teach, but Zhang et al. does teach:
wherein the step (b) includes allowing parameters of lower layers of encoder and decoder networks for modeling the source language to be tied to parameters of lower layers of encoder and decoder networks for modeling the third language (see Figures 2 and 3, as in claim 5 above.
Encoders: the word embeddings (e(x1), …, e(xm)) are associated with the source/third language encoder parameters, and the lower layers of encoders 1-3 include LGRU11, …, LGRU11 (i.e., source language encoder lower layer) and LGRU31, …, LGRU31 (i.e., third language encoder lower layer). Therefore, given this relationship (i.e., tie) between the source and third language encoder parameters, a corresponding relationship is implied for the decoding part as well. Decoders: e(yt-1) is associated with the target/third language decoder parameters, where decoder 1 is associated with the target language and decoder 2 with the third language, and the lower layers of decoders 1-3 include LGRU11 (target language decoder lower layer) and LGRU2 (third language decoder lower layer). See also 2 Neural Machine Translation: “We briefly introduce the NMT architecture [Bahdanau et al., 2015] that our systems build on. Formally, given a source sentence x = x1; …; xm and a target sentence y = y1; … ; yn, […]”; 3.2 M-Encoder (par. 1-3): “We consider that multi-depth encoders can provide multiple level abstraction of the source sentence. Figure 2 (a) shows a multi-depth M-Encoder with three encoders, which depths are 2, 4 and 6, respectively. […] We exploit the left-to-right gated recurrent unit (LGRU) [Cho et al., 2014] to forwardly compress the source sequence […] Layers with different directions are alternately stacked with direct connections. After the input sequence is compressed by stacked GRU layers to the vector o2 = {o21; …; o2m}, a gated unit is employed to combine original word embedding e(xi) [i.e., input/third languages] […] Three encoders produce three source representations as: {h11; … ; h1m}; {h21;…; h2m}; {h31; …; h3m}:”; and 3.3 M-Decoder: “The M-Decoder aims to enhance the generation ability of the decoder through integrating multiple decoders. Similar to the M-Encoder, the M-Decoder can also have multidepth and multi-type. […] Wu et al., 2016] and multiple recurrent networks. Figure 3 presents a multi-depth M-Decoder that contains three decoders with different depths. We take the “decoder2” for detailed description without loss of generality. We adopt the varietal decoder implementation in our NMT systems. Formally, the output q2;t of the “decoder2” at time t is computed as following [equation image not reproduced] […]”.
Also, see Figure 1: “The general architecture of the proposed ME-MD framework. The architecture consists of two modules: M-Encoder and M-Decoder [multiple encoders and multiple decoders]. Compared with the encoder-decoder framework, ME-MD exploits multiple encoders and decoders.” Here, there is a relationship [i.e., tie] between the lower layers of the source and third languages in both the encoder and decoder modules.).
Feng et al. in combination with Mohamed et al. and Firat et al. and Zhang et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in language processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Feng et al. in combination with Mohamed et al. and Firat et al. to incorporate the teachings of Zhang et al. of wherein the step (b) includes allowing parameters of lower layers of encoder and decoder networks for modeling the source language to be tied to parameters of lower layers of encoder and decoder networks for modeling the third language, which provides the benefit of larger improvements with fewer parameters and significant savings in computation overhead (4.5 Comparison with Deeper and Wider Networks (par. 2) of Zhang et al.).

Claim 13 is rejected under 35 U.S.C. 103 as being unpatentable over Zhang et al. (Zhang, Jinchao, Qun Liu, and Jie Zhou. "ME-MD: An effective framework for neural machine translation with multiple encoders and decoders." Proceedings of the 26th International Joint Conference on Artificial Intelligence. 2017; https://www.ijcai.org/proceedings/2017/0474.pdf), as applied to claim 12 above, and further in view of Molin et al. (US 20210334470 A1).

Regarding claim 13, Zhang et al. teaches all of the limitations as in claim 12, above.
However, Zhang et al. does not explicitly teach, but Molin et al. does teach:
wherein the lower layer is distinguished by a boundary set according to a result of monitoring a trend of an automatic translation performance change (see [0029]: “If the neural network outputs a word in the source language giving it a meaning that is deprecated, this too can signal to the expert the need for making changes in one of the neural network layers in order to create a neuron, which stores the meaning of the context against which, the accuracy of the translation may be compared.”).
Zhang et al. and Molin et al. are considered to be analogous to the claimed invention because they are in the same field of endeavor in language processing. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Zhang et al. to incorporate the teachings of Molin et al. of wherein the lower layer is distinguished by a boundary set according to a result of monitoring a trend of an automatic translation performance change, which provides the benefit of improving the quality of the interpretation of natural language sentences ([0073] of Molin et al.).

Allowable Subject Matter
Claims 3-4, 9, and 11 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 101 set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.

The following is a statement of reasons for the indication of allowable subject matter:
Regarding claim 3, Firat et al. in combination with Feng et al. teach all of the limitations as in claim 2, above.
However, with respect to Claim 3, Firat et al. in combination with Feng et al. fail to teach:
wherein the processor:
calculates a weight vector by measuring a distance between an input embedding vector and the third language vocabulary embedding vector, and
generates a third language weight embedding vector using the weight vector and an embedding matrix of third language vocabularies.
Claim 4 would be allowable because it is dependent on claim 3.

Regarding claim 9, Feng et al. in combination with Firat et al. teach all of the limitations as in claim 6.
However, with respect to Claim 9, Feng et al. in combination with Firat et al. fail to teach:
wherein the step (b) includes:
calculating a weight vector for a similarity between an input vocabulary and a third language vocabulary, generating a third language weight embedding vector using the weight vector and an embedding matrix of the third language; and
generating a final embedding vector using an embedding vector of the input vocabulary and the third language weight embedding vector.

Regarding claim 11, Zhang et al. teach all of the limitations as in claim 10.
However, Zhang et al. fail to teach:
the apparatus further comprising a vocabulary embedding mapping module configured to:
generate a weight vector for a similarity between an input vocabulary of the source language and a vocabulary word of the third language;
generate a third language weight embedding vector using the weight vector and an embedding matrix of the third language vocabularies; and
generate a final embedding vector using an embedding vector of the input vocabulary and the third language weight embedding vector.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Keisha Y Castillo-Torres whose telephone number is (571)272-3975. The examiner can normally be reached Monday - Friday, 9:00 am - 4:00 pm (EST).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Pierre-Louis Desir can be reached on (571)272-7799. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

Keisha Y. Castillo-Torres
Examiner
Art Unit 2659



/Keisha Y. Castillo-Torres/Examiner, Art Unit 2659

/Paras D Shah/Primary Examiner, Art Unit 2659

08/13/2022