DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Acknowledgment is made of applicant's claim for foreign priority based on an application filed in Republic of Korea on 11/29/2019. It is noted, however, that applicant has not filed a certified copy of the 10-2019-0156748 application as required by 37 CFR 1.55.  The application data sheet filed 11/25/2020 includes the PDX access code ‘B69F’, but the examiner is unable to locate the certified copy in the electronic file wrapper, so it appears that retrieval of the associated foreign application was not successful.
Drawings
The drawings are objected to because:
In Figure 5, element 926, “USER INTERFACE INTPUT DEVICE” should read “USER INTERFACE INPUT DEVICE”.
The drawings are objected to as failing to comply with 37 CFR 1.84(p)(5) because they include the following reference characters not mentioned in the description: 900 and 930 in Figure 5.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” pursuant to 37 CFR 1.121(d). If the changes are not accepted by the examiner, the applicant will be notified and informed of any required corrective action in the next Office action. The objection to the drawings will not be held in abeyance.
Specification
The disclosure is objected to because of the following informalities:
In paragraph 0034, line 2, the equation is not legible.
In paragraph 0075, line 1, “the prevent invention” should read “the present invention”.
In paragraph 0076, lines 1-2, “the prevent invention” should read “the present invention”.
Appropriate correction is required.
Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation is: “an inputter” in claim 1.
Because this claim limitation is being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it is being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.
Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1 – 8 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
Regarding claim 1, the disclosure does not provide adequate structure to perform the claimed function of “an inputter configured to receive a first language input token”. The specification, in paragraph 0020, lines 1-3, recites “A system 100 for end-to-end neural machine translation according to the present invention includes an inputter 110 configured to receive first language input tokens”. However, the specification does not disclose a method for receiving first language input tokens or disclose the structure that performs the function.  Therefore, the specification does not demonstrate that applicant has made an invention that achieves the claimed function because the invention is not described with sufficient detail such that one of ordinary skill in the art can reasonably conclude that the inventor had possession of the claimed invention.
Claims 2 – 8 are also rejected as they depend from claim 1, and thus recite the limitations of claim 1, and do not resolve the lack of adequate structure to perform the claimed function of “an inputter configured to receive a first language input token”.
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 1 – 8 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Regarding claim 1, the claim limitation “an inputter” invokes 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. However, the written description fails to disclose the corresponding structure, material, or acts for performing the entire claimed function and to clearly link the structure, material, or acts to the function.  No association between the structure and the functions can be found in the specification.  The specification discloses the “inputter” as performing the function of receiving first language input tokens.  The hardware configuration example shown in figure 5 and described in the specification is not adequate structure for performing the function of receiving first language input tokens because it does not describe a particular structure for performing the function.  As would be recognized by those of ordinary skill in the art, the function of receiving first language input tokens can be performed in any number of ways in hardware, software or a combination of the two. The specification does not provide sufficient details such that one of ordinary skill in the art would understand the structure that performs the claimed function.  Therefore, the claim is indefinite and is rejected under 35 U.S.C. 112(b) or pre-AIA  35 U.S.C. 112, second paragraph.
Applicant may:
(a)        Amend the claim so that the claim limitation will no longer be interpreted as a limitation under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph; 
(b)        Amend the written description of the specification such that it expressly recites what structure, material, or acts perform the entire claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(c)        Amend the written description of the specification such that it clearly links the structure, material, or acts disclosed therein to the function recited in the claim, without introducing any new matter (35 U.S.C. 132(a)).
If applicant is of the opinion that the written description of the specification already implicitly or inherently discloses the corresponding structure, material, or acts and clearly links them to the function so that one of ordinary skill in the art would recognize what structure, material, or acts perform the claimed function, applicant should clarify the record by either: 
(a)        Amending the written description of the specification such that it expressly recites the corresponding structure, material, or acts for performing the claimed function and clearly links or associates the structure, material, or acts to the claimed function, without introducing any new matter (35 U.S.C. 132(a)); or 
(b)        Stating on the record what the corresponding structure, material, or acts, which are implicitly or inherently set forth in the written description of the specification, perform the claimed function. For more information, see 37 CFR 1.75(d) and MPEP §§ 608.01(o) and 2181.
Claims 2 – 7 are also rejected as they depend from claim 1, and thus recite the limitations of claim 1, and do not resolve the indefinite language from claim 1.
Regarding claim 8, claim 8 depends from claim 1, and thus recites the limitations of claim 1, and does not resolve the indefinite language from claim 1.  Also, the claim limitation “a delta probability distribution of a READ action” is indefinite because the disclosure does not provide sufficient information to understand how the term “a delta probability distribution” is being applied.  For examination purposes, the term “a delta probability distribution of a READ action, a probability of a READ action, and a probability of a WRITE action” in claim 8 will be interpreted to mean the probability of a READ action and the probability of a WRITE action.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1 and 4 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.  
Claim 1 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.  The claim recites “an inputter configured to receive a first language input token; a memory in which a real time interpretation and translation program for the first language input token is stored; and a processor configured to execute the program, wherein the processor combines an output of a translation network with an output of an action network to compose a final translation result in communication units”.
The claim 1 limitations, under their broadest reasonable interpretation, cover performance of the limitation in the mind but for the recitation of generic computer components. That is, other than reciting “an inputter”, “a memory”, “a processor”, “a translation network”, and “an action network”, nothing in the claim elements precludes the actions from practically being performed in the mind. For example, “receive” in the context of this claim encompasses a person reading text in one language, and “combines” in the context of this claim encompasses the person writing a translation of the text in a second language, while considering after each word if they have enough information to write the next word of the translation, or if they need to read an additional word or words before writing the next word of the translation.
If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind but for the recitation of generic computer components, then it falls within the “Mental Processes” grouping of abstract ideas.  Accordingly, the claim recites an abstract idea.
This judicial exception is not integrated into a practical application.  In particular, the claim recites the additional elements of “an inputter”, “a memory”, “a processor”, “a translation network”, and “an action network”.  These elements are recited at a high level of generality such that they amount to no more than mere instructions to apply the exception using generic components.  Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.  The claim is directed to an abstract idea.   
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.  As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “an inputter”, “a memory”, “a processor”, “a translation network”, and “an action network” amount to no more than mere instructions to apply the exception using generic components.  Mere instructions to apply an exception using generic components cannot provide an inventive concept.  The claim is not patent eligible.
Claim 4 is rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.  Claim 4 depends from claim 1, and thus recites the limitations of claim 1, wherein “the action network determines whether to further read the first language input token or generate a second language output token on the basis of translation information having been input and output so far”.
For the reasons discussed above for claim 1, the claim 1 limitations recite abstract ideas.  The additional limitation does not preclude the steps of claim 1 from practically being performed in the mind.  For example, a person translating a text written in one language to a second language, while considering after each word if they have enough information to write the next word of the translation, or if they need to read an additional word or words before writing the next word of the translation, could make that consideration based on all of the text they have read and written at that time.
This judicial exception is not integrated into a practical application.  For the reasons discussed above for claim 1, the additional elements of “an inputter”, “a memory”, “a processor”, “a translation network”, and “an action network” amount to no more than mere instructions to apply the exception using generic components.  Accordingly, these elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.  
The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception.  For the reasons discussed above for claim 1, mere instructions to apply an exception using generic components cannot provide an inventive concept.  The claim is not patent eligible.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 9 – 12 and 14 – 15 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Gu et al. (“Learning to Translate in Real-time with Neural Machine Translation”), hereinafter Gu.
Regarding claim 9, Gu discloses a method of end-to-end neural machine translation, comprising the steps of:
(a) adding a READ token and performing learning on an end-to-end neural machine translation network (Section 4.1, lines 1-7, "We need an NMT environment for the agent to explore and use to generate translations. Here, we simply pre-train the NMT encoder-decoder on full sentence pairs with maximum likelihood, and assume the pre-trained model is still able to generate reasonable translations even on incomplete source sentences."; Section 2, lines 1-11, "Suppose we have a buffer of input words X = {x1, ..., xTs} to be translated in real-time. We define the simultaneous translation task as sequentially making two interleaved decisions: READ or WRITE. More precisely, the translator READs a source word xƞ from the input buffer in chronological order as translation context, or WRITEs a translated word yτ onto the output buffer, resulting in output sentence Y = {y1, ..., yTt}, and action sequence A = {a1, ..., aT} consists of Ts READs and Tt WRITEs, so T = Ts + Tt."; Section 4, lines 1-2, "The proposed framework can be trained using reinforcement learning."; The READ action reads on the READ token, the neural machine translation (NMT) encoder-decoder reads on the end-to-end neural machine translation network, and training reads on performing learning.);
(b) performing learning on an action network to learn a position of an actual segmentation point (Section 3.2, lines 16-17, "How the agent chooses the actions based on the observation defines the policy."; Section 4.3, lines 1-3, "We freeze the pre-trained parameters of an NMT model, and train the agent using the policy gradient"; The agent choosing the actions reads on the action network, and choosing the actions based on the observation reads on determining a position of an actual segmentation point.);
and (c) performing entire network re-learning on the end-to-end neural machine translation network and the action network (Section 4.3, lines 41-45, "The overall learning algorithm is summarized in Algorithm 2. For efficiency, instead of updating with stochastic gradient descent (SGD) on a single sentence, both the agent and the baseline are optimized using a minibatch of multiple sentences."; The overall learning algorithm reads on performing entire network re-learning, and the agent and the baseline read on the action network and the translation network.).
Regarding claim 10, Gu discloses the method as claimed in claim 9, wherein the step (a) includes performing learning on the end-to-end neural machine translation network having an encoder-decoder structure to which an attention mechanism is coupled (Section 3.1, lines 1-4, "The first element of the NMT system is the encoder, which converts input words X = {x1, ..., xTs} into context vectors H = {h1, ..., hTs}."; Section 3.1, lines 12-13, "Similar with standard MT, we use an attention-based decoder.").
Regarding claim 11, Gu discloses the method as claimed in claim 9, wherein the step (a) includes adding READ tokens corresponding in number to a length of a first language sentence at arbitrary positions of a second language sentence of training data to generate an action sequence (Section 2, lines 1-11, "Suppose we have a buffer of input words X = {x1, ..., xTs} to be translated in real-time. We define the simultaneous translation task as sequentially making two interleaved decisions: READ or WRITE. More precisely, the translator READs a source word xƞ from the input buffer in chronological order as translation context, or WRITEs a translated word yτ onto the output buffer, resulting in output sentence Y = {y1, ..., yTt}, and action sequence A = {a1, ..., aT} consists of Ts READs and Tt WRITEs, so T = Ts + Tt.").
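For illustration only, the relationship T = Ts + Tt quoted above can be sketched in Python. This is a minimal sketch under stated assumptions, not code from Gu or from the application: the function name `make_action_sequence`, the `<READ>` marker string, and the example sentences are all hypothetical. The sketch places one READ per source token at arbitrary positions among the target tokens, so some generated sequences may write before reading.

```python
import random

READ = "<READ>"  # hypothetical marker for a READ action


def make_action_sequence(source_tokens, target_tokens, rng):
    """Interleave one READ per source token among the target tokens.

    The target tokens keep their original order; each one stands for a
    WRITE action. READ positions are chosen at random (an assumption
    made for this sketch).
    """
    ts, tt = len(source_tokens), len(target_tokens)
    # Pick which of the ts + tt slots hold READ actions.
    read_slots = set(rng.sample(range(ts + tt), ts))
    targets = iter(target_tokens)
    return [READ if i in read_slots else next(targets) for i in range(ts + tt)]


source = ["ich", "sehe", "dich"]   # Ts = 3
target = ["I", "see", "you"]       # Tt = 3
actions = make_action_sequence(source, target, random.Random(0))
```

The resulting list always has Ts READ entries and Ts + Tt entries in total, which matches the action-sequence length stated in the quoted passage.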
Regarding claim 12, Gu discloses the method as claimed in claim 9, wherein the step (b) includes determining whether to further read a first language input token or generate a second language output token on the basis of translation information having been input and output (Section 3.2, lines 5-9, "As shown in Fig 2, we concatenate the current context vector $c_{\tau}^{\eta}$, the current decoder state $z_{\tau}^{\eta}$ and the embedding vector of the candidate word $y_{\tau}^{\eta}$ as the continuous observation, $o_{\tau+\eta} = [c_{\tau}^{\eta}; z_{\tau}^{\eta}; E(y_{\tau}^{\eta})]$ to represent the current state."; Section 3.2, lines 16-17, "How the agent chooses the actions based on the observation defines the policy."; Section 3.2, lines 24-27, "Based on the policy of our agent, the overall algorithm of greedy decoding is shown in Algorithm 1. The algorithm outputs the translation result and a sequence of observation-action pairs."; Outputting the translation result and a sequence of observation-action pairs reads on determining whether to further read a first language input token or generate a second language output token, and the continuous observation reads on the translation information having been input and output).
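For illustration only, the READ/WRITE decision loop described in the cited passages can be sketched as follows. This is a simplified sketch under stated assumptions and is not Gu's Algorithm 1: the observation is a symbolic tuple rather than concatenated vectors, and `toy_translate_step`, `toy_policy`, the lexicon, and the `</s>` end marker are hypothetical stand-ins for a trained translation model and agent.

```python
def greedy_decode(source, translate_step, policy, max_len=50):
    """Return the written tokens and the (observation, action) trace."""
    read_count = 0   # source tokens read so far
    output = []      # target tokens written so far
    trace = []
    while len(output) < max_len:
        prefix = source[:read_count]
        candidate = translate_step(prefix, output)
        # Symbolic stand-in for the observation (context, state, candidate).
        observation = (tuple(prefix), tuple(output), candidate)
        # READ is only possible while unread source tokens remain.
        if read_count < len(source) and policy(observation) == "READ":
            action = "READ"
        else:
            action = "WRITE"
        trace.append((observation, action))
        if action == "READ":
            read_count += 1
        elif candidate == "</s>":
            break    # writing the end marker finishes the translation
        else:
            output.append(candidate)
    return output, trace


LEXICON = {"ich": "I", "sehe": "see", "dich": "you"}  # hypothetical


def toy_translate_step(prefix, output):
    """Propose the next word, or the end marker if nothing is pending."""
    if len(output) < len(prefix):
        return LEXICON[prefix[len(output)]]
    return "</s>"


def toy_policy(observation):
    """READ when every token read so far has already been translated."""
    prefix, output, _ = observation
    return "READ" if len(output) >= len(prefix) else "WRITE"


result, trace = greedy_decode(["ich", "sehe", "dich"], toy_translate_step, toy_policy)
```

The returned pair mirrors the quoted description of an algorithm that outputs a translation result together with a sequence of observation-action pairs.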
Regarding claim 14, Gu discloses the method as claimed in claim 9, wherein the step (b) includes fixing a probability distribution of output token generation and learning a probability of a READ action (Section 4.3, lines 1-3, "We freeze the pre-trained parameters of an NMT model, and train the agent using the policy gradient"; Section 3.1, lines 20-24, "Hƞ is used to represent the incomplete input states, where Hƞ is a prefix of H. As the WRITE action calculates the probability of the next word on the fly, we need greedy decoding for each step"; Section 3.2, lines 16-17, "How the agent chooses the actions based on the observation defines the policy."; Freezing the pre-trained parameters of an NMT model reads on fixing a probability distribution of output token generation, and training the agent reads on learning a probability of a READ action.).
Regarding claim 15, Gu discloses the method as claimed in claim 14, wherein the step (b) includes learning on the action network through a reinforcement learning using a second language sentence and a second language token sequence (Section 4.2, lines 6-7, "We evaluate the translation quality using metrics such as BLEU"; Section 4.2, lines 15-20, "we used a smoothed version of BLEU for our implementation (Lin and Och, 2004). BLEU(Y, Y*) = BP · BLEU0(Y, Y*), where Y* is the reference and Y is the output. We decompose BLEU and use the difference of partial BLEU scores as the reward"; Section 4.3, "We freeze the pre-trained parameters of an NMT model, and train the agent using the policy gradient"; Training the agent using the policy gradient reads on performing learning on the action network through reinforcement learning, the reference (Y*) reads on a second language sentence, and the output (Y) reads on a second language token sequence.).
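For illustration only, the idea of using the difference of partial quality scores as a per-step reward can be sketched as follows. This is a sketch under stated assumptions: a toy clipped unigram precision stands in for the smoothed BLEU that Gu describes, and the names `unigram_precision` and `incremental_rewards` and the example sentences are hypothetical.

```python
def unigram_precision(output, reference):
    """Toy quality score: clipped unigram precision (stand-in for BLEU)."""
    if not output:
        return 0.0
    remaining = {}
    for token in reference:
        remaining[token] = remaining.get(token, 0) + 1
    matched = 0
    for token in output:
        if remaining.get(token, 0) > 0:
            matched += 1
            remaining[token] -= 1
    return matched / len(output)


def incremental_rewards(output, reference, score):
    """Reward for each written token = change in the partial score."""
    rewards = []
    previous = 0.0
    for t in range(1, len(output) + 1):
        current = score(output[:t], reference)
        rewards.append(current - previous)
        previous = current
    return rewards


reference = ["I", "see", "you"]   # plays the role of Y*
output = ["I", "saw", "you"]      # plays the role of Y
rewards = incremental_rewards(output, reference, unigram_precision)
```

Because each reward is a difference of consecutive partial scores, the rewards sum to the score of the complete output, so the per-step signal stays consistent with the sentence-level metric.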
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 – 7 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Gu in view of Ma et al. (US Patent No. 11,126,800), hereinafter Ma.
Regarding claim 1, as best understood based on the 35 U.S.C. 112(a) and 112(b) issues identified above, Gu discloses a system for end-to-end neural machine translation, comprising:
an inputter configured to receive a first language input token (Section 2, lines 1-2, "Suppose we have a buffer of input words X = {x1, ..., xTs} to be translated in real-time.");
wherein the processor combines an output of a translation network with an output of an action network to compose a final translation result in communication units (Abstract, lines 5-10, "We propose a neural machine translation (NMT) framework for simultaneous translation in which an agent learns to make decisions on when to translate from the interaction with a pre-trained NMT environment."; Section 1, lines 40-47, "In this paper, we propose a unified design for learning to perform neural simultaneous machine translation. The proposed framework is based on formulating translation as an interleaved sequence of two actions: READ and WRITE. Based on this, we devise a model connecting the NMT system and these READ/WRITE decisions."; The neural machine translation (NMT) system reads on the translation network and the READ/WRITE decisions read on the action network.).
Gu does not specifically disclose: a memory in which a real time interpretation and translation program for the first language input token is stored; and a processor configured to execute the program.
Ma teaches:
a memory in which a real time interpretation and translation program for the first language input token is stored; and a processor configured to execute the program (Column 18, line 64 - Column 19, line 2, "Aspects of the present invention may be encoded upon one or more non-transitory computer-readable media with instructions for one or more processors or processing units to cause steps to be performed. It shall be noted that the one or more non-transitory computer-readable media shall include volatile and non-volatile memory."; Column 7, lines 28-31, "FIG. 5 is a flowchart of an illustrative process for using a neural network that has been trained in a prefix-to-prefix manner for low-latency real-time translation, according to various embodiments of the present disclosure.").
Ma teaches a processor executing instructions stored in memory in order to implement a method of simultaneous translation with a delay of only a few seconds (Column 1, lines 23-30, "The present disclosure relates generally to systems and methods for interpretation automation. More particularly, the present disclosure relates to systems and methods for simultaneous translation with integrated anticipation and controllable latency. Simultaneous translation aims to automate simultaneous interpretation, which translates concurrently with the source-language speech, with a delay of only a few seconds.").
Gu and Ma are considered to be analogous to the claimed invention because they are in the same field of simultaneous language translation.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Gu to incorporate the teachings of Ma to use a processor executing instructions stored in memory.  Doing so would allow for implementing a method of simultaneous translation with a delay of only a few seconds.
Regarding claim 2, as best understood based on the 35 U.S.C. 112(a) and 112(b) issues identified above, Gu in view of Ma discloses the system as claimed in claim 1.  Gu further discloses:
wherein the translation network has an encoder-decoder structure to which an attention mechanism is coupled (Section 3.1, lines 1-4, "The first element of the NMT system is the encoder, which converts input words $X = \{x_1, \ldots, x_{T_s}\}$ into context vectors $H = \{h_1, \ldots, h_{T_s}\}$."; Section 3.1, lines 12-13, "Similar with standard MT, we use an attention-based decoder.").
Regarding claim 3, as best understood based on the 35 U.S.C. 112(a) and 112(b) issues identified above, Gu in view of Ma discloses the system as claimed in claim 1.  Gu further discloses:
wherein the translation network adds a READ token at an arbitrary position of a second language sentence of training data to generate an action sequence (Section 2, lines 1-11, "Suppose we have a buffer of input words $X = \{x_1, \ldots, x_{T_s}\}$ to be translated in real-time. We define the simultaneous translation task as sequentially making two interleaved decisions: READ or WRITE. More precisely, the translator READs a source word $x_{\eta}$ from the input buffer in chronological order as translation context, or WRITEs a translated word $y_{\tau}$ onto the output buffer, resulting in output sentence $Y = \{y_1, \ldots, y_{T_t}\}$, and action sequence $A = \{a_1, \ldots, a_T\}$ consists of $T_s$ READs and $T_t$ WRITEs, so $T = T_s + T_t$.").
Regarding claim 4, as best understood based on the 35 U.S.C. 112(a) and 112(b) issues identified above, Gu in view of Ma discloses the system as claimed in claim 1.  Gu further discloses:
wherein the action network determines whether to further read the first language input token or generate a second language output token on the basis of translation information having been input and output so far (Section 3.2, lines 5-9, "As shown in Fig 2, we concatenate the current context vector $c_{\tau}^{\eta}$, the current decoder state $z_{\tau}^{\eta}$ and the embedding vector of the candidate word $y_{\tau}^{\eta}$ as the continuous observation, $o_{\tau+\eta} = [c_{\tau}^{\eta}; z_{\tau}^{\eta}; E(y_{\tau}^{\eta})]$ to represent the current state."; Section 3.2, lines 16-17, "How the agent chooses the actions based on the observation defines the policy."; Section 3.2, lines 24-27, "Based on the policy of our agent, the overall algorithm of greedy decoding is shown in Algorithm 1. The algorithm outputs the translation result and a sequence of observation-action pairs."; Outputting the translation result and a sequence of observation-action pairs reads on determining whether to further read a first language input token or generate a second language output token, and the continuous observation reads on the translation information having been input and output.).
Regarding claim 5, as best understood based on the 35 U.S.C. 112(a) and 112(b) issues identified above, Gu in view of Ma discloses the system as claimed in claim 1.  Gu further discloses: wherein the processor learns a position of an actual segmentation point that occurs in a real time interpretation and translation through the action network (Section 4.2, lines 1-5, "The policy is learned in order to increase a reward for the translation. At each step the agent will receive a reward signal $r_t$ based on $(o_t, a_t)$. To evaluate a good simultaneous machine translation, a reward must consider both quality and delay."; Learning the policy reads on learning the position of an actual segmentation point.).
Regarding claim 6, as best understood based on the 35 U.S.C. 112(a) and 112(b) issues identified above, Gu in view of Ma discloses the system as claimed in claim 5.  Gu further discloses: wherein the processor performs learning on the action network through a reinforcement learning having a reward function using a second language sentence and a second language token sequence (Section 4.2, lines 6-7, "We evaluate the translation quality using metrics such as BLEU"; Section 4.2, lines 15-20, "we used a smoothed version of BLEU for our implementation (Lin and Och, 2004). BLEU(Y, Y*) = BP · BLEU0(Y, Y*), where Y* is the reference and Y is the output. We decompose BLEU and use the difference of partial BLEU scores as the reward"; Section 4.3, "We freeze the pre-trained parameters of an NMT model, and train the agent using the policy gradient"; Training the agent using the policy gradient reads on performing learning on the action network through reinforcement learning, the reference (Y*) reads on a second language sentence, and the output (Y) reads on a second language token sequence.).
Regarding claim 7, as best understood based on the 35 U.S.C. 112(a) and 112(b) issues identified above, Gu in view of Ma discloses the system as claimed in claim 1.  Gu further discloses: wherein the action network outputs a probability of a READ action using a context vector (Section 3.1, lines 1-4, "The first element of the NMT system is the encoder, which converts input words $X = \{x_1, \ldots, x_{T_s}\}$ into context vectors $H = \{h_1, \ldots, h_{T_s}\}$."; Section 3.1, lines 20-24, "$H^{\eta}$ is used to represent the incomplete input states, where $H^{\eta}$ is a prefix of $H$. As the WRITE action calculates the probability of the next word on the fly, we need greedy decoding for each step").
Gu does not specifically disclose: a hidden state vector.
Ma further teaches:
a hidden state vector (Column 5, lines 26-31, "Regardless of the particular design of different seq-to-seq models, the encoder always takes the input sequence $x = (x_1, \ldots, x_n)$ where each $x_i \in \mathbb{R}^{d_x}$ is a word embedding of $d_x$ dimensions, and produces a new sequence of hidden states $h = f(x) = (h_1, \ldots, h_n)$. The encoding function $f$ can be implemented by RNN or Transformer.").
Ma teaches encoding the input sequence in hidden state vectors in order to implement a method of simultaneous translation with a delay of only a few seconds (Column 1, lines 23-30, "The present disclosure relates generally to systems and methods for interpretation automation. More particularly, the present disclosure relates to systems and methods for simultaneous translation with integrated anticipation and controllable latency. Simultaneous translation aims to automate simultaneous interpretation, which translates concurrently with the source-language speech, with a delay of only a few seconds.").
Gu and Ma are considered to be analogous to the claimed invention because they are in the same field of simultaneous language translation.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Gu in view of Ma to further incorporate the teachings of Ma to encode the input sequence in hidden state vectors.  Doing so would allow for implementing a method of simultaneous translation with a delay of only a few seconds.
Regarding claim 13, Gu discloses the method as claimed in claim 12.  Gu further discloses: wherein the translation information having been input and output is expressed as an encoder context vector (Section 3.1, lines 1-4, "The first element of the NMT system is the encoder, which converts input words $X = \{x_1, \ldots, x_{T_s}\}$ into context vectors $H = \{h_1, \ldots, h_{T_s}\}$."; Section 3.2, lines 5-9, "As shown in Fig 2, we concatenate the current context vector $c_{\tau}^{\eta}$, the current decoder state $z_{\tau}^{\eta}$ and the embedding vector of the candidate word $y_{\tau}^{\eta}$ as the continuous observation, $o_{\tau+\eta} = [c_{\tau}^{\eta}; z_{\tau}^{\eta}; E(y_{\tau}^{\eta})]$ to represent the current state."; The context vector $H$ reads on the translation information having been input expressed as an encoder context vector, and the embedding vector of the candidate word $y_{\tau}^{\eta}$ reads on the translation information having been output expressed as an encoder context vector.).
Gu does not specifically disclose: a decoder hidden state vector of the end-to-end neural machine translation network.
Ma teaches:
a decoder hidden state vector of the end-to-end neural machine translation network (Column 12, lines 8-12, "Then, in embodiments, a newly defined hidden state sequence $z^{(t)} = (z_1^{(t)}, \ldots, z_n^{(t)})$ at decoding step $t$ may be expressed as: $z_i^{(t)} = \sum_{j=1}^{n} \alpha_{ij}^{(t)} P_{W_v}(x_j)$"; The hidden state sequence $z^{(t)}$ reads on the decoder hidden state vector.).
Ma teaches using a decoder hidden state vector in order to implement a method of simultaneous translation with a delay of only a few seconds (Column 1, lines 23-30, "The present disclosure relates generally to systems and methods for interpretation automation. More particularly, the present disclosure relates to systems and methods for simultaneous translation with integrated anticipation and controllable latency. Simultaneous translation aims to automate simultaneous interpretation, which translates concurrently with the source-language speech, with a delay of only a few seconds.").
Gu and Ma are considered to be analogous to the claimed invention because they are in the same field of simultaneous language translation.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Gu to incorporate the teachings of Ma to use a decoder hidden state vector.  Doing so would allow for implementing a method of simultaneous translation with a delay of only a few seconds.
Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Gu in view of Ma as applied to claim 7 above, and further in view of Zheng et al. ("Simultaneous Translation with Flexible Policy via Restricted Imitation Learning"), hereinafter Zheng.
Regarding claim 8, as best understood based on the 35 U.S.C. 112(a) and 112(b) issues identified above, Gu in view of Ma discloses the system as claimed in claim 7, but does not specifically disclose: wherein the processor calculates a probability distribution of final token generation using a probability distribution of output token generation, a delta probability distribution of a READ action, a probability of a READ action, and a probability of a WRITE action.
Zheng teaches:
wherein the processor calculates a probability distribution of final token generation using a probability distribution of output token generation (Section 2, lines 7-13, "Given a sequence $x$ from the source language, the conventional machine translation model predicts the probability distribution of the next target word $y_j$ at the $j$-th step, conditioned on the full source sequence $x$ and previously generated target words $y_{<j}$, that is $p(y_j \mid x; y_{<j})$."),
a delta probability distribution of a READ action, a probability of a READ action, and a probability of a WRITE action (Section 3, lines 1-3, "To obtain a flexible and adaptive policy, we need our model to be able to take both READ and WRITE actions."; Section 4.1, lines 61-66, "If an action sequence $a$ is obtained from our oracle, then applying this sequence will result in a prefix pair, say $s_a$ and $t_a$, of $x$ and $y$. Let $p(a \mid s_a; t_a)$ be the probability of choosing action $a$ given the prefix pair obtained by applying action sequence $a$.").
Zheng teaches using the probability of the next word predicted by a translation model and the probability of the next READ/WRITE action in order to learn a policy for simultaneous translation with high translation quality and low latency (Section 6, lines 1-11, "We have presented a simple model that includes a delay token in the target vocabulary such that the model can apply both READ and WRITE actions during translation process without a explicit policy model. We also designed a restricted dynamic oracle for the simultaneous translation problem and provided a local training method utilizing this dynamic oracle. The model trained with this method can learn a flexible policy for simultaneous translation and achieve better translation quality and lower latency compared to previous methods.").
Gu, Ma, and Zheng are considered to be analogous to the claimed invention because they are in the same field of simultaneous language translation.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Gu in view of Ma to incorporate the teachings of Zheng to use the probability of the next word predicted by a translation model and the probability of the next READ/WRITE action.  Doing so would allow for learning a policy for simultaneous translation with high translation quality and low latency.
Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Gu in view of Zheng.
Regarding claim 16, Gu discloses the method as claimed in claim 14, but does not specifically disclose: wherein the step (c) includes simultaneously learning the probability distribution of output token generation and the probability of a READ action.
Zheng teaches:
wherein the step (c) includes simultaneously learning the probability distribution of output token generation (Section 2, lines 7-13, "Given a sequence $x$ from the source language, the conventional machine translation model predicts the probability distribution of the next target word $y_j$ at the $j$-th step, conditioned on the full source sequence $x$ and previously generated target words $y_{<j}$, that is $p(y_j \mid x; y_{<j})$."),
and the probability of a READ action (Section 3, lines 1-3, "To obtain a flexible and adaptive policy, we need our model to be able to take both READ and WRITE actions."; Section 4.1, lines 61-66, "If an action sequence $a$ is obtained from our oracle, then applying this sequence will result in a prefix pair, say $s_a$ and $t_a$, of $x$ and $y$. Let $p(a \mid s_a; t_a)$ be the probability of choosing action $a$ given the prefix pair obtained by applying action sequence $a$.").
Zheng teaches using the probability of the next word predicted by a translation model and the probability of the next READ/WRITE action in order to learn a policy for simultaneous translation with high translation quality and low latency (Section 6, lines 1-11, "We have presented a simple model that includes a delay token in the target vocabulary such that the model can apply both READ and WRITE actions during translation process without a explicit policy model. We also designed a restricted dynamic oracle for the simultaneous translation problem and provided a local training method utilizing this dynamic oracle. The model trained with this method can learn a flexible policy for simultaneous translation and achieve better translation quality and lower latency compared to previous methods.").
Gu and Zheng are considered to be analogous to the claimed invention because they are in the same field of simultaneous language translation.  Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified Gu to incorporate the teachings of Zheng to use the probability of the next word predicted by a translation model and the probability of the next READ/WRITE action.  Doing so would allow for learning a policy for simultaneous translation with high translation quality and low latency.
Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Ma et al. (Ma, Mingbo, Liang Huang, Hao Xiong, Kaibo Liu, Chuanqiang Zhang, Zhongjun He, Hairong Liu, Xing Li, and Haifeng Wang, “STACL: Simultaneous Translation with Integrated Anticipation and Controllable Latency using Prefix-to-Prefix Framework”, July 2019, Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics (ACL), pp. 3025–3036.) teaches a method for simultaneous language translation.
Chousa et al. (Chousa, Katsuki, Katsuhito Sudoh and Satoshi Nakamura, “Simultaneous Neural Machine Translation using Connectionist Temporal Classification”, November 27, 2019, ArXiv abs/1911.11933.) teaches a method for simultaneous language translation.
Alinejad et al. (Alinejad, Ashkan, Maryam Siahbani, and Anoop Sarkar, “Prediction Improves Simultaneous Neural Machine Translation”, 2018, Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 3022-3027.) teaches a method for simultaneous language translation.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to James Boggs whose telephone number is (571)272-2968. The examiner can normally be reached M-F 8:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571)272-5551. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JAMES BOGGS/ Examiner, Art Unit 2657

/DANIEL C WASHBURN/ Supervisory Patent Examiner, Art Unit 2657