DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
This action is in response to the amendments and arguments filed on December 10, 2021.
Claims 1, 3-7, 16, 18-23, and 25-29 are currently pending.
Claims 1, 3-7, 16, 18-23, and 25-29 have been amended.
Claims 2, 8-15, 17, and 24 have been cancelled.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 09/15/2021 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Response to Amendment
The previous objections to the specification are withdrawn in view of Applicant’s amendment.
The previous objections to claims 3-6, 18-21, and 25-28 are withdrawn in view of Applicant’s amendment.
The previous rejection of claims 7, 22, and 29 under 35 U.S.C. 112(b) is withdrawn in view of Applicant’s amendment.

Response to Arguments
Applicant’s arguments with respect to claim(s) 1, 16, and 23 have been considered but are moot because the new grounds of rejection necessitated by amendments to the claims do not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 16, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang (CN 106126507 A, cited in the IDS submitted on 03/22/2021), in view of Sountsov (Sountsov et al., “Length bias in Encoder Decoder Models and a Case for Global Conditioning”, arXiv, September 2016).

With respect to claim 1, Zhang teaches A neural network machine translation method, (Abstract, "The invention provides a method and a system for deep nerve translation based on character encoding. A combined nerve network model is established by using an RNN to cover the whole translation process, and translation tasks are directly completed from the perspective of an encoder-decoder framework.")
comprising: during encoding process, obtaining a to-be-translated source sentence; (para. [0011], "A. Word vector generation steps: the character-level input data is segmented through neural network modeling, and word vectors are generated"; para. [0017], "A1. Data preprocessing: establish a dictionary of the source language and the target language, perform One-Hot encoding on the characters of the source language and the target language, and represent a sentence into a matrix in chronological order." A to-be-translated sentence is received as character-level input data.)
converting the source sentence into a vector sequence through a Gated Recurrent Unit GRU model; (paras. [0011]-[0012], “A. Word vector generation steps: the character-level input data is segmented through neural network modeling, and word vectors are generated; B. Language model generation steps: use the recurrent neural network to have the characteristics of memory in time, so that the word vector can contain the language information of the context, and establish grammatical rules”; para. [0017], "A1. Data preprocessing: establish a dictionary of the source language and the target language, perform One-Hot encoding on the characters of the source language and the target language, and represent a sentence into a matrix in chronological order."; para. [0022], “B2. Adopt a variation of the well-known Long-Short Term Memory (LSTM) network: Gated Recurrent Unit (GRU) network to generate language models” The sentence is encoded using a Gated Recurrent Unit, resulting in a sequence of vectors.)
and determining a target sentence as a translation result according to the candidate objects. (para. [0061], "The second part, the decoder part, uses the source language and the target language to build a word alignment model, and calculates the candidate translation results, and selects the best results for output.")
But Zhang does not explicitly teach during decoding process, determining candidate objects corresponding to the vector sequence according to a prefix tree which is pre-obtained and built based on a target sentence database.
Sountsov, however, does teach during decoding process, determining candidate objects corresponding to the vector sequence according to a prefix tree (Section 3.2, "During inference, given an input x we need to find $\arg\max_{y \in W} s(y|x, \theta)$. This task can be performed efficiently in our network because the vectors $v_y$ for the sequences y in the whitelist W can be precomputed. Given an input x, we compute $v_x$ and take dot-product with the pre-computed vectors to find the highest scoring response. This gives us the optimal response."; Section 4.1.4, "For ED models we use a beam search with width ranging from 1 to 15 over a token prefix trie constructed from the sequences in W.")
which is pre-obtained and built based on a target sentence database, (Section 3.2, "During inference, given an input x we need to find $\arg\max_{y \in W} s(y|x, \theta)$. This task can be performed efficiently in our network because the vectors $v_y$ for the sequences y in the whitelist W can be precomputed. Given an input x, we compute $v_x$ and take dot-product with the pre-computed vectors to find the highest scoring response. This gives us the optimal response."; Section 4.1.4, "For ED models we use a beam search with width ranging from 1 to 15 over a token prefix trie constructed from the sequences in W.")
It would have been obvious to an artisan of ordinary skill before the effective filing date of the claimed invention to combine the neural network machine translation method of Zhang with during decoding process, determining candidate objects corresponding to the vector sequence according to a prefix tree which is pre-obtained and built based on a target sentence database in order to model sequences probabilistically. (Sountsov, Abstract)
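For illustration only, the following is a minimal Python sketch of a token prefix tree built from a target sentence database, in the sense discussed in the rejection above. The names `build_prefix_tree`, `next_words`, and `target_db`, and the sample sentences, are hypothetical and do not come from Zhang, Sountsov, or the claims. The nested-dictionary layout is an assumption chosen for brevity, and sentence-end markers are omitted.

```python
from typing import Dict, Iterable, List, Sequence


def build_prefix_tree(sentences: Iterable[Sequence[str]]) -> Dict:
    """Build a nested-dict token trie: each key is a word, each value a child node."""
    root: Dict = {}
    for sentence in sentences:
        node = root
        for word in sentence:
            # Reuse the child node if the word is already present at this level.
            node = node.setdefault(word, {})
    return root


def next_words(tree: Dict, prefix: Sequence[str]) -> List[str]:
    """Return the words on the next-level nodes below `prefix` (empty if absent)."""
    node = tree
    for word in prefix:
        if word not in node:
            return []
        node = node[word]
    return sorted(node)


# Hypothetical target sentence database, already tokenized into words.
target_db = [
    ["thank", "you"],
    ["thank", "you", "very", "much"],
    ["thanks", "a", "lot"],
]
tree = build_prefix_tree(target_db)
```

Because the tree is built once from the database, it can be obtained in advance and reused for every decoding request; only the lookup in `next_words` runs during decoding.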

With respect to claim 16, it is substantially similar to claim 1 and is rejected in the same manner, the same art and reasoning applying. Further, Zhang teaches A device, (Abstract, "The invention provides a method and a system for deep nerve translation based on character encoding. A combined nerve network model is established by using an RNN to cover the whole translation process, and translation tasks are directly completed from the perspective of an encoder-decoder framework."; para. [0005], "automatically converts one language into another language by using the computer's programming ability." As is well-understood in the art, a computer comprises one or more processors, memory, and is capable of executing operations stored in memory via the one or more processors.)
wherein the device comprises: one or more processors; (Abstract, "The invention provides a method and a system for deep nerve translation based on character encoding. A combined nerve network model is established by using an RNN to cover the whole translation process, and translation tasks are directly completed from the perspective of an encoder-decoder framework."; para. [0005], "automatically converts one language into another language by using the computer's programming ability." One or more processors is inherent to the computer system of Zhang. As is well-understood in the art, a computer comprises one or more processors, memory, and is capable of executing operations stored in memory via the one or more processors.)
a memory; (Abstract, "The invention provides a method and a system for deep nerve translation based on character encoding. A combined nerve network model is established by using an RNN to cover the whole translation process, and translation tasks are directly completed from the perspective of an encoder-decoder framework."; para. [0005], "automatically converts one language into another language by using the computer's programming ability." Memory is inherent to the computer system of Zhang. As is well-understood in the art, a computer comprises one or more processors, memory, and is capable of executing operations stored in memory via the one or more processors.)
one or more programs stored in the memory and configured to execute the following operation when executed by the one or more processors: (Abstract, "The invention provides a method and a system for deep nerve translation based on character encoding. A combined nerve network model is established by using an RNN to cover the whole translation process, and translation tasks are directly completed from the perspective of an encoder-decoder framework."; para. [0005], "automatically converts one language into another language by using the computer's programming ability." One or more programs stored in memory and configured to execute operations is inherent to the computer system of Zhang. As is well-understood in the art, a computer comprises one or more processors, memory, and is capable of executing operations stored in memory via the one or more processors.)

With respect to claim 23, it is substantially similar to claim 1 and is rejected in the same manner, the same art and reasoning applying. Further, Zhang teaches A non-volatile computer storage medium in which one or more programs are stored, (Abstract, "The invention provides a method and a system for deep nerve translation based on character encoding. A combined nerve network model is established by using an RNN to cover the whole translation process, and translation tasks are directly completed from the perspective of an encoder-decoder framework."; para. [0005], "automatically converts one language into another language by using the computer's programming ability." Examiner asserts A non-volatile computer storage medium is inherent in the implementation of a method using the computer’s programming ability. Using a computer’s programming ability is well-understood in the art to involve execution of operation instructions in the form of a computer program and the computer program is stored on a non-volatile computer storage medium.)
an apparatus being enabled to execute the following operation when said one or more programs are executed by the apparatus: (Abstract, "The invention provides a method and a system for deep nerve translation based on character encoding. A combined nerve network model is established by using an RNN to cover the whole translation process, and translation tasks are directly completed from the perspective of an encoder-decoder framework."; para. [0005], "automatically converts one language into another language by using the computer's programming ability." Examiner asserts A non-volatile computer storage medium is inherent in the implementation of a method using the computer’s programming ability. Using a computer’s programming ability is well-understood in the art to involve execution of operation instructions in the form of a computer program and the computer program is stored on a non-volatile computer storage medium.)

Claims 3-5, 18-20, and 25-27 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang (CN 106126507 A, cited in the IDS submitted on 03/22/2021), in view of Sountsov (Sountsov et al., “Length bias in Encoder Decoder Models and a Case for Global Conditioning”, arXiv, September 2016), further in view of Zens (Zens et al., “Efficient Phrase-table Representation for Machine Translation with Applications to Online MT and Speech Translation”, Association for Computational Linguistics, April 2007, pp. 492-499, cited in the IDS submitted on 03/22/2021), and further in view of Roskind (US 8412728 B1).

With respect to claim 3, modified Zhang teaches the neural network machine translation method of claim 1, and Zhang also teaches if determining there does not exist next to-be-translated word in the vector sequence, outputting top M recently-obtained (paras. [0013]-[0014], “C. Word alignment model generation steps: Using the attention mechanism, through the neural network model training, the probability of translating multiple words in the source language sentence into the target language word is obtained, and the source language is added as a weight to indicate the relationship between words Correspondence; D. Output step: Translate the input source language into the target language;” para. [0061], “the second part, the decoder part, uses the source language and the target language to build a word alignment model, and calculates the candidate translation results, and selects the best results for output.” Examiner asserts ‘determining there does not exist next to-be-translated word in the vector sequence’ is inherent: this determination must always occur at the end of a translation process, before any translation results are considered complete prior to an output step. In Zhang, a best result is selected for output after the translation procedure, satisfying the requirement that M be a positive integer less than or equal to N.)
But modified Zhang does not explicitly teach determining candidate objects corresponding to the vector sequence according to the prefix tree, and determining the target sentence according to the candidate objects comprises: performing the following processing in turn for to-be-translated words in the vector sequence: respectively considering most recently-obtained candidate objects as prefixes, looking up the prefix tree for words located on next-level nodes of the prefixes, respectively putting the found words together with the corresponding prefixes to obtain preliminary objects and respectively determining conditional probability of the preliminary objects.
Zens, however, does teach performing the following processing in turn for to-be-translated words in the vector sequence: respectively considering most recently-obtained candidate objects as prefixes, (Section 3, “A prefix tree, also called trie, is an ordered tree data structure used to store an associative array where the keys are symbol sequences. In the case of phrase-based MT, the keys are source phrases, i.e. sequences of source words and the associated values are the possible translations of these source phrases. In a prefix tree, all descendants of any node have a common prefix, namely the source phrase associated with that node. The root node is associated with the empty phrase.”; Section 4, "Figure 1: Illustration of the prefix tree. Left: list of source phrases and the corresponding prefix tree. Right: list of matching source phrases for sentence 'c a a c' (bold phrases match, phrases in italics are loaded in memory) and the corresponding partially loaded prefix tree (the dashed part is not in memory)." For a given word, the previous word in the sentence is used as a prefix. In the sentence ‘c a a c,’ the first ‘a’ has the first ‘c’ as a prefix, and it is also a prefix for the second ‘a’.)
looking up the prefix tree for words located on next-level nodes of the prefixes, (Section 3, “A prefix tree, also called trie, is an ordered tree data structure used to store an associative array where the keys are symbol sequences. In the case of phrase-based MT, the keys are source phrases, i.e. sequences of source words and the associated values are the possible translations of these source phrases. In a prefix tree, all descendants of any node have a common prefix, namely the source phrase associated with that node. The root node is associated with the empty phrase.”; Section 4, "Figure 1: Illustration of the prefix tree. Left: list of source phrases and the corresponding prefix tree. Right: list of matching source phrases for sentence 'c a a c' (bold phrases match, phrases in italics are loaded in memory) and the corresponding partially loaded prefix tree (the dashed part is not in memory)." For a given word, the previous word in the sentence is used as a prefix. With the sentence ‘c a a c’ as an example, the first ‘a’ has ‘c’ as a prefix, and it is itself a prefix for the second ‘a’.)
and respectively putting the found words together with the corresponding prefixes to obtain preliminary objects; (Section 4, "Figure 1: Illustration of the prefix tree. Left: list of source phrases and the corresponding prefix tree. Right: list of matching source phrases for sentence 'c a a c' (bold phrases match, phrases in italics are loaded in memory) and the corresponding partially loaded prefix tree (the dashed part is not in memory)."; Section 4.1, “Let $k_0$ denote the root node of the prefix tree and let $f_k$ denote the prefix that leads to tree node $k$. Furthermore, we define $E(k)$ as the set of possible translations of the source phrase $\tilde{f}_k$.” $E(k)$ represents all possible preliminary objects for a given source phrase. Preliminary objects are obtained by combining words found via the prefix tree with their corresponding prefixes. $E(k)$, being the set of possible translations, represents the preliminary objects for a given source phrase.)
respectively determining conditional probability of the preliminary objects, (Section 4.1, “Note that we store only the target phrases $\tilde{e}$ in the set of possible translations $E(j', j)$ and not the source phrases $\tilde{f}$. This is based on the assumption that the models which are conditioned on the source phrase $\tilde{f}$ are independent of the context outside the phrase pair $(\tilde{f}, \tilde{e})$. This assumption holds for the standard phrase and word translation models. Thus, we have to keep only the target phrase with the highest probability. It might be violated by lexicalized distortion models (dependent on the configuration); in that case we have to store the source phrase along with the target phrase and the probability, which is again straightforward.” The probability of correct translation for each preliminary object (possible translation) is calculated.)
It would have been obvious to an artisan of ordinary skill before the effective filing date of the claimed invention to combine the neural network machine translation method of modified Zhang with looking up the prefix tree for words located on next-level nodes of the prefixes, respectively putting the found words together with the corresponding prefixes to obtain preliminary objects and respectively determining conditional probability of the preliminary objects in order to use significantly larger input word graphs in a more efficient way resulting in improved translation quality. (Zens, Abstract)
	But modified Zhang does not explicitly teach ranking the preliminary objects in a descending order of values of the conditional probability, and considering top N preliminary objects in the rank as candidate objects, N being a positive integer larger than one.
	Roskind, however, does teach ranking the preliminary objects in a descending order of values of the conditional probability, (col. 8, ln. 49-60, "At step 404, two or more query suggestions are identified based on the partial query. At step 406, a determination is made of a probability that each respective query suggestion is a query that the user intended to input. At step 408, the two or more query suggestions are ranked based on the probability of each respective query suggestion. At step 410, a determination is made that neither of the two or more query suggestions are associated with a probability above a threshold. The threshold may be a predetermined probability (e.g., 60%, 80%, etc.) that is required to be met or exceeded by the processing system in order for the query suggestions to be considered relevant."; col. 3, ln. 65 to col. 4, ln. 1, “Still further relevancy indicators may include ascending/descending list rankings of query suggestions including percentages or numbers to show the relevance” Query suggestions may be considered preliminary objects in the context of Roskind. They are ranked by their probabilities, with the higher probabilities being considered relevant. Rankings may be in descending order.)
and considering top N preliminary objects in the rank as candidate objects, N being a positive integer larger than one; (col. 8, ln. 49-60, "At step 404, two or more query suggestions are identified based on the partial query. At step 406, a determination is made of a probability that each respective query suggestion is a query that the user intended to input. At step 408, the two or more query suggestions are ranked based on the probability of each respective query suggestion. At step 410, a determination is made that neither of the two or more query suggestions are associated with a probability above a threshold. The threshold may be a predetermined probability (e.g., 60%, 80%, etc.) that is required to be met or exceeded by the processing system in order for the query suggestions to be considered relevant." The query suggestions of Roskind are comparable to preliminary objects in the translation process. Any number of results satisfying the ‘two or more’ specified by Roskind also necessarily satisfy the requirements of N, being a positive integer value larger than one.)
It would have been obvious to an artisan of ordinary skill before the effective filing date of the claimed invention to combine the neural network machine translation method of modified Zhang with ranking the preliminary objects in a descending order of values of the conditional probability, and considering top N preliminary objects in the rank as candidate objects, N being a positive integer larger than one, in order to indicate the top ranking results. (Roskind, Abstract)
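For illustration only, the per-word processing recited in claim 3 (treating the most recent candidate objects as prefixes, looking up next-level words, forming preliminary objects, and keeping the top N by conditional probability) can be sketched in Python as below. This is a sketch under stated assumptions and not the method of Zhang, Sountsov, Zens, or Roskind: `expand_and_rank`, `toy_tree`, `toy_probs`, and `toy_cond_prob` are hypothetical names, and the probabilities are made-up values standing in for whatever a decoder would compute.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Prefix = Tuple[str, ...]


def expand_and_rank(
    tree: Dict,
    candidates: Sequence[Prefix],
    cond_prob: Callable[[Prefix], float],
    n: int,
) -> List[Prefix]:
    """Extend each candidate by one next-level word and keep the top-n extensions."""
    preliminary: List[Prefix] = []
    for prefix in candidates:
        # Walk the nested-dict trie down to the node reached by this prefix.
        node = tree
        for word in prefix:
            node = node.get(word, {})
        # Put each word found on a next-level node together with its prefix.
        for word in node:
            preliminary.append(prefix + (word,))
    # Rank in descending order of conditional probability, then keep the top n.
    preliminary.sort(key=cond_prob, reverse=True)
    return preliminary[:n]


# Hypothetical prefix tree and made-up conditional probabilities.
toy_tree = {"thank": {"you": {"very": {"much": {}}}}, "thanks": {"a": {"lot": {}}}}
toy_probs = {
    ("thank",): 0.7,
    ("thanks",): 0.3,
    ("thank", "you"): 0.6,
    ("thanks", "a"): 0.25,
}


def toy_cond_prob(obj: Prefix) -> float:
    """Stand-in scorer: looks up a fixed table instead of running a decoder."""
    return toy_probs.get(obj, 0.0)


first = expand_and_rank(toy_tree, [()], toy_cond_prob, n=2)
second = expand_and_rank(toy_tree, first, toy_cond_prob, n=2)
```

Starting from the empty prefix `()` yields the words on the first-level nodes below the root; each later call reuses the previous call's output as the new set of prefixes.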

With respect to claim 4, modified Zhang teaches the neural network machine translation method of claim 3, and Zens also teaches wherein the method further comprises: for the first to-be-translated word in the vector sequence, considering all words located on the first-level nodes after a tree root in the prefix tree as preliminary objects; (Section 3, “A prefix tree, also called trie, is an ordered tree data structure used to store an associative array where the keys are symbol sequences. In the case of phrase-based MT, the keys are source phrases, i.e. sequences of source words and the associated values are the possible translations of these source phrases. In a prefix tree, all descendants of any node have a common prefix, namely the source phrase associated with that node. The root node is associated with the empty phrase.”; Section 4, "Figure 1: Illustration of the prefix tree. Left: list of source phrases and the corresponding prefix tree. Right: list of matching source phrases for sentence 'c a a c' (bold phrases match, phrases in italics are loaded in memory) and the corresponding partially loaded prefix tree (the dashed part is not in memory)." The first-to-be-translated word begins with the prefix representing the empty phrase, which is associated with the tree root. As such, all nodes following the empty phrase are considered as preliminary objects (possible translations).)
respectively determining conditional probability of the preliminary objects, (Section 4.1, “Note that we store only the target phrases $\tilde{e}$ in the set of possible translations $E(j', j)$ and not the source phrases $\tilde{f}$. This is based on the assumption that the models which are conditioned on the source phrase $\tilde{f}$ are independent of the context outside the phrase pair $(\tilde{f}, \tilde{e})$. This assumption holds for the standard phrase and word translation models. Thus, we have to keep only the target phrase with the highest probability. It might be violated by lexicalized distortion models (dependent on the configuration); in that case we have to store the source phrase along with the target phrase and the probability, which is again straightforward.” The probability of correct translation for each preliminary object (possible translation) is calculated.)
It would have been obvious to an artisan of ordinary skill before the effective filing date of the claimed invention to combine the neural network machine translation method of modified Zhang with the method further comprising, for the first to-be-translated word in the vector sequence, considering all words located on the first-level nodes after a tree root in the prefix tree as preliminary objects in order to use significantly larger input word graphs in a more efficient way resulting in improved translation quality. (Zens, Abstract)

Roskind also teaches ranking the preliminary objects in a descending order of values of the conditional probability, (col. 8, ln. 49-60, "At step 404, two or more query suggestions are identified based on the partial query. At step 406, a determination is made of a probability that each respective query suggestion is a query that the user intended to input. At step 408, the two or more query suggestions are ranked based on the probability of each respective query suggestion. At step 410, a determination is made that neither of the two or more query suggestions are associated with a probability above a threshold. The threshold may be a predetermined probability (e.g., 60%, 80%, etc.) that is required to be met or exceeded by the processing system in order for the query suggestions to be considered relevant."; col. 3, ln. 65 to col. 4, ln. 1, “Still further relevancy indicators may include ascending/descending list rankings of query suggestions including percentages or numbers to show the relevance” Query suggestions may be considered preliminary objects in the context of Roskind. They are ranked by their probabilities, with the higher probabilities being considered relevant. Rankings may be in descending order.)
and considering top N preliminary objects in the rank as candidate objects. (col. 8, ln. 49-60, "At step 404, two or more query suggestions are identified based on the partial query. At step 406, a determination is made of a probability that each respective query suggestion is a query that the user intended to input. At step 408, the two or more query suggestions are ranked based on the probability of each respective query suggestion. At step 410, a determination is made that neither of the two or more query suggestions are associated with a probability above a threshold. The threshold may be a predetermined probability (e.g., 60%, 80%, etc.) that is required to be met or exceeded by the processing system in order for the query suggestions to be considered relevant." The query suggestions of Roskind are comparable to preliminary objects in the translation process. Any number of results satisfying the ‘two or more’ specified by Roskind also necessarily satisfy the requirements of N, being a positive integer value larger than one.)
It would have been obvious to an artisan of ordinary skill before the effective filing date of the claimed invention to combine the neural network machine translation method of modified Zhang with ranking the preliminary objects in a descending order of values of the conditional probability and considering top N preliminary objects in the rank as candidate objects in order to indicate the top ranking results. (Roskind, Abstract)
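For illustration only, the ranking and top-N selection discussed above can be sketched in Python as follows. This is a minimal sketch and is not drawn from Roskind or from the application: the function name `top_n_candidates`, the dictionary of preliminary objects, and the sample probabilities are all hypothetical.

```python
from typing import Dict, List, Tuple


def top_n_candidates(cond_prob: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    """Rank preliminary objects by conditional probability, highest first,
    and keep the first n entries of the ranking as candidate objects."""
    if n < 2:
        # The discussion above treats N as a positive integer larger than one.
        raise ValueError("n must be an integer larger than one")
    ranked = sorted(cond_prob.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


# Hypothetical preliminary objects and their conditional probabilities.
preliminary = {"cat": 0.55, "dog": 0.05, "car": 0.30, "cap": 0.10}
candidates = top_n_candidates(preliminary, n=2)
```

Sorting once and slicing keeps the descending order visible in the result, which matches the "ranked, then top N" wording of the limitation.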

With respect to claim 5, modified Zhang teaches the neural network machine translation method of claim 3, and Roskind also teaches wherein the method further comprises: for top N preliminary objects after the ranking, screening said N preliminary objects to obtain preliminary objects whose conditional probability is larger than a predetermined threshold, (col. 8, ln. 55-63, "At step 410, a determination is made that neither of the two or more query suggestions are associated with a probability above a threshold. The threshold may be a predetermined probability (e.g., 60%, 80%, etc.) that is required to be met or exceeded by the processing system in order for the query suggestions to be considered relevant. If the query suggestions are below the threshold value, the determination may be made that the two or more query suggestions have not met the predetermined threshold.")
and considering the obtained preliminary objects as candidate objects. (col. 8, ln. 55-63, "At step 410, a determination is made that neither of the two or more query suggestions are associated with a probability above a threshold. The threshold may be a predetermined probability (e.g., 60%, 80%, etc.) that is required to be met or exceeded by the processing system in order for the query suggestions to be considered relevant. If the query suggestions are below the threshold value, the determination may be made that the two or more query suggestions have not met the predetermined threshold.")
It would have been obvious to an artisan of ordinary skill before the effective filing date of the claimed invention to combine the neural network machine translation method of modified Zhang with ranking the preliminary objects in a descending order of values of the conditional probability and considering top N preliminary objects in the rank as candidate objects in order to indicate the top ranking results. (Roskind, Abstract)
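For illustration only, the threshold screening discussed for claim 5 can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the function name `screen_by_threshold`, the sample ranking, and the threshold value of 0.2 are hypothetical and do not come from Roskind or the application.

```python
from typing import List, Tuple


def screen_by_threshold(
    top_n: List[Tuple[str, float]], threshold: float
) -> List[Tuple[str, float]]:
    """Keep only those of the top N preliminary objects whose conditional
    probability is strictly larger than the predetermined threshold."""
    return [(obj, prob) for obj, prob in top_n if prob > threshold]


# Hypothetical top-3 ranking, already in descending order of probability.
top_three = [("cat", 0.55), ("car", 0.30), ("cap", 0.10)]
screened = screen_by_threshold(top_three, threshold=0.2)
```

Because the input is already ranked, the screened list stays in descending order; an empty result corresponds to the case where no suggestion meets the threshold.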

With respect to claim 18, it is substantially similar to claim 3 and is rejected in the same manner, the same art and reasoning applying.

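With respect to claim 19, it is substantially similar to claim 4 and is rejected in the same manner, the same art and reasoning applying.

With respect to claim 20, it is substantially similar to claim 5 and is rejected in the same manner, the same art and reasoning applying.
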
With respect to claim 25, it is substantially similar to claim 3 and is rejected in the same manner, the same art and reasoning applying.

With respect to claim 26, it is substantially similar to claim 4 and is rejected in the same manner, the same art and reasoning applying.

With respect to claim 27, it is substantially similar to claim 5 and is rejected in the same manner, the same art and reasoning applying.

Claims 6, 21, and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang (CN 106126507 A, cited in the IDS submitted on 03/22/2021), in view of Sountsov (Sountsov et al., “Length bias in Encoder Decoder Models and a Case for Global Conditioning”, arXiv, September, 2016), further in view of Zens (Zens et al., “Efficient Phrase-table Representation for Machine Translation with Applications to Online MT and Speech Translation”, Association for Computational Linguistics, April, 2007, pp. 492-499, cited in the IDS submitted on 03/22/2021), further in view of Roskind (US 8412728 B1), and further in view of Devlin (Devlin et al., "Fast and Robust Neural Network Joint Models for Statistical Machine Translation", Association for Computational Linguistics, June, 2014, pp. 1370-1380).

Devlin, however, does teach wherein the respectively determining conditional probability of the preliminary objects comprises: using a self-normalization algorithm to respectively determine the conditional probability of all preliminary objects. (Section 2.3, “To do this, we present the novel technique of self-normalization, where the output layer scores are close to being probabilities without explicitly performing a softmax. Formally, we define the standard softmax log likelihood as:

    \log(P(x)) = \log\left( \frac{e^{U_r(x)}}{Z(x)} \right) = U_r(x) - \log(Z(x)), \qquad Z(x) = \sum_{r'=1}^{|V|} e^{U_{r'}(x)}

where x is the sample, U is the raw output layer scores, r is the output layer row corresponding to the observed target word, and Z(x) is the softmax normalizer. If we could guarantee that log(Z(x)) were always equal to 0 (i.e., Z(x) = 1) then at decode time we would only have to compute row r of the output layer instead of the whole matrix. While we cannot train a neural network with this guarantee, we can explicitly encourage the log-softmax normalizer to be as close to 0 as possible by augmenting our training objective function:

    L = \sum_i \left[ \log(P(x_i)) - \alpha \left( \log(Z(x_i)) - 0 \right)^2 \right] = \sum_i \left[ \log(P(x_i)) - \alpha \log^2(Z(x_i)) \right]

In this case, the output layer bias weights are initialized to log(1/|V|), so that the initial network is self-normalized. At decode time, we simply use U_r(x) as the feature score, rather than log(P(x)).”)
It would have been obvious to an artisan of ordinary skill before the effective filing date of the claimed invention to combine the neural network machine translation method of modified Zhang with the self-normalization technique of Devlin, such that respectively determining the conditional probability of the preliminary objects comprises using a self-normalization algorithm to respectively determine the conditional probability of all preliminary objects, in order to increase the lookup speed during decoding. (Devlin, Section 2.3)
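For illustration only, the self-normalization idea quoted from Devlin can be sketched numerically in Python as follows. This is a minimal sketch, not Devlin's implementation: the function names are hypothetical, and the squared penalty on log(Z(x)) with weight `alpha` is an assumed form of the augmented training objective. The sketch shows that when log(Z(x)) is 0, the raw score U_r(x) equals log(P(x)), so only row r needs to be computed at decode time.

```python
import math
from typing import List


def log_normalizer(scores: List[float]) -> float:
    """Return log(Z(x)) for raw output-layer scores, computed stably."""
    peak = max(scores)
    return peak + math.log(sum(math.exp(s - peak) for s in scores))


def exact_log_prob(scores: List[float], r: int) -> float:
    """Standard softmax log likelihood: U_r(x) - log(Z(x))."""
    return scores[r] - log_normalizer(scores)


def penalized_objective(
    batch: List[List[float]], targets: List[int], alpha: float
) -> float:
    """Assumed objective: sum of log(P(x_i)) minus alpha * log(Z(x_i))**2."""
    total = 0.0
    for scores, r in zip(batch, targets):
        log_z = log_normalizer(scores)
        total += (scores[r] - log_z) - alpha * log_z ** 2
    return total


# Scores of log(1/|V|) for every row give Z(x) = 1, i.e. a self-normalized output.
vocab_size = 4
normalized_scores = [math.log(1.0 / vocab_size)] * vocab_size

# Hypothetical scores that are not self-normalized, for comparison.
unnormalized_scores = [2.0, 1.0, 0.0]
```

With `normalized_scores`, `exact_log_prob(normalized_scores, r)` and `normalized_scores[r]` agree, which is the property that lets the decoder skip the full softmax. With `unnormalized_scores`, a positive `alpha` lowers the objective, which is how training is pushed toward log(Z(x)) = 0.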
 
With respect to claim 21, it is substantially similar to claim 6 and is rejected in the same manner, the same art and reasoning applying.

With respect to claim 28, it is substantially similar to claim 6 and is rejected in the same manner, the same art and reasoning applying.

Claims 7, 22, and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Zhang (CN 106126507 A, cited in the IDS submitted on 03/22/2021), in view of Sountsov (Sountsov et al., “Length bias in Encoder Decoder Models and a Case for Global Conditioning”, arXiv, September, 2016), further in view of Tommasel (Tommasel et al., “A distributed approach for accelerating sparse matrix arithmetic operations for high-dimensional feature selection”, Springer-Verlag London, August, 2016, pp. 459-497), and further in view of Karpusenko (Karpusenko et al., “Caffe* Optimized for Intel® Architectures: Applying Modern Code Techniques”, Intel Corporation, August, 2016, pp. 1-9).

With respect to claim 7, modified Zhang teaches the neural network machine translation method of claim 1, but modified Zhang does not explicitly teach wherein during execution of the method, when performing matrix operations, using a computing manner with vector decomposition and thread pool in parallel to perform matrix operations for a sparse matrix.
Tommasel, however, does teach wherein during execution of the method, when performing matrix operations, using a computing manner with vector decomposition and thread pool in parallel to perform matrix operations for a sparse matrix, (Abstract, “This work proposes a novel approach for distributing sparse matrix arithmetic operations on computer clusters aiming at speeding-up the processing of high-dimensional matrices.”; Section 2, “Experimental evaluation showed that the approach outperformed spectral and singular vector decomposition clustering approaches in terms of accuracy and precision.”; Section 4.2, “The baseline for comparing and evaluating the enhancements introduced by using the proposed PF to distribute tasks on the cluster was the execution of all the operations in a serial and a multi-thread manner. In particular, for the multi-thread execution pools of 3, 4 and 5 threads were considered” Tommasel proposes an approach for increasing the performance of conducting matrix operations for sparse matrices and compares the results of vector decomposition approaches implemented using multi-thread execution pools (thread pool in parallel).)
It would have been obvious to an artisan of ordinary skill before the effective filing date of the claimed invention to combine the neural network machine translation method of modified Zhang with the teaching of Tommasel, such that during execution of the method, matrix operations for a sparse matrix are performed using a computing manner with vector decomposition and thread pool in parallel, in order to speed up the processing of high-dimensional matrices. (Tommasel, Abstract)
But Tommasel does not explicitly teach performing multi-thread concurrent matrix operations for a non-sparse matrix.
Karpusenko, however, does teach performing multi-thread concurrent matrix operations for a non-sparse matrix. (Introduction, “In RNNs, the dense matrix (or matrices) is the same for every layer (the layer is recurrent), and the length of the network is determined by the length of the input signal.”; “The gemm_omp_driver_v2 function, part of libmkl_intel_thread.so, is a general matrix-matrix (GEMM) multiplication implementation of Intel MKL. This function uses OpenMP multithreading behind the scenes. Optimized Intel MKL matrix-matrix multiplication is the main function used for forward and backward propagation, that is, for weight calculation, prediction, and adjustment. Intel MKL initializes OpenMP multithreading, which usually reduces the computation time of GEMM operations.” Karpusenko explores using Intel Math Kernel Library (Intel MKL) and multi-threading to optimize deep neural networks, including optimizing dense (non-sparse) matrix operations in recurrent neural networks.)
(Karpusenko, Abstract)
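For illustration only, the two computing manners discussed for claim 7 can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not the method of Tommasel, Karpusenko, or the application: "vector decomposition" is read here as splitting a sparse matrix into row vectors, the dictionary-based sparse row format and all function names are hypothetical, and pure-Python threads mainly show the structure of the computation rather than a real speed-up.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

SparseRow = Dict[int, float]  # hypothetical format: column index -> nonzero value


def sparse_row_dot(row: SparseRow, vec: List[float]) -> float:
    """Dot product of one sparse row vector with a dense vector."""
    return sum(value * vec[col] for col, value in row.items())


def sparse_matvec(rows: List[SparseRow], vec: List[float], workers: int = 4) -> List[float]:
    """Sparse case: decompose the matrix into row vectors and hand each
    row-vector product to a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda row: sparse_row_dot(row, vec), rows))


def dense_matmul(a: List[List[float]], b: List[List[float]], workers: int = 4) -> List[List[float]]:
    """Non-sparse case: compute the output rows concurrently on several threads."""
    columns = list(zip(*b))  # columns of b, built once and shared by all threads

    def one_row(row: List[float]) -> List[float]:
        return [sum(x * y for x, y in zip(row, col)) for col in columns]

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(one_row, a))


# Hypothetical inputs: a 3x3 sparse matrix with one empty row, and two dense 2x2 matrices.
sparse_result = sparse_matvec([{0: 2.0}, {1: 3.0, 2: 1.0}, {}], [1.0, 2.0, 3.0])
dense_result = dense_matmul([[1.0, 2.0], [3.0, 4.0]], [[5.0, 6.0], [7.0, 8.0]])
```

`pool.map` returns results in input order, so the output rows line up with the input rows without extra bookkeeping. A production system would more likely delegate the dense case to an optimized multithreaded library.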

With respect to claim 22, it is substantially similar to claim 7 and is rejected in the same manner, the same art and reasoning applying.

With respect to claim 29, it is substantially similar to claim 7 and is rejected in the same manner, the same art and reasoning applying.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARK J TURNER whose telephone number is (571)272-8469. The examiner can normally be reached Monday-Thursday 9am-7pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen, can be reached at (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/M.J.T./Examiner, Art Unit 2121


/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121