EXAMINER’S AMENDMENT AND REASONS FOR ALLOWANCE
Examiner’s Amendment
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in a telephone interview with Attorney John Treilhard, Reg. No. L1264 on August 5, 2021.
The claims being amended herein are presented below in two sets: 1) Marked-Up form with all Examiner’s Amendments indicated, and 2) Final form with all Examiner’s amendments having been entered. Only the claims presented below are being amended in this Examiner’s Amendment, with all other claims being in final form and entered as presented in the After-Final Amendment filed July 23, 2021.

Marked-Up Form Examiner’s Amended Claims  
Claim 1.  A computer-implemented method comprising:	obtaining training data for training a neural network, 		wherein the neural network is configured to receive a network input and to process the network input in accordance with a plurality of parameters of the neural network to generate a respective score distribution for each of a plurality of output positions in a predicted output sequence for the network input,		wherein the respective score distribution for each of the output positions in the predicted output sequence for the network input comprises a respective score for each token in a predetermined set of tokens,		wherein the predetermined set of tokens includes n-grams of multiple different sizes, 		wherein, for each output position in the predicted output sequence for the network input, the respective score for each of the tokens in the respective score distribution for the output position represents a likelihood that the token is a token at the output position in [[an]] the predicted output sequence for the network input, and		wherein the training data comprises a plurality of training inputs, and for each training input, a respective target output sequence comprising one or more words;	for each training input:		processing the training input using the neural network in accordance with current values of the parameters of the neural network to generate a respective score distribution for each of a plurality of output positions in a predicted output sequence for the training input; 		sampling, from a plurality of possible valid decompositions of the target output sequence for the training input, a valid decomposition of the target output sequence that includes n-grams of different sizes at respective output positions of the plurality of output positions in the predicted output sequence for the training input, wherein each possible valid decomposition of the target output sequence decomposes the target output sequence into a different sequence of tokens from the predetermined set of tokens, wherein the sampling comprises, for each of one or more of the plurality of output positions in the predicted output sequence for the training input:			sampling, from valid tokens in the predetermined set of tokens, a valid token in accordance with the scores for the valid tokens in the respective score distribution for the output position in the predicted output sequence for the training input,			wherein a valid token for the output position in the predicted output sequence for the training input is a token from the predetermined set of tokens that would be a valid addition to a current partial valid decomposition of the target output sequence as of the output position in the predicted output sequence for the training input; and		adjusting the current values of the parameters of the neural network to increase likelihoods of the tokens in the sampled valid decomposition being the tokens at the corresponding output positions in the predicted output sequence for the training input.

Claim 3.  The method of claim 1, wherein for each training input, adjusting the current values of the parameters of the neural network to increase likelihoods of the tokens in the sampled valid decomposition being the tokens at the corresponding output positions in the predicted output sequence for the training input comprises:	performing an iteration of a neural network training procedure to increase a logarithm of a product of the respective scores for each token in the sampled valid decomposition in the respective score distribution for the output position, in the predicted output sequence for the training input, that corresponds [[corresponding]] to the position of the token in the sampled valid decomposition. 

Claim 4.  The method of claim 1, wherein for each training input, the sampling further comprises, for each of one or more of the plurality of output positions in the predicted output sequence for the training input: 	sampling, from valid tokens in the predetermined set of tokens, a valid token randomly. 
 	
Claim 5.  The method of claim 1, wherein for each training input, the method further comprises, for each of the plurality of output positions in the predicted output sequence for the training input, and in order starting from an initial position:	providing a sampled valid token for the output position as input to the neural network for use in generating the respective score distribution for a next output position of the plurality of output positions in the predicted output sequence for the training input.

Claim 13.  A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising: 		obtaining training data for training a neural network, 		wherein the neural network is configured to receive a network input and to process the network input in accordance with a plurality of parameters of the neural network to generate a respective score distribution for each of a plurality of output positions in a predicted output sequence for the network input,		wherein the respective score distribution for each of the output positions in the predicted output sequence for the network input comprises a respective score for each token in a predetermined set of tokens,		wherein the predetermined set of tokens includes n-grams of multiple different sizes, 		wherein, for each output position in the predicted output sequence for the network input, the respective score for each of the tokens in the respective score distribution for the output position in the predicted output sequence for the network input represents a likelihood that the token is a token at the output position in [[an]] the predicted output sequence for the network input, and		wherein the training data comprises a plurality of training inputs, and for each training input, a respective target output sequence comprising one or more words;	for each training input:		processing the training input using the neural network in accordance with current values of the parameters of the neural network to generate a respective score distribution for each of a plurality of output positions in a predicted output sequence for the training input; 		sampling, from a plurality of possible valid decompositions of the target output sequence for the training input, a valid decomposition of the target output sequence that includes n-grams of different sizes at respective output positions of the plurality of output positions in the predicted output sequence for the training input, wherein each possible valid decomposition of the target output sequence decomposes the target output sequence into a different sequence of tokens from the predetermined set of tokens, wherein the sampling comprises, for each of one or more of the plurality of output positions in the predicted output sequence for the training input:			sampling, from valid tokens in the predetermined set of tokens, a valid token in accordance with the scores for the valid tokens in the respective score distribution for the output position in the predicted output sequence for the training input,			wherein a valid token for the output position in the predicted output sequence for the training input is a token from the predetermined set of tokens that would be a valid addition to a current partial valid decomposition of the target output sequence as of the output position in the predicted output sequence for the training input; and		adjusting the current values of the parameters of the neural network to increase likelihoods of the tokens in the sampled valid decomposition being the tokens at the corresponding output positions in the predicted output sequence for the training input.

Claim 14.  One or more non-transitory computer storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: 		obtaining training data for training a neural network, 		wherein the neural network is configured to receive a network input and to process the network input in accordance with a plurality of parameters of the neural network to generate a respective score distribution for each of a plurality of output positions in a predicted output sequence for the network input,		wherein the respective score distribution for each of the output positions in the predicted output sequence for the network input comprises a respective score for each token in a predetermined set of tokens,		wherein the predetermined set of tokens includes n-grams of multiple different sizes, 		wherein, for each output position in the predicted output sequence for the network input, the respective score for each of the tokens in the respective score distribution for the output position in the predicted output sequence for the network input represents a likelihood that the token is a token at the output position in [[an]] the predicted output sequence for the network input, and		wherein the training data comprises a plurality of training inputs, and for each training input, a respective target output sequence comprising one or more words;	for each training input:		processing the training input using the neural network in accordance with current values of the parameters of the neural network to generate a respective score distribution for each of a plurality of output positions in a predicted output sequence for the training input; 		sampling, from a plurality of possible valid decompositions of the target output sequence for the training input, a valid decomposition of the target output sequence that includes n-grams of different sizes at respective output positions of the plurality of output positions in the predicted output sequence for the training input, wherein each possible valid decomposition of the target output sequence decomposes the target output sequence into a different sequence of tokens from the predetermined set of tokens, wherein the sampling comprises, for each of one or more of the plurality of output positions in the predicted output sequence for the training input:			sampling, from valid tokens in the predetermined set of tokens, a valid token in accordance with the scores for the valid tokens in the respective score distribution for the output position in the predicted output sequence for the training input,			wherein a valid token for the output position in the predicted output sequence for the training input is a token from the predetermined set of tokens that would be a valid addition to a current partial valid decomposition of the target output sequence as of the output position in the predicted output sequence for the training input; and		adjusting the current values of the parameters of the neural network to increase likelihoods of the tokens in the sampled valid decomposition being the tokens at the corresponding output positions in the predicted output sequence for the training input.

Claim 16.  The non-transitory computer storage media of claim 14, wherein for each training input, adjusting the current values of the parameters of the neural network to increase likelihoods of the tokens in the sampled valid decomposition being the tokens at the corresponding output positions in the predicted output sequence for the training input comprises:	performing an iteration of a neural network training procedure to increase a logarithm of a product of the respective scores for each token in the sampled valid decomposition in the respective score distribution for the output position, in the predicted output sequence for the training input, that corresponds to the position of the token in the sampled valid decomposition.
 
Claim 17.  The non-transitory computer storage media of claim 14, wherein for each training input, the sampling further comprises, for each of one or more of the plurality of output positions in the predicted output sequence for the training input: 	sampling, from valid tokens in the predetermined set of tokens, a valid token randomly. 

Claim 18.  The non-transitory computer storage media of claim 14, wherein for each training input, the operations further comprise, for each of the plurality of output positions in the predicted output sequence for the training input and in order starting from an initial position:	providing a sampled valid token for the output position as input to the neural network for use in generating the respective score distribution for a next output position of the plurality of output positions in the predicted output sequence for the training input.

Claim 23.  The method of claim 4, wherein for each training input, for each of the plurality of output positions in the predicted output sequence for the training input and in order starting from an initial position: 	a valid token for the output position is sampled from valid tokens in the predetermined set of tokens randomly with probability ϵ, and	a valid token for the output position is sampled from valid tokens in the predetermined set of tokens in accordance with the scores for the valid tokens in the respective score distribution for the output position for the training input with probability 1 - ϵ.

Final Form Examiner’s Amended Claims
Claim 1.  A computer-implemented method comprising:	obtaining training data for training a neural network, 		wherein the neural network is configured to receive a network input and to process the network input in accordance with a plurality of parameters of the neural network to generate a respective score distribution for each of a plurality of output positions in a predicted output sequence for the network input,		wherein the respective score distribution for each of the output positions in the predicted output sequence for the network input comprises a respective score for each token in a predetermined set of tokens,		wherein the predetermined set of tokens includes n-grams of multiple different sizes, 		wherein, for each output position in the predicted output sequence for the network input, the respective score for each of the tokens in the respective score distribution for the output position represents a likelihood that the token is a token at the output position in the predicted output sequence for the network input, and		wherein the training data comprises a plurality of training inputs, and for each training input, a respective target output sequence comprising one or more words;	for each training input:		processing the training input using the neural network in accordance with current values of the parameters of the neural network to generate a respective score distribution for each of a plurality of output positions in a predicted output sequence for the training input; 		sampling, from a plurality of possible valid decompositions of the target output sequence for the training input, a valid decomposition of the target output sequence that includes n-grams of different sizes at respective output positions of the plurality of output positions in the predicted output sequence for the training input, wherein each possible valid decomposition of the target output sequence decomposes the target output sequence into a different sequence of tokens from the predetermined set of tokens, wherein the sampling comprises, for each of one or more of the plurality of output positions in the predicted output sequence for the training input:			sampling, from valid tokens in the predetermined set of tokens, a valid token in accordance with the scores for the valid tokens in the respective score distribution for the output position in the predicted output sequence for the training input,			wherein a valid token for the output position in the predicted output sequence for the training input is a token from the predetermined set of tokens that would be a valid addition to a current partial valid decomposition of the target output sequence as of the output position in the predicted output sequence for the training input; and		adjusting the current values of the parameters of the neural network to increase likelihoods of the tokens in the sampled valid decomposition being the tokens at the corresponding output positions in the predicted output sequence for the training input.

Claim 3.  The method of claim 1, wherein for each training input, adjusting the current values of the parameters of the neural network to increase likelihoods of the tokens in the sampled valid decomposition being the tokens at the corresponding output positions in the predicted output sequence for the training input comprises:	performing an iteration of a neural network training procedure to increase a logarithm of a product of the respective scores for each token in the sampled valid decomposition in the respective score distribution for the output position, in the predicted output sequence for the training input, that corresponds to the position of the token in the sampled valid decomposition.
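For illustration only, and not as a limitation of any claim: because the logarithm of a product equals the sum of the logarithms, the quantity recited in claim 3 can be computed as a sum of per-position log scores. The Python sketch below assumes a hypothetical data layout in which step_scores[i] maps each token to its score at output position i; the function name and the layout are assumptions made for this example and do not appear in the application.

```python
import math


def decomposition_log_likelihood(step_scores, sampled_tokens):
    """Return log(prod_i score_i(token_i)) computed as sum_i log(score_i(token_i)).

    step_scores: list of dicts; step_scores[i][tok] is the score of `tok`
        at output position i (hypothetical layout, assumed for illustration).
    sampled_tokens: the tokens of the sampled valid decomposition, in order.
    """
    return sum(
        math.log(step_scores[i][tok]) for i, tok in enumerate(sampled_tokens)
    )


# Two output positions; the sampled decomposition is ["ab", "c"].
scores = [{"a": 0.5, "ab": 0.5}, {"c": 0.25, "cd": 0.75}]
log_likelihood = decomposition_log_likelihood(scores, ["ab", "c"])
```

A training iteration that increases this value could, under the same assumptions, minimize its negative with a gradient-based optimizer.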

Claim 4.  The method of claim 1, wherein for each training input, the sampling further comprises, for each of one or more of the plurality of output positions in the predicted output sequence for the training input: 	sampling, from valid tokens in the predetermined set of tokens, a valid token randomly.
 	
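Claim 5.  The method of claim 1, wherein for each training input, the method further comprises, for each of the plurality of output positions in the predicted output sequence for the training input, and in order starting from an initial position:	providing a sampled valid token for the output position as input to the neural network for use in generating the respective score distribution for a next output position of the plurality of output positions in the predicted output sequence for the training input.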

Claim 13.  A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers to perform operations comprising: 		obtaining training data for training a neural network, 		wherein the neural network is configured to receive a network input and to process the network input in accordance with a plurality of parameters of the neural network to generate a respective score distribution for each of a plurality of output positions in a predicted output sequence for the network input,		wherein the respective score distribution for each of the output positions in the predicted output sequence for the network input comprises a respective score for each token in a predetermined set of tokens,		wherein the predetermined set of tokens includes n-grams of multiple different sizes, 		wherein, for each output position in the predicted output sequence for the network input, the respective score for each of the tokens in the respective score distribution for the output position in the predicted output sequence for the network input represents a likelihood that the token is a token at the output position in the predicted output sequence for the network input, and		wherein the training data comprises a plurality of training inputs, and for each training input, a respective target output sequence comprising one or more words;	for each training input:		processing the training input using the neural network in accordance with current values of the parameters of the neural network to generate a respective score distribution for each of a plurality of output positions in a predicted output sequence for the training input; 		sampling, from a plurality of possible valid decompositions of the target output sequence for the training input, a valid decomposition of the target output sequence that includes n-grams of different sizes at respective output positions of the plurality of output positions in the predicted output sequence for the training input, wherein each possible valid decomposition of the target output sequence decomposes the target output sequence into a different sequence of tokens from the predetermined set of tokens, wherein the sampling comprises, for each of one or more of the plurality of output positions in the predicted output sequence for the training input:			sampling, from valid tokens in the predetermined set of tokens, a valid token in accordance with the scores for the valid tokens in the respective score distribution for the output position in the predicted output sequence for the training input,			wherein a valid token for the output position in the predicted output sequence for the training input is a token from the predetermined set of tokens that would be a valid addition to a current partial valid decomposition of the target output sequence as of the output position in the predicted output sequence for the training input; and		adjusting the current values of the parameters of the neural network to increase likelihoods of the tokens in the sampled valid decomposition being the tokens at the corresponding output positions in the predicted output sequence for the training input.

Claim 14.  One or more non-transitory computer storage media storing instructions that when executed by one or more computers cause the one or more computers to perform operations comprising: 		obtaining training data for training a neural network, 		wherein the neural network is configured to receive a network input and to process the network input in accordance with a plurality of parameters of the neural network to generate a respective score distribution for each of a plurality of output positions in a predicted output sequence for the network input,		wherein the respective score distribution for each of the output positions in the predicted output sequence for the network input comprises a respective score for each token in a predetermined set of tokens,		wherein the predetermined set of tokens includes n-grams of multiple different sizes, 		wherein, for each output position in the predicted output sequence for the network input, the respective score for each of the tokens in the respective score distribution for the output position in the predicted output sequence for the network input represents a likelihood that the token is a token at the output position in the predicted output sequence for the network input, and		wherein the training data comprises a plurality of training inputs, and for each training input, a respective target output sequence comprising one or more words;	for each training input:		processing the training input using the neural network in accordance with current values of the parameters of the neural network to generate a respective score distribution for each of a plurality of output positions in a predicted output sequence for the training input; 		sampling, from a plurality of possible valid decompositions of the target output sequence for the training input, a valid decomposition of the target output sequence that includes n-grams of different sizes at respective output positions of the plurality of output positions in the predicted output sequence for the training input, wherein each possible valid decomposition of the target output sequence decomposes the target output sequence into a different sequence of tokens from the predetermined set of tokens, wherein the sampling comprises, for each of one or more of the plurality of output positions in the predicted output sequence for the training input:			sampling, from valid tokens in the predetermined set of tokens, a valid token in accordance with the scores for the valid tokens in the respective score distribution for the output position in the predicted output sequence for the training input,			wherein a valid token for the output position in the predicted output sequence for the training input is a token from the predetermined set of tokens that would be a valid addition to a current partial valid decomposition of the target output sequence as of the output position in the predicted output sequence for the training input; and		adjusting the current values of the parameters of the neural network to increase likelihoods of the tokens in the sampled valid decomposition being the tokens at the corresponding output positions in the predicted output sequence for the training input.

Claim 16.  The non-transitory computer storage media of claim 14, wherein for each training input, adjusting the current values of the parameters of the neural network to increase likelihoods of the tokens in the sampled valid decomposition being the tokens at the corresponding output positions in the predicted output sequence for the training input comprises:	performing an iteration of a neural network training procedure to increase a logarithm of a product of the respective scores for each token in the sampled valid decomposition in the respective score distribution for the output position, in the predicted output sequence for the training input, that corresponds to the position of the token in the sampled valid decomposition.
 
Claim 17.  The non-transitory computer storage media of claim 14, wherein for each training input, the sampling further comprises, for each of one or more of the plurality of output positions in the predicted output sequence for the training input: 	sampling, from valid tokens in the predetermined set of tokens, a valid token randomly.	

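Claim 18.  The non-transitory computer storage media of claim 14, wherein for each training input, the operations further comprise, for each of the plurality of output positions in the predicted output sequence for the training input and in order starting from an initial position:	providing a sampled valid token for the output position as input to the neural network for use in generating the respective score distribution for a next output position of the plurality of output positions in the predicted output sequence for the training input.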

Claim 23.  The method of claim 4, wherein for each training input, for each of the plurality of output positions in the predicted output sequence for the training input and in order starting from an initial position: 	a valid token for the output position is sampled from valid tokens in the predetermined set of tokens randomly with probability ϵ, and	a valid token for the output position is sampled from valid tokens in the predetermined set of tokens in accordance with the scores for the valid tokens in the respective score distribution for the output position for the training input with probability 1 - ϵ.
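
For illustration only, and not as a limitation of any claim: the two-branch sampling recited in claim 23 can be sketched in Python as a single per-position choice. The function name, the argument names, and the use of Python's standard random module are assumptions made for this example; the application does not specify an implementation.

```python
import random


def sample_valid_token(candidates, scores, epsilon, rng):
    """Sample one valid token for the current output position.

    With probability epsilon the token is drawn uniformly at random from
    `candidates`; with probability 1 - epsilon it is drawn in proportion
    to `scores`, which stands in for the scores of the valid tokens in the
    score distribution (an assumed, hypothetical input).
    """
    if rng.random() < epsilon:
        return rng.choice(candidates)
    return rng.choices(candidates, weights=scores, k=1)[0]


rng = random.Random(0)
# epsilon = 0.0 always follows the scores, so the zero-score token is never drawn.
by_score = sample_valid_token(["a", "ab"], [0.0, 1.0], 0.0, rng)
# epsilon = 1.0 always draws uniformly and ignores the scores.
uniform = sample_valid_token(["a", "ab"], [0.0, 1.0], 1.0, rng)
```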


Reasons for Allowance
The following is an examiner’s statement of reasons for allowance.
Regarding claims 1, 13 and 14, and claims depending therefrom, the cited art of record does not, in any combination obvious to one having ordinary skill in the art before the effective filing date of the claimed invention, teach or suggest “sampling, from a plurality of possible valid decompositions of the target output sequence for the training input, a valid decomposition of the target output sequence that includes n-grams of different sizes at respective output positions of the plurality of output positions in the predicted output sequence for the training input,” and all supporting limitations thereof.
The closest cited art of record includes Zhang NPL and Bahdanau NPL, both of which disclose choosing (sampling) a subset of vectors (of encodings for an input sample) from which the neural network is trained. However, neither Zhang NPL nor Bahdanau NPL teaches or suggests to one of ordinary skill in the art before the effective filing date of the claimed invention that what is chosen (sampled) is a valid token for the output position, that is, a token that would be a valid addition to a current partial valid decomposition of the target output sequence as of the output position.
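For illustration only, and not as part of the claims or as a characterization of the cited art: the notion of a token that is a valid addition to a current partial valid decomposition can be sketched in Python as a prefix test on the uncovered remainder of the target output sequence. The sketch assumes a character-level target string, assumes that the token set contains every single character of the target (so any partial decomposition can be completed), and uses a hypothetical score_fn as a stand-in for the neural network's score distribution; none of these names appear in the application.

```python
import random


def valid_tokens(target, consumed, vocab):
    """Tokens in `vocab` that can extend the partial decomposition.

    `consumed` is the number of characters of `target` already covered.
    A token qualifies if the uncovered remainder starts with it.
    """
    rest = target[consumed:]
    return [tok for tok in vocab if tok and rest.startswith(tok)]


def sample_decomposition(target, vocab, score_fn, rng):
    """Sample one valid decomposition of `target`, token by token.

    score_fn(prefix_tokens, tok) returns a positive score for `tok`
    given the tokens sampled so far (hypothetical model stand-in).
    """
    tokens, consumed = [], 0
    while consumed < len(target):
        candidates = valid_tokens(target, consumed, vocab)
        weights = [score_fn(tokens, tok) for tok in candidates]
        tok = rng.choices(candidates, weights=weights, k=1)[0]
        tokens.append(tok)
        consumed += len(tok)
    return tokens


vocab = ["a", "b", "c", "ab", "bc", "abc"]
first_step = valid_tokens("abc", 0, vocab)
decomposition = sample_decomposition(
    "abc", vocab, lambda prefix, tok: 1.0, random.Random(0)
)
```

In this sketch the target "abc" admits four valid decompositions ("a", "b", "c"; "a", "bc"; "ab", "c"; "abc"), and each call returns one of them.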
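Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee. Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”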

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHELLE M KOETH whose telephone number is (571)272-5908.  The examiner can normally be reached on Monday-Friday, 9:30a-6:30
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Bhavesh Mehta can be reached on 571-272-7453.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


MICHELLE M. KOETH
Primary Examiner
Art Unit 2656


/MICHELLE M KOETH/Primary Examiner, Art Unit 2656