DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This action is in reply to the interview on 4/20/2021 and proposed Examiner’s Amendments emailed to Examiner on 4/20/2021 for application 15/852,512 filed 12/22/2017.
Priority
Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged in view of the Application Data Sheet received on 4/20/2021.
Examiner’s Amendment
An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment was given in a telephone interview with Primary Examiner Eric Nilsson and Attorney Micah Drayton on 04/20/2021.

The application has been amended as follows:

1. (Currently amended). A computer method for training a sequence learning model based on reinforcement learning and neural networks, the method comprising:
retrieving input sequence data, the input sequence data including one or more input time sequences;
encoding the input sequence data into an output symbol sequence containing output symbol data using a first neural network trained to implement a sequence learning model, the output symbol data including one or more symbolic representations;
decoding, using a second neural network, the output symbol data to decoded sequence data, the decoded sequence data including one or more decoded time sequences that are to match the one or more input time sequences in the input sequence data;
comparing the decoded sequence data with the input sequence data, 
wherein comparing the decoded sequence data with the input sequence data comprises computing a distance between one of the decoded time sequences and the one of the input time sequences that the one decoded time sequence is to match; and
 training the first neural network to update the sequence learning model based on the comparison, wherein training further comprises:
determining a length of the output symbol data;
estimating an expected end reward, wherein the expected end reward is based on:
a distance between the decoded time sequence and the input time sequence; and
an additive inverse of a length of the output symbol sequence; and
estimating the expected end reward further comprises:
computing a first term by multiplying the additive inverse of the length by a coefficient;
computing a second term by multiplying the distance by one minus the coefficient; and
adding the first term to the second term; and
adjusting one or more parameters of the sequence learning model to maximize the expected end reward.
2. (Original). The method of claim 1, wherein the input time sequence includes financial data, a temperature recording, a sound wave, a handwritten note of pen trajectories, or a video of human actions.
3. (Original). The method of claim 1 further comprising:
determining whether to encode the input sequence data into a symbolic representation or into an empty representation.
4. (Canceled).
5. (Canceled).
6. (Currently amended). The method of claim [[5]]1 further comprising:
storing a tuple of the input sequence data, the output symbol data and the expected end reward.
7. (Previously presented). The method of claim 6, wherein the steps are implemented recursively until the tuple converges below a threshold.
8. (Currently amended). A non-transitory computer-readable storage medium storing executable computer program instructions for training a sequence learning model based on reinforcement learning and neural networks, the computer program instructions comprising instructions for:
retrieving input sequence data, the input sequence data including one or more input time sequences;
encoding the input sequence data into an output symbol sequence containing output symbol data using a first neural network trained to implement a sequence learning model, the output symbol data including one or more symbolic representations;
decoding, using a second neural network, the output symbol data to decoded sequence data, the decoded sequence data including one or more decoded time sequences that are to match the one or more input time sequences in the input sequence data;
comparing the decoded sequence data with the input sequence data, 
wherein comparing the decoded sequence data with the input sequence data comprises computing a distance between one of the decoded time sequences and the one of the input time sequences that the one decoded time sequence is to match; and
training the first neural network to update the sequence learning model based on the comparison, wherein training further comprises:
determining a length of the output symbol data;
estimating an expected end reward, wherein the expected end reward is based on:
a distance between the decoded time sequence and the input time sequence; and
an additive inverse of a length of the output symbol sequence; and
estimating the expected end reward further comprises:
computing a first term by multiplying the additive inverse of the length by a coefficient;
computing a second term by multiplying the distance by one minus the coefficient; and
adding the first term to the second term; and
adjusting one or more parameters of the sequence learning model to maximize the expected end reward.
9. (Original). The computer-readable storage medium of claim 8, wherein the input time sequence includes financial data, a temperature recording, a sound wave, a handwritten note of pen trajectories, or a video of human actions.
10. (Original). The computer-readable storage medium of claim 8, wherein the computer program instructions further comprise instructions for:
determining whether to encode the input sequence data into a symbolic representation or into an empty representation.
11. (Canceled).
12. (Canceled).
13. (Currently amended). The computer-readable storage medium of claim [[12]]8, wherein the computer program instructions further comprise instructions for:
storing a tuple of the input sequence data, the output symbol data and the expected end reward.
14. (Previously presented). The computer-readable storage medium of claim 13, wherein computer instructions executed sequentially from claim 13 are implemented recursively until the tuple converges below a threshold.
15. (Currently amended). A computer system for training a sequence learning model based on reinforcement learning and neural networks, the system comprising:
a processor; and
memory storing an encoder for:
retrieving input sequence data, the input sequence data including one or more input time sequences; and
encoding the input sequence data into an output symbol sequence containing output symbol data using a first neural network trained to implement a sequence learning model, the output symbol data including one or more symbolic representations;
a decoder for:
decoding, using a second neural network, the output symbol data to decoded sequence data, the decoded sequence data including one or more decoded time sequences that are to match the one or more input time sequences in the input sequence data; and
comparing the decoded sequence data with the input sequence data, 
wherein comparing the decoded sequence data with the input sequence data comprises computing a distance between one of the decoded time sequences and the one of the input time sequences that the one decoded time sequence is to match; and
the encoder is also for:
training the first neural network to update the sequence learning model based on the comparison, wherein training further comprises:
determining a length of the output symbol data;
estimating an expected end reward, wherein the expected end reward is based on:
a distance between the decoded time sequence and the input time sequence; and
an additive inverse of a length of the output symbol sequence; and
estimating the expected end reward further comprises:
computing a first term by multiplying the additive inverse of the length by a coefficient;
computing a second term by multiplying the distance by one minus the coefficient; and
adding the first term to the second term; and
adjusting one or more parameters of the sequence learning model to maximize the expected end reward.
16. (Original). The system of claim 15, wherein the input time sequence includes financial data, a temperature recording, a sound wave, a handwritten note of pen trajectories, or a video of human actions.
17. (Previously presented). The system of claim 15, wherein the encoder is further for:
determining whether to encode the input sequence data into a symbolic representation or into an empty representation.
18. (Canceled).
19. (Canceled).
20. (Currently amended). The system of claim [[19]]15, wherein the encoder is further for:
storing a tuple of the input sequence data, the output symbol data and the expected end reward.

REASONS FOR ALLOWANCE
The following is an examiner’s statement of reasons for allowance: claims 1-3, 6-10, 13-17, and 20 are considered allowable because, when the claims are read in light of the specification, none of the references of record, either alone or in combination, fairly discloses or suggests the combination of limitations specified in the independent claims, including at least:
For independent claims 1, 8, and 15: 
wherein comparing the decoded sequence data with the input sequence data comprises computing a distance between one of the decoded time sequences and the one of the input time sequences that the one decoded time sequence is to match;
…
and
estimating the expected end reward further comprises:
computing a first term by multiplying the additive inverse of the length by a coefficient;
computing a second term by multiplying the distance by one minus the coefficient; and
adding the first term to the second term;

The closest prior art of record is Cho et al. (“Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation”). Cho teaches retrieving an input time sequence (page 2, first paragraph), encoding it into output symbol data using a first recurrent neural network (an encoder) (page 2, col. 2), decoding the output symbol data into decoded time sequence data using a second recurrent neural network (a decoder) (page 2, col. 1), updating a parameter theta for the encoder-decoder (page 2, col. 2), and determining a length of the output symbol data.
Van Hoof et al. (“Stable Reinforcement Learning with Autoencoders for Tactile and Visual Data”), hereinafter “Hoof”, teaches comparing by finding a reconstruction error computed by an error loss function (p. 3929, col. 2, ¶ 2). Hoof teaches estimating an expected end reward. The reward function must be based on the reconstruction error from Hoof’s Equation 1. The latent representation z in Fig. 2 is based on the backpropagation of the reconstruction error. Hoof teaches adjusting one or more parameters of the sequence learning model to maximize the expected end reward. As stated by Hoof on page 3929, col. 3, second paragraph, “The parameters are updated by gradient descent on the reconstruction error”. Hoof teaches maximizing the end reward on p. 3930, col. 2, under section B, “The goal of a reinforcement learning agent is to choose a policy… [that] maximizes the average reward.” 
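For illustration only, the encode, decode, and compare steps discussed above can be sketched as a round trip. This is a minimal sketch under stated assumptions, not the method of Cho, Hoof, or the applicant: a fixed-step quantizer stands in for the two neural networks, mean squared error is assumed as the distance, and the names `encode`, `decode`, `distance`, and `step` are hypothetical.

```python
from typing import List


def encode(sequence: List[float], step: float = 0.5) -> List[int]:
    """Stand-in for the first network: map each sample to an integer symbol."""
    # Assumption: simple quantization replaces the trained encoder.
    return [round(x / step) for x in sequence]


def decode(symbols: List[int], step: float = 0.5) -> List[float]:
    """Stand-in for the second network: map each symbol back to a sample."""
    return [s * step for s in symbols]


def distance(decoded: List[float], original: List[float]) -> float:
    """Mean squared error between a decoded sequence and the input it is to match."""
    if len(decoded) != len(original):
        raise ValueError("sequences must have the same length")
    return sum((d - o) ** 2 for d, o in zip(decoded, original)) / len(original)


input_sequence = [0.0, 0.5, 1.0]
symbols = encode(input_sequence)            # output symbol data
decoded_sequence = decode(symbols)          # decoded time sequence
error = distance(decoded_sequence, input_sequence)
```

Here the input samples fall exactly on the quantization grid, so the decoded sequence matches the input and the distance is zero. Inputs off the grid would decode with a nonzero distance.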

Other references, such as US Patent Publication 2019/0117087 to Yasunaga, in ¶ [0174], teach regularization/penalization, where an additive inverse (subtraction) is inherent to regularization/penalization.

However, none of the prior art of record teaches:
estimating the expected end reward further comprises:
computing a first term by multiplying the additive inverse of the length by a coefficient;
computing a second term by multiplying the distance by one minus the coefficient; and
adding the first term to the second term;  
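
For illustration only, the quoted limitation can be read as the following arithmetic. This is a minimal sketch of the literal claim wording, not the applicant's implementation: the function name `expected_end_reward` and its parameter names are hypothetical, and the claims state neither a range for the coefficient nor a sign convention for the distance, so none is assumed here.

```python
def expected_end_reward(distance: float, length: int, coefficient: float) -> float:
    """Combine the two terms as the quoted limitation words them."""
    # First term: the additive inverse of the length, multiplied by the coefficient.
    first_term = coefficient * (-length)
    # Second term: the distance, multiplied by one minus the coefficient.
    second_term = (1.0 - coefficient) * distance
    # The expected end reward adds the first term to the second term.
    return first_term + second_term


reward = expected_end_reward(distance=2.0, length=4, coefficient=0.25)
```

With a distance of 2.0, a length of 4, and a coefficient of 0.25, the first term is -1.0 and the second term is 1.5, so the sum is 0.5. A coefficient of 1 reduces the result to the additive inverse of the length alone, and a coefficient of 0 reduces it to the distance alone.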

Dependent claims 2-3, 6-7, 9-10, 13-14, 16-17, and 20 are allowed as they depend upon an allowable independent claim.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”
Conclusion
Claims 1-3, 6-10, 13-17, and 20 are allowed.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Asher Jablon, whose telephone number is (571) 270-7648.  The examiner can normally be reached Monday through Friday, 9:00 am to 6:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/ASHER JABLON/Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122