Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Amendments
This action is in response to amendments filed 03/16/2021. Claims 1, 4, 5, 7, 8, 12, 14, 15, and 17-20 are amended. Claims 1-20 are pending and have been considered.
Priority
Upon further review, Examiner has determined that the application does not fulfill the priority requirement. Applicant’s claim for the benefit of a prior-filed application under 35 U.S.C. 119(e) or under 35 U.S.C. 120, 121, 365(c), or 386(c) is acknowledged. Applicant has not complied with one or more conditions for receiving the benefit of an earlier filing date under 35 U.S.C. 119(e) as follows: The continuity type field is blank on the Application Data Sheet filed 12/22/2017. Appropriate correction is required if Applicant wishes to claim priority.

[Image: media_image1.png]

Drawings
The drawings were received on March 16, 2021.  These drawings are acceptable.
Claim Objections
Claim 1 is objected to because of the following informalities: Line 5 should read “input time” instead of “inputtime”. Appropriate correction is required.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
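This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary. Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.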
Claims 1, 4, 6-8, 11, 13-15, 18 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Cho et al. (“Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation”, 2014), hereinafter Cho, in view of van Hoof et al. (“Stable Reinforcement Learning with Autoencoders for Tactile and Visual Data”, 2016), hereinafter “Hoof,” and further in view of US Patent Publication 2012/0202538 to Uusitalo et al., hereinafter “Uusitalo,” and US Patent Publication 2019/0117087 to Yasunaga, hereinafter “Yasunaga.”

Regarding claim 1, Cho teaches: A computer method for training a sequence learning model based on reinforcement learning and neural networks, the method comprising: 
retrieving input sequence data, the input sequence data including one or more input time sequences; (Examiner is interpreting “input sequence data” as any type of sequential data input into a neural network. The “input time sequences” may also be an input time series. Cho teaches on page 2, first paragraph (all of Cho’s page numbers refer to the PDF page number):

[Image: excerpt from Cho, p. 2, first paragraph (media_image2.png)]

The input time sequence is the sequence $\mathbf{x} = (x_1, \ldots, x_T)$.)
encoding the input sequence data into an output symbol sequence containing output symbol data using a first neural network trained to implement a sequence learning model, the output symbol data including one or more symbolic representations; (Examiner is interpreting a “sequence learning model” as a machine learning model. Cho teaches on page 2, column 2: “The encoder is an RNN that reads each symbol of an input sequence x sequentially. As it reads each symbol, the hidden state of the RNN changes according to Cho Eq. (1)”. The output symbol sequence is interpreted as c.)
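For illustration only, the encoding step mapped to Cho above (an RNN that reads each input symbol in turn and updates its hidden state, with the final state serving as the summary c) can be sketched as follows. This is a minimal sketch, not Cho’s implementation: the transition is a plain tanh RNN cell standing in for Cho’s f in Eq. (1) (Cho uses a gated hidden unit), and the names rnn_encode, W, U, and b are hypothetical.

```python
import math
import random

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rnn_encode(xs, W, U, b):
    """Read the input sequence one symbol at a time and return the summary c.

    Hypothetical transition: h_t = tanh(W x_t + U h_{t-1} + b), a plain
    (non-gated) RNN cell standing in for f in Cho's Eq. (1).
    """
    h = [0.0] * len(b)                      # h_0 = 0
    for x_t in xs:                          # x = (x_1, ..., x_T)
        pre = [wx + uh + bi for wx, uh, bi in zip(matvec(W, x_t), matvec(U, h), b)]
        h = [math.tanh(p) for p in pre]     # hidden state changes as each symbol is read
    return h                                # c = h_T, a fixed-length summary

# Usage: 3-dimensional one-hot inputs, 4-dimensional hidden state.
random.seed(0)
W = [[random.uniform(-0.5, 0.5) for _ in range(3)] for _ in range(4)]
U = [[random.uniform(-0.5, 0.5) for _ in range(4)] for _ in range(4)]
b = [0.0] * 4
xs = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
c = rnn_encode(xs, W, U, b)
```

Because c has the size of the hidden state regardless of T, input sequences of different lengths map to vectors of the same size, which is the fixed-length property quoted from Cho, section 2.2.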
decoding, using a second neural network, the output symbol data to decoded sequence data, the decoded sequence data including one or more decoded time sequences that are to match the one or more input time sequences in the input sequence data; (Cho teaches on page 2 column 1, under section 2.2 RNN Encoder-Decoder: “we propose a novel neural network architecture that learns to encode a variable-length sequence into a fixed-length vector representation and to decode a given fixed-length vector representation back into a variable-length sequence.” The decoded time sequence matches the input time sequence. Further, Cho teaches on page 2, column 2, second paragraph: “The decoder of the proposed model is another RNN which is trained to generate the output sequence by predicting the next symbol y_t given the hidden state h_< t >”. According to Figure 1, the encoder encodes the input states x_1 through x_T, and the decoder outputs predicted states y_T’ through y_1.

[Image: Cho, Figure 1 (media_image3.png)]
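Similarly, the decoding step can be sketched as a second RNN conditioned on c that emits one output per step. This sketch assumes continuous-valued outputs so that the decoded sequence can be compared with an input time sequence; Cho’s decoder instead predicts discrete symbols through a softmax. The parameter names (Wy, Uh, Cc, V) and the initialization of the hidden state from c are hypothetical.

```python
import math

def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def rnn_decode(c, steps, Wy, Uh, Cc, V):
    """Generate `steps` outputs from the fixed-length summary c.

    Hypothetical parameterization:
      h_t = tanh(Wy y_{t-1} + Uh h_{t-1} + Cc c)
      y_t = V h_t   (linear read-out, continuous output)
    """
    h = [math.tanh(v) for v in c]       # initial hidden state derived from c (assumption)
    y_prev = [0.0] * len(V)             # zero "start" output before the first step
    outputs = []
    for _ in range(steps):
        pre = [a + b + d for a, b, d in
               zip(matvec(Wy, y_prev), matvec(Uh, h), matvec(Cc, c))]
        h = [math.tanh(p) for p in pre]
        y_prev = matvec(V, h)           # next output depends on h_t, hence on y_{t-1} and c
        outputs.append(y_prev)
    return outputs

# Usage: 2-dimensional summary c, 1-dimensional outputs.
Wy = [[0.5], [-0.5]]
Uh = [[0.1, 0.0], [0.0, 0.1]]
Cc = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 1.0]]
decoded = rnn_decode([0.3, -0.2], steps=4, Wy=Wy, Uh=Uh, Cc=Cc, V=V)
```

The number of decoding steps is independent of the size of c, which corresponds to decoding a fixed-length vector back into a variable-length sequence.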

training the first neural network to update the sequence learning model based on [a comparison], (Examiner is interpreting updating the model as changing any of the model’s parameters. Cho teaches that the model parameter theta gets updated during training – see Cho page 2, col. 2 to page 3, col. 1:

[Image: excerpt from Cho, p. 2, col. 2 to p. 3, col. 1 (media_image4.png)]


[Image: excerpt from Cho, p. 2, col. 2 to p. 3, col. 1 (media_image5.png)]

wherein training further comprises: determining a length of the output symbol data (subscript “n” in $y_n$ is a length of the output symbol data)

However, Cho does not explicitly teach: comparing the decoded sequence data with the input sequence data; and 
[training is based on] the comparison 
wherein training further comprises: estimating an expected end reward, wherein the expected end reward is based on: a distance between the decoded time sequence and the input time sequence; and 
wherein the expected end reward is based on: … an additive inverse of a length of the output symbol sequence; and 
adjusting one or more parameters of the sequence learning model to maximize the expected end reward.
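As a hedged illustration of the expected end reward recited in the claim 1 limitations above, the sketch below combines the negative of a distance between a decoded time sequence and its input time sequence with the additive inverse of the output symbol sequence length. The Euclidean distance, the weighting constant length_weight, and the summation of the two terms are assumptions; the limitations as quoted do not fix any of them.

```python
import math

def euclidean_distance(decoded, target):
    """Distance between two equal-length time sequences of scalars."""
    if len(decoded) != len(target):
        raise ValueError("sequences must have the same length")
    return math.sqrt(sum((d - t) ** 2 for d, t in zip(decoded, target)))

def end_reward(decoded, target, symbols, length_weight=0.1):
    """Hypothetical end reward: -(distance) + length_weight * (-len(symbols)).

    Both terms enter with a negative sign, so the reward is highest when the
    reconstruction is exact and the symbolic representation is short.
    """
    return -euclidean_distance(decoded, target) - length_weight * len(symbols)

# Usage: reconstruction is off by 1.0 in one sample, and two symbols were emitted.
r = end_reward([1.0, 2.0, 2.0], [1.0, 2.0, 3.0], ["a", "b"])
```

Under this reading, a shorter symbolic representation earns a higher reward for the same reconstruction quality, so maximizing the reward trades reconstruction accuracy against representation length.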
But Hoof teaches: comparing the decoded sequence data with the input sequence data; (Examiner is interpreting “comparing” as finding a reconstruction error computed by an error loss function. Hoof teaches this limitation on page 3929, col. 2, second paragraph, copied below. The input robot state x is analogous to the input time sequence of the instant claim, and the reconstructed input robot state x’ is analogous to the decoded time sequence of the instant claim.

[Image: excerpt from Hoof, p. 3929, col. 2 (media_image6.png)]


Hoof also teaches: the comparison (As stated by Hoof on page 3929, col. 3, second paragraph and in the excerpt copied above, “The parameters are updated by gradient descent on the reconstruction error”. The reconstruction error can be used to update the model parameters.)
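To illustrate the update “by gradient descent on the reconstruction error” quoted from Hoof, the sketch below trains a one-dimensional linear autoencoder (encoder weight w_enc, decoder weight w_dec) by minimizing the mean squared difference between each input x and its reconstruction x'. The scalar linear model, learning rate, and initial weights are illustrative assumptions; Hoof’s autoencoders are neural networks operating on tactile and visual data.

```python
def reconstruction_error(xs, w_enc, w_dec):
    """Mean squared error between inputs x and reconstructions x' = w_dec * (w_enc * x)."""
    return sum((x - w_dec * w_enc * x) ** 2 for x in xs) / len(xs)

def train_autoencoder(xs, w_enc=0.2, w_dec=0.3, lr=0.05, steps=200):
    """Gradient descent on the reconstruction error of a 1-D linear autoencoder."""
    n = len(xs)
    for _ in range(steps):
        # Shared factor of both partial derivatives of mean (x - w_dec*w_enc*x)^2.
        g_common = sum(-2.0 * (x - w_dec * w_enc * x) * x for x in xs) / n
        g_enc = g_common * w_dec        # dL/dw_enc
        g_dec = g_common * w_enc        # dL/dw_dec (uses the pre-update w_enc)
        w_enc -= lr * g_enc
        w_dec -= lr * g_dec
    return w_enc, w_dec

# Usage: the comparison (reconstruction error) drives the parameter update.
xs = [1.0, -0.5, 2.0]
before = reconstruction_error(xs, 0.2, 0.3)
w_enc, w_dec = train_autoencoder(xs)
after = reconstruction_error(xs, w_enc, w_dec)
```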
Further, Hoof teaches: wherein training further comprises: estimating an expected end reward, wherein the expected end reward is based on: a distance between the decoded time sequence and the input time sequence; and (Examiner is interpreting “estimating” as computing a probabilistic calculation. Examiner is interpreting “end reward” as any reward in a reinforcement learning system, such as an average reward, because the broadest reasonable interpretation (BRI) of “end reward” is broad enough to allow for this interpretation. In Hoof, the reward function must be based on the reconstruction error, because z is based on the backpropagation of the reconstruction error. See Hoof Figure 2, copied below, which shows that the error from the encoder propagates to z).

[Image: Hoof, Figure 2 (media_image7.png)]

wherein training further comprises: adjusting one or more parameters of the sequence learning model to maximize the expected end reward. (As stated by Hoof on page 3929, col. 3, second paragraph, “The parameters are updated by gradient descent on the reconstruction error”. Also, Hoof teaches maximizing the end reward on p. 3930, col. 2, under section B.

[Image: excerpt from Hoof, p. 3930, col. 2, section B (media_image8.png)]

[Image: excerpt from Hoof (media_image9.png)]

[Image: excerpt from Hoof (media_image10.png)]

(The preceding excerpt is from p. 3931, col. 1.)
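The maximization step can be illustrated with a simplified, reward-weighted policy update in the spirit of relative entropy policy search: sampled parameters are weighted by exp(R/η) and a Gaussian policy is refit to the weighted samples. This is a sketch under stated assumptions, not Hoof’s algorithm: REPS chooses the temperature η by solving a dual problem under a KL-divergence bound, and Hoof’s method is non-parametric, whereas here η is fixed and the policy is a one-dimensional Gaussian. The reward function and all numeric settings are hypothetical.

```python
import math
import random

def reward(theta):
    """Hypothetical reward, maximized at theta = 2."""
    return -(theta - 2.0) ** 2

def weighted_update(samples, rewards, eta):
    """Refit a Gaussian to samples weighted by exp(R / eta) (weighted maximum likelihood)."""
    r_max = max(rewards)                             # subtract max for numerical stability
    weights = [math.exp((r - r_max) / eta) for r in rewards]
    total = sum(weights)
    mu = sum(w * s for w, s in zip(weights, samples)) / total
    var = sum(w * (s - mu) ** 2 for w, s in zip(weights, samples)) / total
    return mu, math.sqrt(max(var, 1e-12))

def optimize(mu=0.0, sigma=1.0, eta=1.0, iterations=30, n_samples=200, seed=0):
    """Repeatedly sample, score, and reweight; returns the (mu, sigma) history."""
    rng = random.Random(seed)
    history = [(mu, sigma)]
    for _ in range(iterations):
        samples = [rng.gauss(mu, sigma) for _ in range(n_samples)]
        rewards = [reward(s) for s in samples]
        mu, sigma = weighted_update(samples, rewards, eta)
        history.append((mu, sigma))
    return history

history = optimize()
final_mu, final_sigma = history[-1]
```

With these settings the policy mean moves toward the maximizer θ = 2 while the standard deviation shrinks over iterations, consistent with the shrinking policy variance Hoof describes on p. 3931.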

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Hoof’s system into Cho’s system by finding the error and by modifying the updating of Cho with the specific comparison as taught by Hoof, with a motivation of obtaining stable policy updates (Hoof discloses on page 3929, top of col. 2: “Our approach consists of two steps: autoencoders are used to learn a suitable representation and non-parametric relative entropy policy search is used to obtain stable policy updates.”)

However, the combination of Cho and Hoof does not explicitly teach: wherein the expected end reward is based on: … an additive inverse of a length of the output symbol sequence; and 
	But Uusitalo teaches: wherein the expected end reward is based on: …a length… (Uusitalo para. [0060] teaches penalizing/minimizing a length in a reward; however, the lengths in Uusitalo are physical lengths, not lengths of a vector. See Uusitalo para. [0060]: “The exemplary embodiments of this invention also enable pathfinding while optimizing multiple objectives. The goals for a pathfinding task can be set mathematically. For example, one goal can be to minimize a length of time taken by a user of the device 10 to travel from point A to point B, and to minimize the interference experienced by the user device 10 at each step of the travel. These goals can be formed into rewards at each time step. For example, interference level is a penalty (negative reward), and distance from point B is also a penalty. It should be noted that the total reward or penalty for any action involves the `goodness` of the action towards multiple goals (reaching the target, interference avoidance, and potential other goals of the user”.)
	Uusitalo is in the same field of endeavor as the claimed invention, namely, devices for penalizing/minimizing parameters. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Uusitalo’s system into the combination of Cho and Hoof’s system by penalizing a length, with a motivation to optimize an objective (Uusitalo para. [0060]).
Lastly, Yasunaga teaches: wherein the expected end reward is based on: … an additive inverse… of the output symbol sequence. (Yasunaga Para. [0174]: “Regularization is introduced to penalize the complexity of the model, and may provide a penalty to the norm (vector length) of the parameter.” The additive inverse (subtraction) is inherent to regularization/penalization.)
Yasunaga is in the same field of endeavor as the claimed invention, namely, neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Yasunaga’s system into the combination 

Regarding claim 4, the combination of Cho, Hoof, Uusitalo, and Yasunaga teaches: The method of claim 1, 
Further, Hoof teaches: wherein comparing the decoded sequence data with the input sequence data comprises: computing a distance between one of the decoded time sequences and the one of the input time sequences that the one decoded time sequence is to match. (Examiner is interpreting “a distance” between a decoded time sequence and an input time sequence as an error which can be computed by an error loss function. Hoof teaches this limitation on page 3929, col. 3, second paragraph, as discussed previously in the comparing limitation of claim 1.) 
	Therefore, it would have been obvious to one skilled in the art to have incorporated the teachings of Hoof’s system into the combination of Cho, Hoof, Uusitalo, and Yasunaga’s system by decoding data using an error loss function with a motivation to map the latent representation back through the decoder to reconstruct the input (Hoof page 3929, col. 3, second paragraph).

Regarding claim 6, the combination of Cho, Hoof, Uusitalo, and Yasunaga teaches: The method of claim 5,
Further, Hoof teaches: further comprising: storing a tuple of the input sequence data, the output symbol data and the expected end reward. (Hoof teaches on p. 3930, col. 2, section B, that the input data, output data, and expected average reward are accumulated each iteration in order to build the reinforcement learning model)
Therefore, it would have been obvious to one skilled in the art to have incorporated the teachings of Hoof into the combination of Cho, Hoof, Uusitalo, and Yasunaga’s system by passing the input data, output data, and reward to the next iteration, with a motivation to build a reinforcement learning model (p. 3930, col. 2, under section B. – Notation)
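For concreteness, storing a tuple of the input sequence data, the output symbol data, and the expected end reward at each iteration might look like the following. The class names Transition and TupleStore are hypothetical, as is the average_reward helper used to summarize the accumulated tuples.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

@dataclass(frozen=True)
class Transition:
    """One stored tuple: (input sequence, output symbols, expected end reward)."""
    input_sequence: Tuple[float, ...]
    output_symbols: Tuple[str, ...]
    expected_reward: float

@dataclass
class TupleStore:
    """Hypothetical store that accumulates tuples across training iterations."""
    items: List[Transition] = field(default_factory=list)

    def add(self, inputs: Sequence[float], symbols: Sequence[str], reward: float) -> None:
        # Copy into immutable tuples so later iterations cannot alter stored data.
        self.items.append(Transition(tuple(inputs), tuple(symbols), reward))

    def average_reward(self) -> float:
        return sum(t.expected_reward for t in self.items) / len(self.items)

# Usage: two iterations, each storing its own tuple.
store = TupleStore()
store.add([0.1, 0.2], ["a", "b"], -1.0)
store.add([0.3], ["c"], -0.5)
avg = store.average_reward()
```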

Regarding claim 7, the combination of Cho, Hoof, Uusitalo, and Yasunaga teaches: The method of claim 6, 
Further, Hoof teaches: wherein the steps are implemented recursively until the tuple converges below a threshold. (Hoof teaches on p. 3931, col. 1, second paragraph from the bottom: “The variance of policies typically shrinks after each iteration, such that the policy converges to a (locally) optimal policy.”)
Therefore, it would have been obvious to one skilled in the art to have incorporated the teachings of Hoof into the combination of Cho, Hoof, Uusitalo, and Yasunaga’s system by converging the parameters, with a motivation to build a reinforcement learning model (p. 3930, col. 2, under section B. – Notation). Although Cho does not explicitly teach convergence, it is implied through the “target sequence” language (Cho teaches on page 1, col. 2, first paragraph: “the decoder maps the vector representation back to a variable-length target sequence.”)
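One reading of “implemented recursively until the tuple converges below a threshold” is an iterative loop that stops once successive tuples differ by less than a threshold. The sketch assumes that reading, measures the change as the largest element-wise difference between consecutive tuples, and uses a hypothetical halfway_step function in place of an actual training iteration; a loop is used instead of literal recursion, which is equivalent here.

```python
def run_until_converged(step, state, threshold=1e-4, max_iters=1000):
    """Apply `step` repeatedly until the tuple changes by less than `threshold`.

    `state` is a tuple of floats; the change is the largest absolute
    element-wise difference between consecutive iterations.
    Returns (final_state, number_of_iterations).
    """
    for i in range(1, max_iters + 1):
        new_state = step(state)
        change = max(abs(a - b) for a, b in zip(new_state, state))
        state = new_state
        if change < threshold:
            return state, i
    return state, max_iters

# Hypothetical step: each iteration moves the (parameter, reward) tuple
# halfway toward (2.0, 0.0), standing in for one training iteration.
def halfway_step(state):
    theta, r = state
    return (theta + 0.5 * (2.0 - theta), r + 0.5 * (0.0 - r))

final_state, iterations = run_until_converged(halfway_step, (0.0, -4.0))
```

Here the per-iteration change is 4 × 0.5^i, which first drops below 1e-4 at the 16th iteration.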

Regarding claim 8, Cho teaches: A non-transitory computer-readable storage medium storing executable computer program instructions (Cho discloses an experimental section on p. 5, col. 1, which indicates the presence of a storage medium storing computer program instructions) for training a sequence learning model based on reinforcement learning and neural network, the computer program instructions comprising instructions for: 
retrieving input sequence data, the input sequence data including one or more input time sequences; (Examiner is interpreting “input sequence data” as any type of sequential data input into a neural network. The “input time sequences” may also be an input time series. Cho teaches on page 2, first paragraph (all of Cho’s page numbers refer to the PDF page number):

[Image: excerpt from Cho, p. 2, first paragraph (media_image2.png)]

The input time sequence is the sequence $\mathbf{x} = (x_1, \ldots, x_T)$.)
encoding the input sequence data into an output symbol sequence containing output symbol data using a first neural network trained to implement a sequence learning model, the output symbol data including one or more symbolic representations; (Examiner is interpreting a “sequence learning model” as a machine learning model. Cho teaches on page 2, column 2: “The encoder is an RNN that reads each symbol of an input sequence x sequentially. As it reads each symbol, the hidden state of the RNN changes according to Cho Eq. (1)”. The output symbol sequence is interpreted as c.)
decoding, using a second neural network, the output symbol data to decoded sequence data, the decoded sequence data including one or more decoded time sequences that are to match the one or more input time sequences in the input sequence data; (Cho teaches on page 2 column 1, under section 2.2 RNN Encoder-Decoder: “we propose a novel neural network architecture that learns to decode a given fixed-length vector representation back into a variable-length sequence.” The decoded time sequence matches the input time sequence. Further, Cho teaches on page 2, column 2, second paragraph: “The decoder of the proposed model is another RNN which is trained to generate the output sequence by predicting the next symbol y_t given the hidden state h_< t >”. According to Figure 1, the encoder encodes the input states x_1 through x_T, and the decoder outputs predicted states y_T’ through y_1.

[Image: Cho, Figure 1 (media_image3.png)]

training the first neural network to update the sequence learning model based on [a comparison], (Examiner is interpreting updating the model as changing any of the model’s parameters. Cho teaches that the model parameter theta gets updated during training – see Cho page 2, col. 2 to page 3, col. 1:

[Image: excerpt from Cho, p. 2, col. 2 to p. 3, col. 1 (media_image4.png)]


[Image: excerpt from Cho, p. 2, col. 2 to p. 3, col. 1 (media_image5.png)]

wherein training further comprises: determining a length of the output symbol data (subscript “n” in $y_n$ is a length of the output symbol data)

However, Cho does not explicitly teach: comparing the decoded sequence data with the input sequence data; and 
[training is based on] the comparison 
wherein training further comprises: estimating an expected end reward, wherein the expected end reward is based on: a distance between the decoded time sequence and the input time sequence; and 
wherein the expected end reward is based on: … an additive inverse of a length of the output symbol sequence; and 
adjusting one or more parameters of the sequence learning model to maximize the expected end reward.
But Hoof teaches: comparing the decoded sequence data with the input sequence data; (Examiner is interpreting “comparing” as finding a reconstruction error computed by an error loss function. Hoof teaches this limitation on page 3929, col. 2, second paragraph, copied below. The input robot state x is analogous to the input time sequence of the instant claim, and the reconstructed input robot state x’ is analogous to the decoded time sequence of the instant claim.

[Image: excerpt from Hoof, p. 3929, col. 2 (media_image6.png)]


Hoof also teaches: the comparison (As stated by Hoof on page 3929, col. 3, second paragraph and in the excerpt copied above, “The parameters are updated by gradient descent on the reconstruction error”. The reconstruction error can be used to update the model parameters.)
Further, Hoof teaches: wherein training further comprises: estimating an expected end reward, wherein the expected end reward is based on: a distance between the decoded time sequence and the input time sequence; and (Examiner is interpreting “estimating” as computing a probabilistic calculation. Examiner is interpreting “end reward” as any reward in a reinforcement learning system, such as an average reward, because the broadest reasonable interpretation (BRI) of “end reward” is broad enough to allow for this interpretation. In Hoof, the reward function must be based on the reconstruction error, because z is based on the backpropagation of the reconstruction error. See Hoof Figure 2, copied below, which shows that the error from the encoder propagates to z).

[Image: Hoof, Figure 2 (media_image7.png)]

wherein training further comprises: adjusting one or more parameters of the sequence learning model to maximize the expected end reward. (As stated by Hoof on page 3929, col. 3, second paragraph, “The parameters are updated by gradient descent on the reconstruction error”. Also, Hoof teaches maximizing the end reward on p. 3930, col. 2, under section B.

[Image: excerpt from Hoof, p. 3930, col. 2, section B (media_image8.png)]

[Image: excerpt from Hoof (media_image9.png)]

[Image: excerpt from Hoof (media_image10.png)]

(The preceding excerpt is from p. 3931, col. 1.)

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Hoof’s system into Cho’s system by finding the error and by modifying the updating of Cho with the specific comparison as taught by Hoof, with a motivation of obtaining stable policy updates (Hoof discloses on page 3929, top of col. 2: “Our approach consists of two steps: autoencoders are used to learn a suitable representation and non-parametric relative entropy policy search is used to obtain stable policy updates.”)

However, the combination of Cho and Hoof does not explicitly teach: wherein the expected end reward is based on: … an additive inverse of a length of the output symbol sequence; and 
	But Uusitalo teaches: wherein the expected end reward is based on: …a length… (Uusitalo para. [0060] teaches penalizing/minimizing a length in a reward; however, the lengths in Uusitalo are physical lengths, not lengths of a vector. See Uusitalo para. [0060]: “The exemplary embodiments of this invention also enable pathfinding while optimizing multiple objectives. The goals for a pathfinding task can be set mathematically. For example, one goal can be to minimize a length of time taken by a user of the device 10 to travel from point A to point B, and to minimize the interference experienced by the user device 10 at each step of the travel. These goals can be formed into rewards at each time step. For example, interference level is a penalty (negative reward), and distance from point B is also a penalty. It should be noted that the total reward or penalty for any action involves the `goodness` of the action towards multiple goals (reaching the target, interference avoidance, and potential other goals of the user”.)
	Uusitalo is in the same field of endeavor as the claimed invention, namely, devices for penalizing/minimizing parameters. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Uusitalo’s system into the combination of Cho and Hoof’s system by penalizing a length, with a motivation to optimize an objective (Uusitalo para. [0060]).
Lastly, Yasunaga teaches: wherein the expected end reward is based on: … an additive inverse… of the output symbol sequence. (Yasunaga Para. [0174]: “Regularization is introduced to penalize the complexity of the model, and may provide a penalty to the norm (vector length) of the parameter.” The additive inverse (subtraction) is inherent to regularization/penalization.)
Yasunaga is in the same field of endeavor as the claimed invention, namely, neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Yasunaga’s system into the combination 

Regarding claim 11, the combination of Cho, Hoof, Uusitalo, and Yasunaga teaches: The computer-readable storage medium of claim 8, 
Further, Hoof teaches: wherein comparing the decoded sequence data with the input sequence data comprises: computing a distance between one of the decoded time sequences and the one of the input time sequences that the one decoded time sequence is to match. (Examiner is interpreting “a distance” between a decoded time sequence and an input time sequence as an error which can be computed by an error loss function. Hoof teaches this limitation on page 3929, col. 3, second paragraph, as discussed previously in the comparing limitation of claim 1.) 
	Therefore, it would have been obvious to one skilled in the art to have incorporated the teachings of Hoof into the combination of Cho, Hoof, Uusitalo, and Yasunaga’s system by decoding data using an error loss function with a motivation to map the latent representation back through the decoder to reconstruct the input (Hoof page 3929, col. 3, second paragraph).

Regarding claim 13, the combination of Cho, Hoof, Uusitalo, and Yasunaga teaches: The computer-readable storage medium of claim 12, 
Further, Hoof teaches: wherein the computer program instructions further comprise instructions for: storing a tuple of the input sequence data, the output symbol data and the expected end reward. (Hoof teaches on p. 3930, col. 2, section B, that the input data, output data, and expected average reward are accumulated each iteration in order to build the reinforcement learning model)
Therefore, it would have been obvious to one skilled in the art to have incorporated the teachings of Hoof into the combination of Cho, Hoof, Uusitalo, and Yasunaga’s system by passing the input data, output data, and reward to the next iteration, with a motivation to build a reinforcement learning model (p. 3930, col. 2, under section B. – Notation)

Regarding claim 14, the combination of Cho, Hoof, Uusitalo, and Yasunaga teaches: The computer-readable storage medium of claim 13, 
Further, Hoof teaches: wherein the steps are implemented recursively until the tuple converges to a stable status. (Hoof teaches on p. 3931, col. 1, second paragraph from the bottom: “The variance of policies typically shrinks after each iteration, such that the policy converges to a (locally) optimal policy.”)
Therefore, it would have been obvious to one skilled in the art to have incorporated the teachings of Hoof into the combination of Cho, Hoof, Uusitalo, and Yasunaga’s system by converging the parameters, with a motivation to build a reinforcement learning model (p. 3930, col. 2, under section B. – Notation). Although Cho does not explicitly teach convergence, it is implied through the “target sequence” language (Cho teaches on page 1, col. 2, first paragraph: “the decoder maps the vector representation back to a variable-length target sequence.”)

Regarding claim 15, Cho teaches: A computer system for training a sequence learning model based on reinforcement learning and neural networks, the system comprising: a processor; and memory storing an encoder for: (Cho discloses an experimental section on p. 5, col. 1, which indicates the presence of a computer system comprising a processor and memory)

retrieving input sequence data, the input sequence data including one or more input time sequences; (Examiner is interpreting “input sequence data” as any type of sequential data input into a neural network. The “input time sequences” may also be an input time series. Cho teaches on page 2, first paragraph (all of Cho’s page numbers refer to the PDF page number):

[Image: excerpt from Cho, p. 2, first paragraph (media_image2.png)]

The input time sequence is the sequence $\mathbf{x} = (x_1, \ldots, x_T)$.)
encoding the input sequence data into an output symbol sequence containing output symbol data using a first neural network trained to implement a sequence learning model, the output symbol data including one or more symbolic representations; (Examiner is interpreting a “sequence learning model” as a machine learning model. Cho teaches on page 2, column 2: “The encoder is an RNN that reads each symbol of an input sequence x sequentially. As it reads each symbol, the hidden state of the RNN changes according to Cho Eq. (1)”. The output symbol sequence is interpreted as c.)
a decoder for: decoding, using a second neural network, the output symbol data to decoded sequence data, the decoded sequence data including one or more decoded time sequences that are to match the one or more input time sequences in the input sequence data; (Cho teaches on page 2 column 1, under section 2.2 RNN Encoder-Decoder: “we propose a novel neural network architecture that learns to encode a variable-length sequence into a fixed-length vector representation and to decode a given fixed-length vector representation back into a variable-length sequence.” The decoded time sequence matches the input time sequence. Further, Cho teaches on page 2, column 2, second paragraph: “The decoder of the proposed model is another RNN which is trained to generate the output sequence by predicting the next symbol y_t given the hidden state h_< t >”. According to Figure 1, the encoder encodes the input states x_1 through x_T, and the decoder outputs predicted states y_T’ through y_1.

[Image: Cho, Figure 1]

And the encoder is also for: training the first neural network to update the sequence learning model based on [a comparison], (Examiner is interpreting updating the model as changing any of the model’s parameters. Cho teaches that the model parameter θ is updated during training; see Cho, page 2, col. 2 to page 3, col. 1, copied below.)

[Images: excerpts from Cho, p. 2, col. 2 to p. 3, col. 1]

wherein training further comprises: determining a length of the output symbol data (subscript “n” in y_n is a length of the output symbol data)

However, Cho does not explicitly teach: comparing the decoded sequence data with the input sequence data; and 
[training is based on] the comparison 
wherein training further comprises: estimating an expected end reward, wherein the expected end reward is based on: a distance between the decoded time sequence and the input time sequence; and 
wherein the expected end reward is based on: … an additive inverse of a length of the output symbol sequence; and 
adjusting one or more parameters of the sequence learning model to maximize the expected end reward.
But Hoof teaches: comparing the decoded sequence data with the input sequence data; (Examiner is interpreting “comparing” as finding a reconstruction error computed by an error loss function. Hoof teaches this limitation on page 3929, col. 2, second paragraph, copied below. The input robot state x is analogous to the input time sequence of the instant claim, and the reconstructed input robot state x’ is analogous to the decoded time sequence of the instant claim.)

[Image: excerpt from Hoof, p. 3929, col. 2]


Hoof also teaches: the comparison (As stated by Hoof on page 3929, col. 2, second paragraph and in the excerpt copied above, “The parameters are updated by gradient descent on the reconstruction error”. The reconstruction error can be used to update the model parameters.)
Further, Hoof teaches: wherein training further comprises: estimating an expected end reward, wherein the expected end reward is based on: a distance between the decoded time sequence and the input time sequence; and (Examiner is interpreting “estimating” as computing a probabilistic calculation. Examiner is interpreting “end reward” as any reward in a reinforcement learning system, such as an average reward, because the BRI of “end reward” is broad enough to allow for this interpretation. In Hoof, the reward function must be based on the reconstruction error because z is based on the backpropagation of the reconstruction error. See Hoof Figure 2, copied below, which shows that the error from the encoder propagates to z.)

[Image: Hoof, Figure 2]

wherein training further comprises: adjusting one or more parameters of the sequence learning model to maximize the expected end reward. (As stated by Hoof on page 3929, col. 2, second paragraph, “The parameters are updated by gradient descent on the reconstruction error”. Also, Hoof teaches maximizing the end reward on p. 3930, col. 2, under section B, copied below.)

[Images: excerpts from Hoof, p. 3930, col. 2, section B]

[Image: excerpt from Hoof, p. 3931, col. 1]

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Hoof’s system into Cho’s system by finding the error and by modifying the updating of Cho with the specific comparison as taught by Hoof, with a motivation of obtaining stable policy updates (Hoof discloses on page 3929, top of col. 2: “Our approach consists of two steps: autoencoders are used to learn a suitable representation and non-parametric relative entropy policy search is used to obtain stable policy updates.”)

However, the combination of Cho and Hoof does not explicitly teach: wherein the expected end reward is based on: … an additive inverse of a length of the output symbol sequence; and 
	But Uusitalo teaches: wherein the expected end reward is based on: …a length… (Uusitalo para. [0060] teaches penalizing/minimizing a length in a reward; however, these are physical lengths, not lengths of a vector. Uusitalo para. [0060] states: “The exemplary embodiments of this invention also enable pathfinding while optimizing multiple objectives. The goals for a pathfinding task can be set mathematically. For example, one goal can be to minimize a length of time taken by a user of the device 10 to travel from point A to point B, and to minimize the interference experienced by the user device 10 at each step of the travel. These goals can be formed into rewards at each time step. For example, interference level is a penalty (negative reward), and distance from point B is also a penalty. It should be noted that the total reward or penalty for any action involves the ‘goodness’ of the action towards multiple goals (reaching the target, interference avoidance, and potential other goals of the user)”.)
	Uusitalo is in the same field of endeavor as the claimed invention, namely, devices for penalizing/minimizing parameters. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Uusitalo’s system into the combination of Cho and Hoof’s system by penalizing a length, with a motivation to optimize an objective (Uusitalo para. [0060]).
Lastly, Yasunaga teaches: wherein the expected end reward is based on: … an additive inverse… of the output symbol sequence. (Yasunaga Para. [0174]: “Regularization is introduced to penalize the complexity of the model, and may provide a penalty to the norm (vector length) of the parameter.” The additive inverse (subtraction) is inherent to regularization/penalization.)
Yasunaga is in the same field of endeavor as the claimed invention, namely, neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have incorporated the teachings of Yasunaga’s system into the combination 


Regarding claim 20, the combination of Cho, Hoof, Uusitalo, and Yasunaga teaches: The system of claim 19, 
Further, Hoof teaches: wherein the encoder further for: storing a tuple of the input sequence data, the output symbol data and the expected end reward. (Hoof teaches on p. 3930, col. 2, section B, that the input data, output data, and expected average reward are accumulated each iteration in order to build the reinforcement learning model)
Therefore, it would have been obvious to one of ordinary skill in the art to have incorporated the teachings of Hoof into the combination of Cho, Hoof, Uusitalo, and Yasunaga’s system by passing the input data, output data, and reward to the next iteration, with a motivation to build a reinforcement learning model (Hoof p. 3930, col. 2, under section B – Notation).

Claims 2, 9, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Cho, Hoof, Uusitalo, and Yasunaga as applied to claims 1, 8, and 15 above, respectively, and further in view of Zeiler (“ADADELTA: An Adaptive Learning Rate Method”, 2012), hereinafter Zeiler.
Regarding claim 2, the combination of Cho, Hoof, Uusitalo, and Yasunaga teaches: The method of claim 1, 
However, the combination of Cho, Hoof, Uusitalo, and Yasunaga does not explicitly teach: wherein the input time sequence includes financial data, a temperature recording, a sound wave, a handwritten note of pen trajectories, or a video of human actions. 
	But Zeiler teaches: wherein the input time sequence includes financial data, a temperature recording, a sound wave, a handwritten note of pen trajectories, or a video of human actions. (Zeiler teaches on PDF page 5, column 2, section 4.4 Speech Data, second paragraph: “The neural network is setup… where the inputs are 26 frames of audio”. Note that Cho’s RNN Encoder-Decoder is trained on corpora of printed text using Zeiler’s ADADELTA and gradient descent algorithm with the same hyperparameters as Zeiler’s model (Cho p. 6, col. 1, second full paragraph). The model in Cho translates text from English to French (Cho p. 1, col. 2, para. 2). The model in Zeiler outputs senone labels from speech audio (Speech Data, second paragraph).)
	Therefore, it would have been obvious to one of ordinary skill in the art to have incorporated the teachings of Zeiler’s system into the combination of Cho, Hoof, Uusitalo, and Yasunaga’s system by training Cho’s RNN Encoder-Decoder with audio samples, with a motivation to analyze speech audio.

Regarding claim 9, the combination of Cho, Hoof, Uusitalo, and Yasunaga teaches: The computer-readable storage medium of claim 8,
However, the combination of Cho, Hoof, Uusitalo, and Yasunaga does not explicitly teach: wherein the input time sequence includes financial data, a temperature recording, a sound wave, a handwritten note of pen trajectories, or a video of human actions. 
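	But Zeiler teaches: wherein the input time sequence includes financial data, a temperature recording, a sound wave, a handwritten note of pen trajectories, or a video of human actions. (Zeiler teaches on page 5, column 2, section 4.4 Speech Data, second paragraph: “The neural network is setup… where the inputs are 26 frames of audio”. Note that Cho’s RNN Encoder-Decoder is trained on corpora of printed text using Zeiler’s ADADELTA and gradient descent algorithm with the same hyperparameters as Zeiler’s model (Cho p. 6, col. 1, second full paragraph). The model in Cho translates text from English to French (Cho p. 1, col. 2, para. 2). The model in Zeiler outputs senone labels from speech audio (Speech Data, second paragraph).)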
	Therefore, it would have been obvious to one of ordinary skill in the art to have incorporated the teachings of Zeiler’s system into the combination of Cho, Hoof, Uusitalo, and Yasunaga’s system by training Cho’s RNN Encoder-Decoder with audio samples, with a motivation to analyze speech audio.

Regarding claim 16, the combination of Cho, Hoof, Uusitalo, and Yasunaga teaches: The system of claim 15, 
However, the combination of Cho, Hoof, Uusitalo, and Yasunaga does not explicitly teach: wherein the input time sequence includes financial data, a temperature recording, a sound wave, a handwritten note of pen trajectories, or a video of human actions. 
	But Zeiler teaches: wherein the input time sequence includes financial data, a temperature recording, a sound wave, a handwritten note of pen trajectories, or a video of human actions. (Zeiler teaches on page 5, column 2, section 4.4 Speech Data, second paragraph: “The neural network is setup… where the inputs are 26 frames of audio”. Note that Cho’s RNN Encoder-Decoder is trained on corpora of printed text using Zeiler’s ADADELTA and gradient descent algorithm with the same hyperparameters as Zeiler’s model (Cho p. 6, col. 1, second full paragraph). The model in Cho translates text from English to French (Cho p. 1, col. 2, para. 2). The model in Zeiler outputs senone labels from speech audio (Speech Data, second paragraph).)
	Therefore, it would have been obvious to one of ordinary skill in the art to have incorporated the teachings of Zeiler’s system into the combination of Cho, Hoof, Uusitalo, and Yasunaga’s system by training Cho’s RNN Encoder-Decoder with audio samples, with a motivation to analyze speech audio.

Claims 3, 10, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Cho, Hoof, Uusitalo, and Yasunaga as applied to claims 1, 8, and 15 above, respectively, and further in view of Forcada et al. (“Recursive Hetero-Associative Memories for Translation”, 1997), hereinafter Forcada.
Regarding claim 3, the combination of Cho, Hoof, Uusitalo, and Yasunaga teaches: The method of claim 1,
However, the combination of Cho, Hoof, Uusitalo, and Yasunaga does not explicitly teach: further comprising: determining whether to encode the input sequence data into a symbolic representation or into an empty representation.
But Forcada teaches: further comprising: determining whether to encode the input sequence data into a symbolic representation or into an empty representation. (Examiner is interpreting determining the left- and right-hand units in the last paragraph of p. 455 as the “determining” recited in the claim. Forcada teaches on p. 455:

[Images: excerpts from Forcada, p. 455]
Forcada also discloses an autoencoder in Fig. 1 on p. 457.)
Therefore, it would have been obvious to one of ordinary skill in the art to have incorporated the teachings of Forcada’s system into the combination of Cho, Hoof, Uusitalo, and Yasunaga’s system by encoding input data into an empty representation with a motivation to represent null symbols (Forcada p. 455: the empty string ϵ is also known as “nil”).

Claims 10 and 17 are computer product and system claims corresponding to method claim 3. Claims 10 and 17 are rejected for the same reasons as method claim 3. 
Allowable Subject Matter
Claims 5, 12, and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Response to Arguments
Applicant's arguments filed 03/16/2021 in response to the previous Office Action mailed 12/16/2020 have been fully considered but they are not fully persuasive.
Drawing Objections: The previous Office Action objected to Fig. 4 element 420, Fig. 5 element 504, and the numeral 520 in paragraph [0054] of the specification. Applicant’s arguments regarding the drawing objections have been fully considered and are persuasive. The drawing objections have been withdrawn due to the Remarks, the substitute drawings and the substitute specification filed 03/16/2021.

Specification Objections: The previous Office Action objected to the spacing of the Algorithm and a minor informality in the specification. Applicant’s arguments regarding the specification objections have been fully considered and are persuasive. The specification objections have been withdrawn due to the Remarks and the substitute specification filed 03/16/2021.

Claim Objections: The previous Office Action objected to minor informalities in the claims. Applicant’s arguments regarding the claim objections have been fully considered and are persuasive. The claim objections have been withdrawn due to the Remarks and the amended claims filed 03/16/2021.

35 U.S.C. 112(b) Claim Rejections: The previous Office Action rejected claims 5-7, 12-14, and 19-20 under 35 U.S.C. 112(b). Applicant’s arguments regarding the claim rejections have been fully considered and are persuasive. The 35 U.S.C. 112(b) claim rejections have been withdrawn in light of Applicant’s Remarks and amended claims filed 03/16/2021. 

35 U.S.C. 101 Claim Rejections: The previous Office Action rejected claims 1-20 under 35 U.S.C. 101. Applicant’s arguments regarding the claim rejections have been fully considered and are persuasive. The 35 U.S.C. 101 claim rejections have been withdrawn in light of Applicant’s Remarks and amended claims.

35 U.S.C. 103 Claim Rejections: Applicant’s arguments with respect to claim(s) 1-4, 6-11, 13-18 and 20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Asher H. Jablon whose telephone number is (571)270-7648.  The examiner can normally be reached on Monday - Friday, 9:00 am - 6:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only.






/ASHER H. JABLON/Examiner, Art Unit 2122                                                                                                                                                                                                        
/ERIC NILSSON/Primary Examiner, Art Unit 2122