Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Detailed Action
This office action is responsive to the Amendment filed on 25 March 2021.  As directed by the Amendment, claims 1 and 26 have been amended, claims 24-25 and 27 have been canceled, and claims 28-30 have been added.  Claims 1-6, 8-14, 21-23, 26, and 28-30 are pending in the application.


Response to Arguments
The arguments presented in the Remarks filed on 25 March 2021 have been fully considered by the Examiner.
On page 13 of the Remarks, the Applicant states:

    [Image: excerpt of the Applicant's argument from page 13 of the Remarks (media_image1.png)]

	The Examiner respectfully disagrees and notes that in the previous rejection, the Bergstra reference was not relied upon to teach the limitations of claim 12.  Rather, the rejection relied on the Triefenbach reference, which the Examiner contends fairly reads on the limitations wherein the first model includes a plurality of input nodes that sequentially input a plurality of input values at each time point of the first input data sequence, (Triefenbach, pg. 2441, Fig. 1, a reservoir network has a plurality of input nodes to receive elements of the input sequence U; pg. 2443, Col. 1, lines 19-26, "it is easy to control the amount of context C that can be modeled by the reservoir…", "one divides the input stream into blocks of length B and lets the forward reservoir process all the frames of that block in a chronological order" [Note: the input is broken up into frames, and these frames are applied to the plurality of reservoir inputs U in order to capture time-based context surrounding the frame representing a particular time]) and a weight parameter between each input node and each input value at a time point before a time point corresponding to the plurality of input nodes. (Ibid., [the weight matrix Win in Fig. 1 contains the weights between the input U and the reservoir, and each input node contains a value corresponding to a sample from a particular time or from a previous time when processing in the forward direction])


	On page 14 of the Remarks, the Applicant further argues:

    [Image: excerpt of the Applicant's argument from page 14 of the Remarks (media_image2.png)]


	The Examiner respectfully disagrees, and maintains that the Wakuya reference teaches the recited claim limitations.  The additional citations from Wakuya regarding the prediction width ahead (i.e., the distance into the future that the model(s) must predict) merely provide further context and detail regarding when and why a particular weight parameter in the system of Wakuya might be more difficult to learn in a first model as opposed to a second model, or more accurately learned in a second model as opposed to a first model, as required by the claims.

	The remaining arguments in the Remarks are based directly or indirectly upon newly amended claim limitations or newly added claims, and are moot in view of the new grounds of rejection presented below.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-6, 8-10, and 12-13 are rejected under 35 U.S.C. §103 as being unpatentable over Triefenbach et al., "Acoustic Modeling With Hierarchical Reservoirs," IEEE Transactions on Audio, Speech, and Language Processing, Vol. 21, No. 11, November 2013, hereinafter “Triefenbach” (previously cited) in view of Kim et al. (US 2008/0208547), hereinafter “Kim” (previously cited), Ikada (US 2011/0066579) (previously cited) and further in view of Xiao Hu et al., “Time series prediction with a weighted bidirectional multi-stream extended Kalman filter,” Neurocomputing 70 (2007), pp. 2392-2399, hereinafter “Hu”.
Regarding claim 1, Triefenbach discloses [a] computer-implemented method for learning a first model, comprising: generating, by a processor, a second model based on the first model, (Triefenbach pg. 2443, second paragraph, "The second approach consists of introducing bi-directional processing.  This approach requires two reservoirs: [correspond to models] one that processes the inputs in a chronological order (from left-to-right) and another that processes them in reverse order (from right-to-left)") the first model being configured to perform a learning process based on sequentially inputting […] each of a plurality of pieces of input data that include a plurality of input values and that are from a first input data sequence […], (Triefenbach pg. 2441, Fig. 1 Reservoir network [corresponds to model performing a learning process], showing input sequence U [corresponds to first input data sequence]) the second model being configured to learn a first learning target parameter included in the first model based on inputting, in an order differing from an order in the first model […], each of a plurality of pieces of input data that include a plurality of input values and are from a second input data sequence; (Triefenbach pg. 2443, second paragraph, "The second approach consists of introducing bi-directional processing.  This approach requires two reservoirs: [correspond to models] one that processes the inputs in a chronological order (from left-to-right) and another that processes them in reverse order (from right-to-left)" [corresponds to “second input data sequence”]; Triefenbach pg. 2441, Fig. 1, a reservoir network has trainable readout weights Wout [corresponds to a learning target parameter]; Triefenbach pg. 2441, Col. 2, ¶ 4, "The aim is to tune the weight matrix Wout so that the readouts adhere to posterior class probabilities"; Triefenbach pg. 2443, Col. 1, ¶ 2, "The output neurons then read out the combined state of the two reservoirs."; Triefenbach pg. 2442, Col. 2, equations (9) and (10) and § IIID. "Training the Readout Weights," "...the optimal regression coefficients that minimize the root mean squared error between the desired and the computed outputs emerge from a set of linear equations." [Note: one model receives an input sequence in chronological order while the other model receives an input sequence in reverse chronological order, then the outputs from the two models are combined and used with the error minimization technique to learn the trainable Wout weights for both models.  Therefore, the output from each model is used to train the Wout weights for the other model]) performing, by the processor, a learning process using both the first model and the second model; (Triefenbach pg. 2441, Col. 2, ¶ 4, "The aim is to tune the weight matrix Wout [corresponds to "trainable readout weights" for Fig. 1]..."; Triefenbach pg. 2442, Col. 2, equations (9) and (10) and § IIID. "Training the Readout Weights," "...the optimal regression coefficients that minimize the root mean squared error between the desired and the computed outputs emerge from a set of linear equations." [Note: the outputs from the two models are combined and used with the error minimization technique to learn the trainable Wout weights for both models.  Therefore, the output from each model is used to train the Wout readout weights for the other model in addition to its own Wout readout weights])
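For illustration only, the bidirectional arrangement relied upon above can be sketched as follows. This is a minimal sketch and not the implementation of Triefenbach: each "reservoir" is reduced to a single tanh neuron, the function names and weight values are hypothetical, and the readout is fit by a generic regularized least-squares solve standing in for the linear equations referenced in the citation.

```python
import math


def run_reservoir(inputs, w_in, w_rec):
    """Run a one-neuron stand-in for a reservoir; return its state per step."""
    state = 0.0
    states = []
    for u in inputs:
        state = math.tanh(w_in * u + w_rec * state)
        states.append(state)
    return states


def bidirectional_states(inputs, w_in=0.8, w_rec=0.5):
    """Pair each time step's forward state with its backward state."""
    forward = run_reservoir(inputs, w_in, w_rec)
    # The backward pass consumes the sequence in reverse order; its states
    # are flipped back so index t refers to the same time step in both lists.
    backward = run_reservoir(inputs[::-1], w_in, w_rec)[::-1]
    return list(zip(forward, backward))


def train_readout(states, targets, ridge=1e-3):
    """Solve the 2x2 regularized normal equations for the readout weights."""
    a = sum(f * f for f, _ in states) + ridge
    b = sum(f * g for f, g in states)
    d = sum(g * g for _, g in states) + ridge
    p = sum(f * y for (f, _), y in zip(states, targets))
    q = sum(g * y for (_, g), y in zip(states, targets))
    det = a * d - b * b  # strictly positive whenever ridge > 0
    return ((d * p - b * q) / det, (a * q - b * p) / det)


inputs = [0.1, 0.5, -0.3, 0.9]
targets = [0.0, 1.0, 0.0, 1.0]
w_forward, w_backward = train_readout(bidirectional_states(inputs), targets)
```

In this sketch a single solve sets both readout weights from the combined forward and backward states, which is the sense in which the output of each pass contributes to the weights applied to the other.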

Triefenbach does not disclose starting at an arbitrary point in a first input data sequence.
Hu teaches starting at an arbitrary point in a first input data sequence (Hu, pg. 2393, Col. 2, second full paragraph “When a time series exhibits non-trivial behavior (as is the case for this CATS benchmark), it appears advantageous to employ the multi-stream procedure. Multi-stream training is based on the principle that each weight update should attempt to satisfy simultaneously the demands from multiple input-output pairs. In each cycle of training, a specified number Ns of starting points can be randomly selected [corresponds to claimed “starting at an arbitrary point in a first input data sequence”] in a chosen set of files (in just one file of the CATS benchmark). Each such starting point is the beginning of a training trajectory (stream).)
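For illustration only, the random selection of starting points described in the cited passage can be sketched as below. This is a sketch under stated assumptions and not Hu's implementation: the function name, the parameters, and the fixed stream length are hypothetical, and the extended Kalman filter weight update itself is omitted.

```python
import random


def sample_streams(series, num_streams, stream_length, seed=None):
    """Pick random starting points and return (start, trajectory) pairs."""
    max_start = len(series) - stream_length
    if max_start < 0:
        raise ValueError("stream_length exceeds the length of the series")
    rng = random.Random(seed)
    # Each start index marks the beginning of one training trajectory.
    starts = [rng.randint(0, max_start) for _ in range(num_streams)]
    return [(s, series[s:s + stream_length]) for s in starts]


series = list(range(20))
streams = sample_streams(series, num_streams=4, stream_length=5, seed=0)
```

Each returned trajectory begins at an arbitrary index of the series rather than at its first element.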
Hu is analogous art, as it is directed to training a machine learning model on time-series data.
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to modify the training of Triefenbach with the random starting points of Hu, the benefit being that beginning training at random points in the 

Triefenbach further does not disclose into a FIFO memory or into another FIFO memory.
Ikada teaches into a FIFO memory and into another FIFO memory (Ikada, Fig. 2 and ¶ [0040] “the delay processing units 126 have a first-in-first-out (FIFO) structure in which new input data displace the oldest stored data […] Each delay processing unit 126 therefore stores a fixed quantity of data, representing a certain time interval extending back from the present.” [A FIFO unit is operable to store data from newer to older or from older to newer, depending only on the time order in which the data is input into the FIFO storage unit.])
Ikada is analogous art, as it is in the field of using machine learning to predict time-series data.
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to modify the predictive models of Triefenbach with the FIFO delay units of Ikada, the benefit being that the FIFO delay processing units allow the system to store multiple data received over a period of time for presentation to a neural network, as cited by Ikada at ¶ [0039], “The delay processing units 126 store data temporarily.  At any given time t, the delay processing units 126 store the data needed by the neurons to calculate a predicted value for the time series at time t+1.”
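For illustration only, the fixed-capacity FIFO behavior described in the cited paragraphs can be sketched as below. This is a sketch and not Ikada's delay processing unit 126: the class name and methods are hypothetical, and the buffer is modeled with the standard-library `collections.deque`.

```python
from collections import deque


class DelayUnit:
    """Fixed-capacity FIFO: a new value displaces the oldest stored value."""

    def __init__(self, capacity):
        self._buffer = deque(maxlen=capacity)

    def push(self, value):
        # When the deque is full, appending silently drops the oldest entry.
        self._buffer.append(value)

    def oldest_to_newest(self):
        return list(self._buffer)

    def newest_to_oldest(self):
        return list(reversed(self._buffer))


unit = DelayUnit(capacity=3)
for sample in [1, 2, 3, 4, 5]:
    unit.push(sample)
# unit.oldest_to_newest() is [3, 4, 5]; unit.newest_to_oldest() is [5, 4, 3]
```

The stored window can be read in either time order, and which samples it holds depends only on the order in which they were pushed.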


Triefenbach further does not disclose and storing, in a memory device, the first model that has been learned.
Kim teaches and storing, in a memory device, the first model that has been learned. (Kim, ¶[0081] the system may include a memory 123; ¶ [0084] "The memory 123 may further store the 3D model generated by the 3D model generation apparatus")
Kim is analogous art, as it addresses the task of storing and retrieving generated models.
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to store the models of Triefenbach using the memory storage of Kim, the benefit being that storing the models in memory allows them to be retrieved later when needed or appropriate, as recited in Kim, Fig. 13, steps 133-135 and ¶ [0087], “The rendering system 125 may further extract such a 3D model corresponding to the current position information of the navigation system, received by the GPS 124, from among models stored in the memory 123…”



Regarding claim 2, the combination of references as applied to claim 1 above teaches [t]he computer-implemented method of claim 1.  Further, Kim teaches wherein the storing the first model that has been learned includes deleting, from the memory device, the second model that has been learned (Kim, ¶ [0085] the apparatus may "delete 3D models stored in the memory 123, thereby operating only with a small capacity memory") and outputting the first model that has been learned as a predictive model based on an input data sequence. (Kim, ¶ [0087] the system may extract a model corresponding to the current position information of the navigation system received by the GPS [corresponds to "based on an input data sequence"] from among 3D models stored in the memory 123; Kim, ¶ [0088] the display system 123 may then display such a rendering [of the model] performed by the rendering system 125.)
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to apply the deleting and displaying of models as taught by Kim to the models generated by Triefenbach, the benefit being that by deleting models from memory, a system may operate while only having a small memory capacity, as recited by Kim at ¶ [0085], the apparatus may "delete 3D models stored in the memory 123, thereby operating only with a small capacity memory" and the further benefit being that a model may be visually presented to a user, as recited by Kim at ¶ [0085] “The display system may then display such a result of the rendering performed by the rendering system.”

Regarding claim 3, the combination of references as applied to claim 2 above teaches [t]he computer-implemented method of claim 2.  Further, Triefenbach discloses wherein the generating the second model includes generating the second model for learning the learning target parameter by inputting, in a backwards order, each of the plurality of pieces of input data from the second input data sequence. (Triefenbach, pg. 2443, § III(B)(2) "Bi-Directional Processing," ¶ 

Regarding claim 4, the combination of references as applied to claim 3 above teaches [t]he computer-implemented method of claim 3.  Further, Triefenbach discloses wherein the first input data sequence and the second input data sequence are time-series input data sequences, (Triefenbach, § 1 "Introduction," ¶ 2, "An important component of any speech recognizer is the Acoustic Model (AM).  It is responsible for capturing the relation between the acoustic signal and the phonetic units that were spoken."; § 2 "Acoustic Modeling," ¶ 1, "It is generally acknowledged that the spoken message is encoded in the speech signal as a sequence of phones." [speech signal corresponds to "time series input data sequences"]) wherein the first model inputs the first input data sequence in order from older to newer ones of the plurality of pieces of input data, (Triefenbach, pg. 2443, § III(B)(2) "Bi-Directional Processing," ¶ 1, "This approach requires two reservoirs: one [corresponds to the first model] that processes the inputs in a chronological order (from left-to-right) and another [that corresponds to the second model] that processes them in reverse order (from right-to-left).") and wherein the second model inputs the second input data sequence in order from newer to older ones of the plurality of pieces of input data. (Ibid.)

Regarding claim 5, the combination of references as applied to claim 3 above teaches [t]he computer-implemented method of claim 3.  Further, Triefenbach discloses wherein the first model and the second model each include the first learning target parameter and a second learning target parameter, (Triefenbach, pg. 2441, Fig. 1, each reservoir network has a plurality of trainable weighted output connections between the reservoir and the readout) and wherein the performing the learning process includes: learning the second learning target parameter by using the first model without changing the first learning target parameter, (Triefenbach, pg. 2443, Col. 1, lines 15-16, "The output neurons then read out the combined state of the two reservoirs."; pg. 2441, Col. 2, ¶ 4, "The aim is to tune the weight matrix Wout [corresponds to "trainable readout weights" for Fig. 1]..."; Triefenbach pg. 2442, Col. 2, eqs (9) and (10) and § IIID. "Training the Readout Weights," "...the optimal regression coefficients that minimize the root mean squared error between the desired and the computed outputs emerge from a set of linear equations." [Note: The error-minimization routine that updates the trainable readout weights for both models based on the models' combined outputs does not necessitate that every weight is changed with each error-minimization iteration; the training method is operable to change some output weights while leaving others unchanged during a single training iteration]) and learning the first learning target parameter by using the second model without changing the second learning target parameter. (Ibid.)

Regarding claim 6, the combination of references as applied to claim 5 above teaches [t]he computer-implemented method of claim 5.  Further, Triefenbach discloses wherein the first learning target parameter is operable to be learned with higher accuracy by learning using the second model than by learning using the first model, (Triefenbach, pg. 2443, Col. 1, last three lines, "Obviously, one can further reduce the latency by employing a backward reservoir with a lower time constant than the forward reservoir."; pg. 2443, Col. 2, ¶ 3, "…the first layer must model the fast dynamics occurring in the acoustic feature stream with important frequencies of up to 50 Hz and those should have a time constant fitting to these dynamics. Therefore, we choose short acoustic units, namely phone states, as the units to distinguish.  The second layer analyzes the sub-phonetic state scores which seem to be much smoother, with main frequencies between 15 and 30 Hz." [Note: because the time constant for each reservoir can be tailored to capture high-frequency, rapidly changing features of the monitored feature stream or smoother, lower-frequency features, the contributions to the output readouts from the second model may be more accurate than the outputs from the first model (or vice-versa), depending on the features of interest in the feature stream.]) and wherein the second learning target parameter is operable to be learned with higher accuracy by learning using the first model than by learning using the second model. (Ibid.)

Regarding claim 8, the combination of references as applied to claim 3 above teaches [t]he computer-implemented method of claim 3.  Further, Triefenbach discloses wherein the first input data sequence and the second input data sequence are input data sequences for learning that are different from each other (Triefenbach, § III(D) "Training the Readout Weights," ¶ 1 and equations (9) and (10)) and included in a plurality of input data sequences for learning. (Ibid., the training inputs from the matrix R)

Regarding claim 9, the combination of references as applied to claim 3 above teaches [t]he computer-implemented method of claim 3.  Further, Triefenbach discloses wherein the performing the learning process includes performing the learning process with the first model a greater number of times than the learning process with the second model. (Triefenbach, pg. 2443, last three lines, the forward and backward reservoirs may have different time constants [and therefore, different sampling rates/frequencies]; [Note: because of the different sampling rates between the two models, the readouts from the model with the higher sampling rate [e.g., the "first model"] will be updated more often than the readouts from the model with the lower sampling rate.  As a result, the training process described in § III(D) "Training the Readout Weights" will be performed more often using updated readout weights from the first model than from the second model])

Regarding claim 10, the combination of references as applied to claim 3 above teaches [t]he computer-implemented method of claim 3.  Further, Triefenbach discloses wherein the performing the learning process includes performing the learning process with the first model using a higher learning rate than is used for the learning process with the second model. (Triefenbach, pg. 2443, Col. 1, lines 17-18, "The reservoirs can be chosen identical for convenience but this is not a necessity"; pg. 2443, last three lines, the reservoirs may have different time constants [determining the rates at which the input feature stream is sampled]; [also, equation (9) on pg. 2442 contains the regularization term (lowercase epsilon) which varies the contribution of the current readout weights when calculating updated readout weights, thus varying how rapidly the error-minimization learning process can change the trainable readout weights (similar to the learning rate used in gradient descent back-propagation)])

Regarding claim 12, the combination of references as applied to claim 4 above teaches [t]he computer-implemented method of claim 4.  Further, Triefenbach discloses wherein the first model includes a plurality of input nodes that sequentially input a plurality of input values at each time point of the first input data sequence, (Triefenbach, pg. 2441, Fig. 1, a reservoir network has a plurality of input nodes to receive elements of the input sequence U; pg. 2443, Col. 1, lines 19-26, "it is easy to control the amount of context C that can be modeled by the reservoir…", "one divides the input stream into blocks of length B and lets the forward reservoir process all the frames of that block in a chronological order" [Note: the input is broken up into frames, and these frames are applied to the plurality of reservoir inputs U in order to capture time-based context surrounding the frame representing a particular time]) and a weight parameter between each input node and each input value at a time point before a time point corresponding to the plurality of input nodes, (Ibid., [the weight matrix Win in Fig. 1 contains the weights between the input U and the reservoir, and each input node contains a value corresponding to a sample from a particular time or from a previous time when processing in the forward direction]) and the second model includes a plurality of input nodes that input, in a backwards order, a plurality of input values at each time point of the second input data sequence, (Triefenbach, pg. 2443, Col. 1, lines 12-15 the other reservoir [corresponds to the second model] processes the inputs in reverse order (right-to-left); pg. 2441, Fig. 1, there are a plurality of input nodes between the input data sequence and the reservoir; pg. 2443, ¶ 2, [similar to the forward processing, the backwards processing takes inputs in reverse chronological order, with the input frames containing values corresponding to a particular time and the time periods after that time]) and a weight parameter between each input node and each input value at a time point after the time point corresponding to the plurality of input nodes. (Triefenbach, pg. 2441, Fig. 1, [the weight matrix Win in Fig. 1 contains the weights between the input sequence U and the reservoir, and each input node contains a value corresponding to a sample from a particular time or from a later time when processing in the backward direction])

Regarding claim 13, the combination of references as applied to claim 12 above teaches [t]he computer-implemented method of claim 12.  Further, Triefenbach discloses wherein the first model further includes a weight parameter between each input node and each of a plurality of hidden nodes corresponding to the time point before the time point corresponding to the plurality of input nodes, (Triefenbach, pg. 2441, Fig. 1, reservoir R and weighted connections Wrec [after the weighted inputs are presented to the reservoir R, there are weighted connections Wrec connecting the weighted input values to a plurality of hidden nodes within the reservoir]) and a weight parameter between each hidden node and each input value corresponding to the time point before the time point corresponding to the plurality of input nodes, (Ibid., there are weights Wrec between the weighted inputs to the reservoir R and the hidden nodes within the reservoir; pg. 2441, Fig. 1, [a reservoir network has a plurality of input nodes to receive elements of the input sequence]; pg. 2443, Col. 1, lines 19-26, "it is easy to control the amount of context C that can be modeled by the reservoir…", "one divides the input stream into blocks of length B and lets the forward reservoir process all the frames of that block in a chronological order") and the second model further includes a weight parameter between each input node and each of a plurality of hidden nodes corresponding to the time point after the time point corresponding to the plurality of input nodes, (Ibid., [the backward processing occurs in the same manner as the forward processing, processing the input in reverse chronological order]) and a weight parameter between each hidden node and each input value corresponding to the time point after the time point corresponding to the plurality of input nodes. (Triefenbach, pg. 2441, Fig. 1, [after the weighted inputs are presented to the reservoir R, there are weighted connections Wrec connecting the weighted input values to a plurality of hidden nodes within the reservoir]; pg. 2443, lines 22-30, “one divides the input stream in blocks of length B […] In order to start backward processing, one has to wait until C frames of the next block are available, and after having reset the backward reservoir, one performs the B + C backward processing steps to obtain the B reservoir states that are needed to 

Claim 11 is rejected under 35 U.S.C. §103 as being unpatentable over Triefenbach, Kim, Ikada and Hu, and further in view of Naito et al., (US 5,841,946), hereinafter “Naito” (previously cited).

Regarding claim 11, the combination of references as applied to claim 3 above teaches [t]he computer-implemented method of claim 3.
The above combination does not teach wherein the performing the learning process includes obtaining the first model that has been learned by performing the learning process with the first model last.
Naito teaches wherein the performing the learning process includes obtaining the first model that has been learned by performing the learning process with the first model last. (Naito, Col. 11, lines 31-43, “there are provided separately a forward prediction processing [corresponds to the first model] as a first prediction processing for the prediction in the forward directions and a reverse prediction processing [corresponds to the second model] as a second prediction processing for predicting variation in the reverse direction.” “The forward prediction processing and the reverse prediction processing may be activated or executed concurrently in parallel or sequentially as the alternative.”)

It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to modify the predictive models of Triefenbach with the sequential model implementation of Naito, the benefit being that by implementing the forward and backward models sequentially rather than concurrently, the system memory requirements are reduced, as recited by Naito at Col. 13, lines 51-55, “In the case of the predictions made in parallel, the time taken for arithmetic processing or calculation can be shortened, while according to the sequential prediction is adopted, capacity of the memory as required to this can correspondingly be reduced.”

Claim 14 is rejected under 35 U.S.C. §103 as being unpatentable over Triefenbach, Kim, Ikada and Hu, and further in view of Bergstra (US 2017/0140259) (previously cited).

Regarding claim 14, the combination of Triefenbach and Kim as applied to claim 13 above teaches [t]he computer-implemented method of claim 13.
The above combination does not teach wherein the performing the learning process includes: learning the weight parameter between each hidden node and each input value corresponding to the time point before the time point corresponding to the plurality of input nodes in the first mode, using the learning process with the first model; and learning the weight parameter between each input node and each of the plurality of hidden nodes corresponding to the time point after the time point corresponding to the plurality of input nodes in the second model, using the learning process with the second model.
Bergstra teaches wherein the performing the learning process includes: learning the weight parameter between each hidden node and each input value corresponding to the time point before the time point corresponding to the plurality of input nodes in the first mode, using the learning process with the first model; (Bergstra, ¶ [0143] The controller “trains the artificial neural network over data.”; “Training an artificial neural network involves simulating the network and evolving (or adapting) of weights for the units and edges of the neural network until an exit condition is reached.” [the simulation of the network in the forward direction includes providing it with multiple inputs representing a signal value at a particular time and values from before that time]) and learning the weight parameter between each input node and each of the plurality of hidden nodes corresponding to the time point after the time point corresponding to the plurality of input nodes in the second model, using the learning process with the second model. (Ibid., [the “second model” that performs backward processing may be trained using the same steps as training the first model that performs forward processing])
Bergstra is analogous art, as it is directed to the construction and training of predictive models for time-series data.
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to modify the predictive models of Triefenbach with the trainable weights and neural network training of Bergstra, as this represents the application of the known technique of neural network training to achieve the predictable result. (Ibid., “Training an artificial neural network involves simulating the network and evolving (or adapting) of weights for the units and edges of the neural network until an exit condition is reached.” ¶ [0146], “At optional act 510, the controller returns the weights and other information about the network and the training.” [returning a trained model at the conclusion of training])

Claims 21-23 are rejected under 35 U.S.C. §103 as being unpatentable over Triefenbach, Kim, Ikada and Hu, and further in view of Wakuya et al., "Time Series Prediction with a Neural Network Model Based on Bidirectional Computation Style: An Analytical Study and Its Estimation on Acquired Signal Transformation," Electrical Engineering in Japan, Vol. 145, No. 3, 2003, hereinafter “Wakuya.” (previously cited)

Regarding claim 21, the combination of references as applied to claim 1 above teaches [t]he computer-implemented method of claim 1.

The above combination does not teach wherein a weight parameter that is more difficult to learn at a given time in the first model than the second model is set as the first learning target parameter, and a weight parameter that is more accurately learned at another given time in the second model than the first model is set as the second learning target parameter of the second model.
Wakuya teaches wherein a weight parameter that is more difficult to learn at a given time in the first model than the second model is set as the first learning target parameter, (Wakuya, pg. 52, Col. 1, "The weights W3, W2, W1, WF and WP are updated based on real-time recurrent learning to reduce the error ef."; Wakuya, pg. 54, Col. 1, ¶ 1, "as the prediction width ahead increases, the context information about the past becomes important, and so the task becomes more difficult to train. Therefore, when the prediction width ahead is 3, the difference in time series processing ability between the bidirectional model and the future prediction system alone becomes a maximum" [As the future period to predict becomes longer, it becomes more difficult to train the future prediction system alone [learning the weights W3, W2, W1, WF and WP] compared to learning them using both the future prediction system and the past prediction system.]) and a weight parameter that is more accurately learned at another given time in the second model than the first model is set as the second learning target parameter of the second model. (Wakuya, pg. 54, equations (10)-(12) "In order to evaluate quantitatively, the index of improvement quality (IIQ) is defined by the ratio of the errors in the two models: […] It is clear that the bidirectional computation style is superior to the conventional unidirectional style." [The weights in the models are more accurately trained when incorporating the past prediction system compared to the future prediction system alone.])
Wakuya is analogous art, as it is in the field of using machine learning for future prediction of time series data.
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to modify the time series prediction of Niedzwiecki and Ikada with the bidirectional learning of Wakuya, the benefit being that bidirectional learning of a model produces superior results compared to unidirectional learning. (Wakuya, pg. 54, "It is clear that the bidirectional computation style is superior to the conventional unidirectional style.")

Regarding claim 22, the combination of references as applied to claim 21 above teaches [t]he computer-implemented method of claim 21.
Further, Wakuya teaches wherein the first learning target parameter is learned using the second model that more accurately learns the first learning target parameter than the first model. (Wakuya, pg. 52, Col. 1, ¶ 1, "Then, a local training phase for each signal processing subsystem is assumed: (i) a training phase for the future prediction system and (ii) a training phase for the past prediction system. Global training for the overall model is accomplished by repeating these two phases. [...] "The weights W3, W2, W1, WF, and WP are updated based on real-time recurrent learning to reduce the error ef." [The system weights are trained using the past prediction system in addition to the forward prediction system in order to reduce error (improving accuracy)])

Regarding claim 23, the combination of references as applied to claim 22 above teaches [t]he computer-implemented method of claim 22.
Further, Wakuya teaches wherein the first learning target parameter is expressed relatively as a weight parameter between a past input layer and a future hidden layer, (Wakuya, pg. 51, § 2.2 ¶ 1 and Fig. 1, "In this figure, the circles represent single neuron layers without internal connections, and the arrows represent […]") and wherein the second learning target parameter is expressed relatively as a weight parameter between a past hidden layer and a future input layer. (Ibid., [in the future prediction system, the input data is processed from older to newer, so the arrows connecting the sequential nodes represent weights between a past (older data) hidden layer and a future (newer data) input layer.])

Claims 26 and 28-30 are rejected under 35 U.S.C. §103 as being unpatentable over Triefenbach, Kim, Ikada, and Hu, and further in view of Osogami et al., “Learning dynamic Boltzmann machines with spike-timing dependent plasticity,” arXiv:1509.08634v1, 29 September 2015, hereinafter “Osogami” (previously cited in Applicant’s Information Disclosure Statement filed on 11 January 2017).

Regarding claim 26, the combination of references as applied to claim 1 above teaches [t]he computer-implemented method of claim 1.
Further, Osogami teaches wherein the learning apparatus comprises a first plurality of FIFO memories, including the FIFO memory, and a second plurality of FIFO memories, including the other FIFO memory, respectively corresponding to the first model and the second model, (Osogami, § 3.4 “Interpretation as an artificial […] i,j can be considered as an axon that stores the spikes traveling from i to j. The conduction delay of this axon is di,j and the spikes generated in the last di,j - 1 steps are stored.”)
wherein the first plurality of FIFO memories has a number of member FIFO memories which is greater than a number of nodes of the first model, (Ibid. and Osogami, Fig. 1(c), “Specifically, the binary bits correspond to the N bits of x[0] and M FIFO queues.” [The number of FIFO queues M (FIFO memories) is independent of the number of nodes N in a layer, and may therefore be greater than, less than, or equal to the number of nodes.]) and wherein the second plurality of FIFO memories has a number of member FIFO memories which is greater than a number of nodes of the second model. (Ibid., [The structure of Osogami is operable to be applied to multiple models.])
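
For illustration only, the FIFO behavior quoted above (a queue that holds the spikes generated in the last di,j - 1 steps) can be sketched as follows. This is a minimal sketch and not code from Osogami: the class name `AxonFifo`, the method `step`, and the zero-filled initial state are assumptions made for the example.

```python
from collections import deque


class AxonFifo:
    """Hypothetical FIFO queue holding the spikes of the last (delay - 1) steps."""

    def __init__(self, delay):
        if delay < 1:
            raise ValueError("delay must be at least 1")
        # Assumption: the queue starts filled with zeros (no spikes in flight).
        self.queue = deque([0] * (delay - 1), maxlen=delay - 1)

    def step(self, spike):
        """Push the newest spike and return the spike that leaves the queue."""
        if self.queue.maxlen == 0:
            # A delay of 1 stores nothing, so the spike passes straight through.
            return spike
        oldest = self.queue[0]
        self.queue.append(spike)  # maxlen discards the oldest entry automatically
        return oldest


# The number of queues (M) is chosen independently of the number of nodes (N).
N, M = 2, 5
fifos = [AxonFifo(delay=3) for _ in range(M)]
outputs = [fifos[0].step(s) for s in [1, 0, 1, 1]]
```

In this sketch a queue built with a delay of 3 stores two spikes at a time, and the count of queues M is set without reference to N.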

Regarding claim 28, the combination of references as applied to claim 1 above teaches [t]he computer-implemented method of claim 1.

The above combination does not teach wherein a weight parameter is determined based on a positive value that is based on a product of the first learning target parameter and a first predetermined parameter and a negative value that is based on a product of a second learning target parameter and a second predefined parameter. 
Osogami teaches wherein a weight parameter is determined based on a positive value that is based on a product of the first learning target parameter and a first predetermined parameter and a negative value that is based on a product of a second learning target parameter and a second predefined parameter. (Osogami, pg. 7, Equation 13, showing that weight Wi,j[δ] is equal to the summation over K of ui,j,k multiplied by λ^(δ - di,j) when δ >= di,j [corresponds to the claimed “positive value that is based on a product of the first learning target parameter and a first predetermined parameter”], and when δ < di,j and not equal to zero, Wi,j[δ] is equal to the summation over L of -vi,j,l multiplied by μl^(-δ) [corresponds to the claimed “negative value that is based on a product of a second learning target parameter and a second predetermined parameter”].) [The Examiner notes that Equation 13 of Osogami is identical to “Expression 1” in ¶¶ [0042-43] of the instant disclosure, which were cited by the Applicant as providing support for newly added claims 28-30.]

Osogami is analogous art, as it is directed to training a machine learning model using time-series input data.
It would have been obvious to one of ordinary skill in the art prior to the effective filing date of the claimed invention to modify the model training of Triefenbach with the weight parameter determination of Osogami, as the learning method of Osogami is “simple, exact, and efficient.” (Osogami, pg. 6, § 3.2 “Deriving a specific learning rule.” “We thus propose a particular form of weight sharing, which is motivated by observations from biological neural networks [1] but leads to particularly simple, exact, and efficient learning rule.”)
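
For reference, the piecewise weight described for claim 28 above can be written out as follows. This is a reconstruction from the description in this action, not a reproduction of Equation 13 of the reference. The index k on λ and the summation limits K and L are assumptions made by analogy with the indexed terms ui,j,k, vi,j,l and μl.

```latex
W_{i,j}^{[\delta]} =
\begin{cases}
0 & \text{if } \delta = 0, \\[4pt]
\displaystyle\sum_{k=1}^{K} \lambda_k^{\,\delta - d_{i,j}}\, u_{i,j,k} & \text{if } \delta \ge d_{i,j}, \\[4pt]
\displaystyle -\sum_{l=1}^{L} \mu_l^{-\delta}\, v_{i,j,l} & \text{otherwise.}
\end{cases}
```

Here δ is the time point difference, di,j is the delay constant, u and v are the learning target parameters, and λ and μ are the predetermined parameters.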

Regarding claim 29, the combination of references as applied to claim 28 above teaches [t]he computer-implemented method of claim 28.  Further, Osogami teaches wherein the weight parameter is a positive value based on a time point difference being greater than a predetermined delay constant, wherein the weight parameter is a negative value based on the time point difference being less than the predetermined delay constant and non-zero, wherein the weight parameter is zero when the time point difference is zero, and wherein the time point difference is between hidden nodes and the input data at two different time points. (Ibid. and Osogami, Fig. 1(c) and pg. 7, Equation 13 [The claim language is the English expression of Osogami’s Equation 13, where δ represents the claimed “time point difference” and di,j represents the claimed “predetermined delay constant.”  The time point difference “between hidden nodes and the input data at two different time points” is represented by the subscripts i and j denoting different time points as shown in Osogami Fig. 1(c).  As noted above in reference to claim 28, Osogami is one of the named inventors in the instant application, and Equation 13 of Osogami is identical to “Expression 1” cited in ¶¶ [0042-43] of the instant specification to provide support for newly added claims 28-30.])
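
For illustration only, the three cases recited above (zero, positive, negative) can be checked numerically with a short sketch. The function name `weight` and all argument names are hypothetical. The sketch assumes non-negative learning target parameters `u` and `v` and positive predetermined parameters `lam` and `mu`, and it uses the "greater than or equal to" boundary from the description of Equation 13 given for claim 28.

```python
def weight(delta, delay, u, lam, v, mu):
    """Piecewise weight following the three cases described in the text.

    delta : time point difference
    delay : predetermined delay constant (d_ij in the text)
    u, v  : learning target parameters (assumed non-negative here)
    lam, mu : predetermined parameters (assumed positive here)
    """
    if delta == 0:
        # Zero time point difference gives a zero weight.
        return 0.0
    if delta >= delay:
        # Positive branch: sum of products of u and lam raised to (delta - delay).
        return sum(l ** (delta - delay) * u_k for l, u_k in zip(lam, u))
    # Negative branch: non-zero delta below the delay constant.
    return -sum(m ** (-delta) * v_l for m, v_l in zip(mu, v))


zero_case = weight(0, 2, [0.5], [0.5], [0.5], [0.5])
positive_case = weight(3, 2, [0.5], [0.5], [0.5], [0.5])
negative_case = weight(1, 2, [0.5], [0.5], [0.5], [0.5])
```

Under these assumptions the sign of the result depends only on which branch the time point difference selects.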

Regarding claim 30, the combination of references as applied to claim 28 above teaches [t]he computer-implemented method of claim 28.  Further, Osogami teaches wherein the predefined parameters change in a predetermined manner in accordance with the time point difference. (Osogami, pg. 7, Equation 13 [The “predetermined parameters” λ and μ have superscripts of δ and –δ respectively, with δ representing the time point difference.])

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SCOTT R GARDNER whose telephone number is (469)295-9128.  The examiner can normally be reached on 8:00am - 5:00pm M-F.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann J Lo, can be reached on 571-272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/SCOTT R GARDNER/Examiner, Art Unit 2126  
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126