Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Priority
Acknowledgment is made of applicant's claim for the present application filed on 03/20/2017.

Response to Arguments
Applicant's arguments filed on 07/07/2022 have been fully considered but they are not persuasive.
In Remarks pp. 6-8, Applicant contends: 
It is respectfully asserted that Boulanger is silent regarding the above-reproduced features at least because Boulanger does not appear to utilize a plurality of first parameters, and a plurality of second parameters for a first neural network and a recurrent neural network, respectively, as recited in the pending claims, but rather is silent regarding the same. 
Furthermore, notwithstanding the Examiner's assertions to the contrary in the response to arguments section of the present Office Action, it is again respectfully asserted, and emphasized that, under straightforward claim interpretation, the SAME INPUT is being applied to the input nodes of both the first neural network and to the recurrent neural network. The same means, to anyone of ordinary skill in the art as well as a layperson, "identical" or "without difference". The claim actually recites "IDENTICAL" now to make that point clear. Modifying what is put into one of these networks as disclosed in Boulanger is not the same or identical or without difference, but instead teaches away from these claim limitations. 
Moreover, inputting the same values into intermediate nodes as per Boulanger does not correspond to inputting the same values into input nodes as explicitly recited by claim 1.

Examiner’s response:
First of all, fig 2(b) along with “The upper half of the RNN-RBM is the RBM stage while the lower half is a RNN with hidden units hˆ(t)” shows that the single and common input node of v(t) is shared by the RBM and the RNN. Thus, different inputs cannot be applied to the input nodes of the first neural network and the recurrent neural network, since the RNN-RBM model shares the single and common input node.

The examiner understands the applicant’s assertion about the same input. However, note that the relevant claim limitations are “receiving an input by an input node of a first neural network …”, “receiving the input by an input node of a recurrent neural network …” and “the input received by the input node of the first neural network and the input received by the input node of the recurrent neural network are unmodified identical values of multiple time frames” (with emphasis added). In other words, the claim as recited just says that an input is received by each neural network. There is nothing that prevents v(t) of eq (1), eq (7) and eq (11) in the Boulanger-Lewandowski reference from reading on the claimed input. Fig 2 of Boulanger-Lewandowski shows that v(1) starts as an input and is received by the RNN and the RBM, as the claim recites, and eq (1), eq (7) and eq (11) clearly show that the input, v(t), is received by each neural network.

In addition, the examiner understands the applicant’s assertion about an intermediate node. However, even though v(t) may appear to be provided to an intermediate node, the node that receives v(t) is an input node because, as expressed in eq (7) and eq (11) along with fig 2(b), the RNN-RBM receives input data via the input node of v(t), and calculates hˆ(t) and h(t) for the RNN and the RBM, respectively. In more detail, in the case of the RBM, bv(1) is just used as a weight for the input v(t) based on eq (1), eq (7) and eq (8), and in the case of the RNN, W2 is just used as a weight for the identical input v(t) based on eq (11). 

Just for the sake of comparison, for example, the Elman network on Wikipedia (https://en.wikipedia.org/w/index.php?title=Recurrent_neural_network&oldid=769709356) calculates the hidden layer vector with the same mathematical expression as eq (11) of Boulanger-Lewandowski, and xt is just an “input vector” for the Elman network. In the same manner, v(t) is just used as an input for the RNN portion of the RNN-RBM model. 
In addition, the Training section on Wikipedia (https://en.wikipedia.org/w/index.php?title=Boltzmann_machine&oldid=758504676) says “The units in the Boltzmann Machine are divided into 'visible' units, V, and 'hidden' units, H. The visible units are those which receive information from the 'environment'”, and Boulanger-Lewandowski says “visible vector v (inputs)” in sec 2 and “We use an input of 88 binary visible units that span the whole range of piano from A0 to C8” in sec 6. Thus, the node of v(t) is clearly an input node. Furthermore, regarding the RBM portion, please refer to the graphical structure of the RTRBM of fig 2 from Sutskever et al. (The Recurrent Temporal Restricted Boltzmann Machine) as well.
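
For reference, the comparison with the Elman network can be written out side by side. The Elman form follows the cited Wikipedia revision; eq (11) is written as it is understood from the Boulanger-Lewandowski reference, so its exact notation here should be read as an assumption rather than a verbatim copy.

```latex
\begin{align*}
\text{Elman network:}\quad & h_t = \sigma_h\left(W_h x_t + U_h h_{t-1} + b_h\right) \\
\text{eq (11):}\quad & \hat{h}^{(t)} = \sigma\left(W_2 v^{(t)} + W_3 \hat{h}^{(t-1)} + b_{\hat{h}}\right)
\end{align*}
```

Under the correspondence x_t ↔ v(t), W_h ↔ W2, U_h ↔ W3 and b_h ↔ bhˆ, the two expressions have the same form, with v(t) occupying the position of the Elman input vector.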

Moreover, the relevant claim limitations appear to be 
“receiving an input by an input node of a first neural network including a plurality of first parameters; 
receiving the input by an input node of a recurrent neural network, … the recurrent neural network including a plurality of second parameters,
…
the input received by the input node of the first neural network and the input received by the input node of the recurrent neural network are unmodified identical values of multiple time frames”.

As noted in the rejections, Boulanger-Lewandowski teaches 
(Boulanger-Lewandowski, [fig 2] “The upper half of the RNN-RBM is the RBM stage while the lower half is a RNN with hidden units hˆ(t) . The RBM biases bh(t), bv(t) are a linear function of hˆ(t−1)”; [sec 4] “The joint probability distribution of the RNN-RBM is also given by equation (7), but with hˆ(t) defined arbitrarily, here as per equation (11). … For simplicity, we consider the RBM parameters to be W, bv(t) , bh(t) (i.e. only the biases are variable) and a single-layer RNN (bottom portion of Fig. 2(b)) whose hidden units hˆ(t) are only connected to their direct predecessor hˆ(t−1) and to v(t) by the relation:  
    hˆ(t) = σ(W2 v(t) + W3 hˆ(t−1) + bhˆ)
 (11) The RBM portion of the RNN-RBM (upper portion of Fig. 2(b)) is otherwise exactly the same as its RTRBM counterpart. This gives the single-layer RNN-RBM nine parameters: W, bv, bh, W’, W’’, hˆ(0), W2, W3, bhˆ.”; see also [secs 2-3];)

In other words, Boulanger-Lewandowski teaches that there are an RBM (i.e. “first neural network”, cf. the upper half of the RNN-RBM in fig 2(b)) and an RNN (i.e. “recurrent neural network”, cf. the lower half of the RNN-RBM in fig 2(b)). The RBM and the RNN both receive input data (i.e. “input”, cf. v(t) in fig 2(b)) over time via a node which receives the input data (i.e. “input node”, cf. the node of v(t) in fig 2(b)) over time. 
Boulanger-Lewandowski also teaches that the input data which is received by the RBM and the RNN is identical based on the single and common input node (i.e. “the input received by the input node of the first neural network and the input received by the input node of the recurrent neural network are unmodified identical values of multiple time frames”, cf. eq (7) has v(t) as an input, and eq (11) has v(t) as an input as well. In addition, fig 2(b) also shows the same input v(t) for the RNN-RBM.).
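
The shared-input reading above can be illustrated with a minimal sketch. It is not code from the reference: the function names, the toy weights, and the orientation of W (hidden by visible) are assumptions made only for illustration, and eq (11) is used in the form hˆ(t) = σ(W2 v(t) + W3 hˆ(t−1) + bhˆ). The point is only that one and the same v(t) is passed, unmodified, to both the RBM side and the RNN side.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vector):
    """Plain-Python matrix-vector product."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def rbm_hidden_probs(v_t, W, b_h_t):
    """RBM side (assumed form): P(h_j = 1 | v(t)) = sigmoid((W v(t))_j + b_h(t)_j)."""
    return [sigmoid(a + b) for a, b in zip(matvec(W, v_t), b_h_t)]

def rnn_hidden(v_t, h_hat_prev, W2, W3, b_hhat):
    """RNN side, eq (11) as understood: sigmoid(W2 v(t) + W3 h^(t-1) + b_h^)."""
    a = matvec(W2, v_t)
    r = matvec(W3, h_hat_prev)
    return [sigmoid(x + y + b) for x, y, b in zip(a, r, b_hhat)]

# Hypothetical toy values: 3 visible units, 2 RBM hidden units, 2 RNN hidden units.
v_t = [1.0, 0.0, 1.0]
v_before = list(v_t)  # copy kept to check that v(t) is not modified
W = [[0.1, -0.2, 0.3], [0.0, 0.5, -0.1]]
W2 = [[0.2, 0.1, -0.3], [-0.4, 0.0, 0.2]]
W3 = [[0.1, 0.0], [0.0, 0.1]]

# The same object v_t is handed to both networks.
rbm_out = rbm_hidden_probs(v_t, W, [0.0, 0.0])
rnn_out = rnn_hidden(v_t, [0.0, 0.0], W2, W3, [0.0, 0.0])
```

In this sketch W and W2 play the role of weights applied to the identical input, which mirrors the examiner's reading that the two networks differ in their parameters and not in what they receive.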


    [Figure: media_image2.png, image not reproduced]


Therefore, the applicant’s arguments are not convincing. Please refer to “Response to Arguments” of the previous office action as well. 

In Remarks, p. 7, Applicant contends: 
Furthermore, claim 1 recites "a post-training given task", while Boulanger is directed to training. Hence, in this regard, Boulanger also teaches away. The Examiner is reading into the disclosure in Boulanger features that do not exist there, but instead is basing the argument on impermissible hindsight with respect to the present invention. For example, there is a discussion regarding updating RBM related biases post training, such as in an inference stage. The Examiner is requested to particularly point to any alleged portions of Boulanger that mention updating the RBM related biases post training. Indeed, "Over time" as cited by the Examiner can simply relate to "training time" without more.

Examiner’s response:
The relevant claim limitations appear to be 
“receiving the input by an input node of a recurrent neural network configured as a nonlinear extension of the first neural network to cooperatively process the input for a post-training given task until the post-training given task completion providing a final output from the recurrent neural network”.

As noted in the rejections, Boulanger-Lewandowski teaches 
(Boulanger-Lewandowski, [fig 2] “The upper half of the RNN-RBM is the RBM stage while the lower half is a RNN with hidden units hˆ(t). The RBM biases bh(t), bv(t) are a linear function of hˆ(t−1)”; [secs 2-4] “input vector v(l) … The joint probability distribution of the RNN-RBM is also given by equation (7), but with hˆ(t) defined arbitrarily, here as per equation (11). … For simplicity, we consider the RBM parameters to be W, bv(t) , bh(t) (i.e. only the biases are variable) and a single-layer RNN (bottom portion of Fig. 2(b)) whose hidden units hˆ(t) are only connected to their direct predecessor hˆ(t−1) and to v(t) by the relation:  
    hˆ(t) = σ(W2 v(t) + W3 hˆ(t−1) + bhˆ)
 (11) The RBM portion of the RNN-RBM (upper portion of Fig. 2(b)) is otherwise exactly the same as its RTRBM counterpart. This gives the single-layer RNN-RBM nine parameters: W, bv, bh, W’, W’’, hˆ(0), W2, W3, bhˆ.”; see also [sec 6] “probabilistic modeling of sequences of polyphonic music”; 
    hˆ(t) = σ(W2 v(t) + W3 hˆ(t−1) + bhˆ)
 (11) based on the input v(t) along with fig 2(b) reads on “receiving the input by an input node of a recurrent neural network”. In addition, “hˆ(0), W2, W3, bhˆ” read on “a plurality of second parameters”. Furthermore, “joint probability distribution of the RNN-RBM” reads on “nonlinear extension of the first neural network to cooperatively process the input”. Moreover, “
hˆ(t) = σ(W2 v(t) + W3 hˆ(t−1) + bhˆ)
” reads on “final output from the recurrent neural network” since 
    [inline image: media_image3.png, not reproduced]
 is provided as a final output at each time step from the RNN for calculating biases for the RBM. Furthermore, e.g., testing may read on “post-training given task”. Besides, e.g., a node which receives an “input” data may read on “input node”. In other words, e.g., the node of v(t) in fig 2(b) may read on “input node”. Also, e.g., “input vector v(l)” along with “single-layer RNN (bottom portion of Fig. 2(b))” may read on “different input nodes” since the input of the first neural network and the input to the recurrent neural network are both mapped to v(t), which is a vector and thus comprises multiple input nodes, one for each element of the input vector v(t).)

In other words, Boulanger-Lewandowski teaches that there are an RBM (i.e. “first neural network”, cf. the upper half of the RNN-RBM in fig 2(b)) and an RNN (i.e. “recurrent neural network”, cf. the lower half of the RNN-RBM in fig 2(b)). The RBM and the RNN both receive input data (i.e. “input”, cf. v(t) in fig 2(b)) over time via a node which receives the input data (i.e. “input node”, cf. the node of v(t) in fig 2(b)) over time. 
In addition, Boulanger-Lewandowski also teaches that the RBM-related biases are updated over time in the training phase and in the inference phase as well (i.e. “post-training given task”, cf. “The RBM biases bh(t), bv(t) are a linear function of hˆ(t−1)” over time in fig 2(b)). In other words, after training (i.e. “post-training”), in the inference phase, the RBM-related biases are updated over t (which represents time) since the biases are variables (cf. “For simplicity, we consider the RBM parameters to be W, bv(t) , bh(t) (i.e. only the biases are variable)” in sec 4), and they are updated based on eqs (8)-(9) (cf. “
    bh(t) = bh + W’ hˆ(t−1)
(8) and 
    bv(t) = bv + W’’ hˆ(t−1)
(9) in sec 3”). In addition, the datasets are time-based (cf. sec 6: “Each dataset contains at least 7 hours of polyphonic music and the total duration is approximately 67 hours. The polyphony (number of simultaneous notes) varies from 0 to 15 and the average polyphony is 3.9. We use an input of 88 binary visible units that span the whole range of piano from A0 to C8 and temporally aligned on an integer fraction of the beat (quarter note)”). 
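
The point that the biases change over t even when the trained parameters are held fixed can be sketched as follows. This is an illustration under stated assumptions, not the reference's implementation: the names `rollout_biases`, `Wp` (for W’) and `Wpp` (for W’’), the toy values, and the assumed forms of eqs (8)-(9) (each bias equals a fixed offset plus a linear map of hˆ(t−1)) and of eq (11) are all hypothetical.

```python
import copy
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(matrix, vector):
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]

def add(a, b):
    return [x + y for x, y in zip(a, b)]

def rollout_biases(sequence, params):
    """Return the per-time-step pairs (b_v(t), b_h(t)); params is never modified."""
    h_hat = list(params["h0"])
    biases = []
    for v_t in sequence:
        # Assumed eqs (8)-(9): biases are linear in the previous RNN state.
        b_h_t = add(params["b_h"], matvec(params["Wp"], h_hat))
        b_v_t = add(params["b_v"], matvec(params["Wpp"], h_hat))
        biases.append((b_v_t, b_h_t))
        # Assumed eq (11): next RNN state from the same input v(t).
        pre = add(add(matvec(params["W2"], v_t), matvec(params["W3"], h_hat)),
                  params["b_hhat"])
        h_hat = [sigmoid(x) for x in pre]
    return biases

# Hypothetical "trained" parameters: 2 visible units, 1 RBM hidden, 1 RNN hidden.
params = {
    "b_h": [0.0], "b_v": [0.0, 0.0],
    "Wp": [[1.0]], "Wpp": [[1.0], [-1.0]],
    "W2": [[1.0, 0.0]], "W3": [[0.0]],
    "b_hhat": [0.0], "h0": [0.0],
}
frozen = copy.deepcopy(params)
biases = rollout_biases([[1.0, 0.0], [0.0, 0.0]], params)
```

In this toy run the learned parameters stay frozen while the time-indexed biases differ between the first and second time steps, which is the sense in which the biases are "updated over t" outside of training.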

Therefore, the applicant’s arguments are not convincing. Please refer to “Response to Arguments” of the previous office action as well.

Applicant’s arguments with respect to the amended limitation of claim 1 “the input of the first neural network and the input to the recurrent neural network comprising different input nodes” (pp. 8-14) have been considered but are moot because the arguments are directed to amended limitation(s) that has/have not been previously examined.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP §§ 706.02(l)(1) - 706.02(l)(3) for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1-6 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1-6 of copending Application No. 15/463,195 (reference application, 03/17/2022). Although the claims at issue are not identical, they are not patentably distinct from each other.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.

Instant application
1. (Currently amended) A method comprising: 

receiving an input by an input node of a first neural network including a plurality of first parameters, the input comprising a N-dimensional time-series input data vector, wherein N is an integer greater than 0¹; 

receiving the input by an input node of a recurrent neural network configured as a nonlinear extension of the first neural network to cooperatively process the input for a post-training given task until the post-training given task completion providing a final output from the recurrent neural network, the input of the first neural network and the input to the recurrent neural network comprising different input nodes²; and 

updating at least one first parameter based on the final output from the recurrent neural network provided with the input³, the recurrent neural network including a plurality of second parameters⁴,

wherein the given task is a post processing task and the input receiving by the input node of the first neural network and the input receiving by the input node of the recurrent neural network are unmodified identical values of multiple time frames⁶, and wherein the first neural network and the recurrent neural network each comprise a single neural network⁵.

Reference application
1. (Currently amended) A computer program product including one or more computer readable storage mediums collectively storing program instructions that are executable by a computer to cause the computer to perform operations comprising: 

receiving an input by an input node of a first neural network, the first neural network including a plurality of first parameters, the input comprising a N-dimensional time-series input data vector, wherein N is an integer greater than 0¹,³,⁶; 

receiving the input by an input node of a recurrent neural network configured as a nonlinear extension of the first neural network to cooperatively process the input for a post-training given task until the post-training given task completion providing a final output from the recurrent neural network², the recurrent neural network including a plurality of second parameters⁴, the input of the first neural network and the input to the recurrent neural network comprise different input nodes²; and 

updating at least one first parameter of the plurality of first parameters based on the final output from the recurrent neural network provided with the input³, 

wherein the given task is a post processing task, and wherein the input received by the input node of the first neural network and the input received by the input node of the recurrent neural network are identical, and wherein the first neural network and the recurrent neural network each comprise a single neural network⁵.

* The superscripts are used for indicating corresponding subject matter between the instant application and the reference application. 

Instant application
2. (Original) The method according to claim 1, 
wherein the at least one first parameter includes a bias parameter.
Reference application
3. (Original) The computer program product according to claim 1, 
wherein the at least one first parameter includes a bias parameter.

Instant application
3. (Original) The method according to claim 1, further comprising 
initializing the plurality of first parameters to zero.
Reference application
4. (Original) The computer program product according to claim 1, wherein the operations further comprise 
initializing the plurality of first parameters to zero.

Instant application
4. (Original) The method according to claim 1, further comprising 
estimating a mean of the current time frame of the input using a conditional probability density of the input, wherein a current time frame of the input is assumed to have a Gaussian distribution.
Reference application
5. (Original) The computer program product according to claim 1, wherein the operations further comprise 
estimating a means of the current time frame of the input using a conditional probability density of the input, wherein a current time frame of the input is assumed to have a Gaussian distribution.

Instant application
5. (Original) The method according to claim 4, wherein 
the updating includes learning the first parameters, a standard deviation of the current time frame of the input, and a plurality of output weight values of the output from the recurrent neural network.
Reference application
6. (Original) The computer program product according to claim 5, wherein 
the updating includes learning the first parameters, a standard deviation of the current time frame of the input, and a plurality of output weight values of the output from the recurrent neural network.

Instant application
6. (Currently amended) The method according to claim 1, 
wherein the first neural network includes 

a plurality of layers of nodes among a plurality of nodes, each layer sequentially forwarding values of a time frame of the input, the plurality of layers of nodes including 

a first layer of a plurality of input nodes among the plurality of nodes, the input nodes receiving values of a current time frame of the input, and 

a plurality of intermediate layers, each node in each intermediate layer forwarding a value to a node in a subsequent or shared layer, and 

a plurality of weight values among the plurality of first parameters, each weight value to be applied to each value in the corresponding node to obtain a value propagating from a pre-synaptic node to a post-synaptic node.
Reference application
2. (Previously presented) The computer program product according to claim 1, wherein the first neural network: 

a plurality of layers of nodes among a plurality of nodes, each layer sequentially forwarding values of a time frame of the input, the plurality of layers of nodes: 

a first layer of a plurality input nodes among the plurality of nodes, the input nodes receiving values of a current time frame of the input; 

a plurality of intermediate layers, each node in each intermediate layer forwarding a value to a node in a subsequent shared layer; and 

a plurality of weight values among the plurality of first parameters, each weight value to be applied to each value in the corresponding node to obtain a value propagating from a pre-synaptic node to a post-synaptic node.



Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1 and 2 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Boulanger-Lewandowski et al. (“Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription”).

Regarding claim 1, 
Boulanger-Lewandowski teaches
A method comprising: 

receiving an input by an input node of a first neural network including a plurality of first parameters, the input comprising a N-dimensional time-series input data vector, wherein N is an integer greater than 0; 
(Boulanger-Lewandowski, [fig 2]; [sec 4] “The joint probability distribution of the RNN-RBM is also given by equation (7), but with hˆ(t) defined arbitrarily, here as per equation (11). … For simplicity, we consider the RBM parameters to be W, bv(t), bh(t) (i.e. only the biases are variable) and a single-layer RNN (bottom portion of Fig. 2(b)) whose hidden units hˆ(t) are only connected to their direct predecessor hˆ(t−1) and to v(t) by the relation: 
    hˆ(t) = σ(W2 v(t) + W3 hˆ(t−1) + bhˆ)
 (11)”; see also [secs 2-3] [sec Abs] “We investigate the problem of modeling symbolic sequences of polyphonic music in a completely general piano-roll representation. We introduce a probabilistic model based on distribution estimators conditioned on a recurrent neural network that is able to discover temporal dependencies in high-dimensional sequences.”; [sec 1, p. 1] “Many sequences of interest are over high-dimensional objects, such as images in video, short-term spectra in audio music, tuples of notes in musical scores, or words in text.”; “Restricted Boltzmann machine” reads on “first neural network”, and “W, bv(t), bh(t)” read on “a plurality of first parameters”. In addition, v(t) of eq (7) reads on “input”. Furthermore, e.g., a node which receives an “input” data may read on “input node”. In other words, e.g., the node of v(t) in fig 2(b) may read on “input node”. Moreover, e.g., “high-dimensional sequences” along with fig 2(b) may read on “N-dimensional time-series input data vector”.);

receiving the input by an input node of a recurrent neural network configured as a nonlinear extension of the first neural network to cooperatively process the input for a post-training given task until the post-training given task completion providing a final output from the recurrent neural network, the input of the first neural network and the input to the recurrent neural network comprising different input nodes; and 
(Boulanger-Lewandowski, [fig 2] “The upper half of the RNN-RBM is the RBM stage while the lower half is a RNN with hidden units hˆ(t). The RBM biases bh(t), bv(t) are a linear function of hˆ(t−1)”; [secs 2-4] “input vector v(l) … The joint probability distribution of the RNN-RBM is also given by equation (7), but with hˆ(t) defined arbitrarily, here as per equation (11). … For simplicity, we consider the RBM parameters to be W, bv(t) , bh(t) (i.e. only the biases are variable) and a single-layer RNN (bottom portion of Fig. 2(b)) whose hidden units hˆ(t) are only connected to their direct predecessor hˆ(t−1) and to v(t) by the relation:  
    hˆ(t) = σ(W2 v(t) + W3 hˆ(t−1) + bhˆ)
 (11) The RBM portion of the RNN-RBM (upper portion of Fig. 2(b)) is otherwise exactly the same as its RTRBM counterpart. This gives the single-layer RNN-RBM nine parameters: W, bv, bh, W’, W’’, hˆ(0), W2, W3, bhˆ.”; see also [sec 6] “probabilistic modeling of sequences of polyphonic music”; 
    hˆ(t) = σ(W2 v(t) + W3 hˆ(t−1) + bhˆ)
 (11) based on the input v(t) along with fig 2(b) reads on “receiving the input by an input node of a recurrent neural network”. In addition, “hˆ(0), W2, W3, bhˆ” read on “a plurality of second parameters”. Furthermore, “joint probability distribution of the RNN-RBM” reads on “nonlinear extension of the first neural network to cooperatively process the input”. Moreover, “
hˆ(t) = σ(W2 v(t) + W3 hˆ(t−1) + bhˆ)
” reads on “final output from the recurrent neural network” since 
    [inline image: media_image3.png, not reproduced]
 is provided as a final output at each time step from the RNN for calculating biases for the RBM. Furthermore, e.g., testing may read on “post-training given task”. Besides, e.g., a node which receives an “input” data may read on “input node”. In other words, e.g., the node of v(t) in fig 2(b) may read on “input node”. Also, e.g., “input vector v(l)” along with “single-layer RNN (bottom portion of Fig. 2(b))” may read on “different input nodes” since the input of the first neural network and the input to the recurrent neural network are both mapped to v(t), which is a vector and thus comprises multiple input nodes, one for each element of the input vector v(t).)

updating at least one first parameter based on the final output from the recurrent neural network provided with the input, the recurrent neural network including a plurality of second parameters, 
(Boulanger-Lewandowski, [fig 2] “The RBM biases bh(t), bv(t) are a linear function of hˆ(t−1)”; [sec 4, p. 3 and p. 4 (left column)] “The RTRBM can be understood as a sequence of conditional RBMs whose parameters are the output of a deterministic RNN  … For simplicity, we consider the RBM parameters to be W, bv(t) , bh(t) (i.e. only the biases are variable) and a single-layer RNN (bottom portion of Fig. 2(b)) whose hidden units hˆ(t) are only connected to their direct predecessor hˆ(t−1) and to v(t) by the relation:  
    hˆ(t) = σ(W2 v(t) + W3 hˆ(t−1) + bhˆ)
 (11) The RBM portion of the RNN-RBM (upper portion of Fig. 2(b)) is otherwise exactly the same as its RTRBM counterpart. This gives the single-layer RNN-RBM nine parameters: W, bv, bh, W’, W’’, hˆ(0), W2, W3, bhˆ.”; see also [sec 2, p. 2 (right column)]; “Restricted Boltzmann machine” reads on “first neural network”, and “W, bv(t) , bh(t)” read on “first parameter”. In addition, “The RBM biases bh(t), bv(t) are a linear function of hˆ(t−1)” of fig 2 and 
    hˆ(t) = σ(W2 v(t) + W3 hˆ(t−1) + bhˆ)
 based on the input v(t) read on “updating at least one first parameter based on the final output from the recurrent neural network provided with the input”. Furthermore, “hˆ(0), W2, W3, bhˆ” read on “a plurality of second parameters”.)

wherein the given task is a post processing task and the input received by the input node of the first neural network and the input received by the input node of the recurrent neural network are unmodified identical values of multiple time frames, and 
wherein the first neural network and the recurrent neural network each comprise a single neural network.
(Boulanger-Lewandowski, [fig 2] “The upper half of the RNN-RBM is the RBM stage while the lower half is a RNN with hidden units hˆ(t) . The RBM biases bh(t), bv(t) are a linear function of hˆ(t−1)”; [secs 2-4] “The joint probability distribution of the RNN-RBM is also given by equation (7), but with hˆ(t) defined arbitrarily, here as per equation (11). … For simplicity, we consider the RBM parameters to be W, bv(t) , bh(t) (i.e. only the biases are variable) and a single-layer RNN (bottom portion of Fig. 2(b)) whose hidden units hˆ(t) are only connected to their direct predecessor hˆ(t−1) and to v(t) by the relation:  
    hˆ(t) = σ(W2 v(t) + W3 hˆ(t−1) + bhˆ)
 (11) The RBM portion of the RNN-RBM (upper portion of Fig. 2(b)) is otherwise exactly the same as its RTRBM counterpart. This gives the single-layer RNN-RBM nine parameters: W, bv, bh, W’, W’’, hˆ(0), W2, W3, bhˆ.”; see also [sec 6] “probabilistic modeling of sequences of polyphonic music … Although it is not strictly necessary, learning is facilitated if the sequences are transposed in a common tonality (e.g. C major/minor) as preprocessing.”; “learning is facilitated if the sequences are transposed in a common tonality (e.g. C major/minor) as preprocessing” reads on “given task is a post processing task” since preprocessing may be carried out before learning or testing. In addition, eq (1), eq (7) and eq (11) read on “unmodified identical values” based on the input, v(t), which is provided to a single and common input node of RNN-RBM (the two neural networks). Furthermore, the node of v(t) in fig 2(b) may read on “input node of the first neural network” and “input node of the recurrent neural network”. Furthermore, e.g., “The upper half of the RNN-RBM is the RBM stage while the lower half is a RNN” may read on “the first neural network and the recurrent neural network each comprise a single neural network”.)

Regarding claim 2, 
Boulanger-Lewandowski teaches
the at least one first parameter includes a bias parameter ([fig 2]; [sec 4, p. 3 and p. 4 (left column)] “For simplicity, we consider the RBM parameters to be W, bv(t) , bh(t) (i.e. only the biases are variable) and a single-layer RNN (bottom portion of Fig. 2(b)) whose hidden units hˆ(t) are only connected to their direct predecessor hˆ(t−1) and to v(t) by the relation: 
    [equation image: media_image1.png]
 (11)”; see also [sec 2, p. 2 (right column)]).	

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Boulanger-Lewandowski et al. (“Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription”) in view of Osogami et al. (“Seven neurons memorizing sequences of alphabetical images via spike-timing dependent plasticity”, hereinafter Osogami2015).

Regarding claim 3, 
Boulanger-Lewandowski teaches claim 1.

However, Boulanger-Lewandowski does not teach
initializing the plurality of first parameters to zero.

Osogami2015 teaches
initializing the plurality of first parameters to zero ([sec “Results”, p. 5] “Here, the values of the eligibility traces and the FIFO queues were reset to zero before a cue was presented.”).

Boulanger-Lewandowski and Osogami2015 are both in the same field of endeavor of processing input signals with the Boltzmann machine and are analogous. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Boltzmann machine system of Boulanger-Lewandowski with the initialization of Osogami2015. Doing so would lead to presenting a sequence of zeros or a sequential pattern of a blank image to the DyBM for a sufficiently long period (Osogami2015, sec “Results”).

Claims 4-5 are rejected under 35 U.S.C. 103 as being unpatentable over Boulanger-Lewandowski et al. (“Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription”) in view of Ranzato et al. (“Modeling Natural Images Using Gated MRFs”).

Regarding claim 4, 
Boulanger-Lewandowski teaches claim 1.
However, Boulanger-Lewandowski does not teach 
estimating a mean of the current time frame of the input using a conditional probability density of the input, wherein a current time frame of the input is assumed to have a Gaussian distribution.

Ranzato teaches
estimating a mean of the current time frame of the input using a conditional probability density of the input, wherein a current time frame of the input is assumed to have a Gaussian distribution ([sec 2, p. 2209 (right column) and p. 2210 (left column)] “they contribute to control the mean of the conditional distribution over the input 
    [equation image: media_image6.png]
    [equation image: media_image7.png]
(7) where I is the identity matrix, W ∈ R^{D×M} is a matrix of trainable parameters, and bx ∈ R^D is a vector of trainable biases for the input variables.”; “control the mean of the conditional distribution over the input” and eq (7) read on “estimating a mean of the current time frame of the input using a conditional probability density of the input”.).

Boulanger-Lewandowski and Ranzato are both in the same field of endeavor of processing input signals with the Boltzmann machine and are analogous. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Boltzmann machine system of Boulanger-Lewandowski with the mean estimation of Ranzato. Doing so would enable the conditional distribution over the input pixels to be a Gaussian with not only its covariance but also its mean depending on the states of the latent variables (Ranzato, sec 2).
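For illustration only, the notion of a Gaussian conditional over the input whose mean is controlled by latent variables can be sketched as follows. This is a minimal sketch and is not code from Ranzato. It assumes the mean takes the affine form W h + bx suggested by the quoted definitions (W of size D×M, bx of size D) and an identity covariance; the equation itself appears only as an image in this action. The function names and the latent vector `h` are hypothetical.

```python
import math

def conditional_mean(W, h, b_x):
    # Assumed form: mean = W h + b_x, with W given as D rows of length M.
    return [sum(w * hj for w, hj in zip(row, h)) + b for row, b in zip(W, b_x)]

def gaussian_logpdf_identity(x, mean):
    # Log-density of x under a Gaussian with the given mean and identity covariance.
    d = len(x)
    sq = sum((xi - mi) ** 2 for xi, mi in zip(x, mean))
    return -0.5 * sq - 0.5 * d * math.log(2.0 * math.pi)

# Usage: D = 3 input dimensions, M = 2 latent units.
W = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
b_x = [0.5, 0.5, 0.5]
mean = conditional_mean(W, [1.0, 0.0], b_x)  # [1.5, 3.5, 5.5]
```

Under these assumptions the estimated mean of the current input frame is read directly from the conditional density, and the density is highest when the input equals that mean.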

Regarding claim 5, 
Boulanger-Lewandowski and Ranzato teach claim 4.

Boulanger-Lewandowski further teaches
the updating includes learning the first parameters … and a plurality of output weight values of the output from the recurrent neural network ([fig 2]; [sec 3] “While all the parameters of the RBMs can depend on the previous time steps, we will consider the case where only the biases depend on hˆ(t−1): 
    [equation image: media_image8.png] (8)
    [equation image: media_image9.png] (9)”; [sec 4.1] “The hidden-to-bias weights W’, W’’ can then be initialized to small random values, such that the sequential model will initially behave like independent RBMs, eventually departing from that state.”; [sec 4.2] “The gradient then back-propagates through the hidden-to-bias parameters (eq. 8 and 9):”; “biases” reads on “first parameters”. In addition, “hidden-to-bias weights W’, W’’” reads on “a plurality of output weight values of the output from the recurrent neural network”.).

Ranzato further teaches 
learning … a standard deviation of the current time frame of the input ([sec 2.1] “In this work, we extend these two classes of models with a new model whose conditional distribution over the input has both a mean and a covariance matrix determined by latent variables.”).

Boulanger-Lewandowski and Ranzato are both in the same field of endeavor of processing input signals with the Boltzmann machine and are analogous. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Boltzmann machine system of Boulanger-Lewandowski and Ranzato with the standard deviation learning of Ranzato. Doing so would enable the conditional distribution over the input pixels to be a Gaussian with not only its covariance but also its mean depending on the states of the latent variables (Ranzato, sec 2).


Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Boulanger-Lewandowski et al. (“Modeling Temporal Dependencies in High-Dimensional Sequences: Application to Polyphonic Music Generation and Transcription”) in view of Osogami et al. (“Learning dynamic Boltzmann machines with spike-timing dependent plasticity”).

Regarding claim 6,
Boulanger-Lewandowski teaches claim 1. 

However, Boulanger-Lewandowski does not teach
the first neural network includes 
a plurality of layers of nodes among a plurality of nodes, each layer sequentially forwarding values of a time frame of the input, the plurality of layers of nodes including 
a first layer of a plurality of input nodes among the plurality of nodes, the input nodes receiving values of a current time frame of the input, and 
a plurality of intermediate layers, each node in each intermediate layer forwarding a value to a node in a subsequent or shared layer, and 
a plurality of weight values among the plurality of first parameters, each weight value to be applied to each value in the corresponding node to obtain a value propagating from a pre-synaptic node to a post-synaptic node.

Osogami teaches
the first neural network includes 
a plurality of layers of nodes among a plurality of nodes, each layer sequentially forwarding values of a time frame of the input ([figs 4-5]; [sec 2.2] “Formally, we define the DyBM-T as the Boltzmann machine having T layers from −T + 1 to 0, where T is a positive integer or infinity. Let x ≡ (x[t])−T<t<=0, where x[t] is the values of the units in the t-th layer, which we consider as the values at time t.”; Fig 4 and fig 5 read on “each layer sequentially forwarding values of a time frame of the input”. Note that Boulanger-Lewandowski teaches “input”. In addition, each circle of fig 4 reads on “nodes”.), the plurality of layers of nodes including 

a first layer of a plurality of input nodes among the plurality of nodes, the input nodes receiving values of a current time frame of the input ([figs 4-5]; [sec 2.2] “Formally, we define the DyBM-T as the Boltzmann machine having T layers from −T + 1 to 0, where T is a positive integer or infinity. Let x ≡ (x[t])−T<t<=0, where x[t] is the values of the units in the t-th layer, which we consider as the values at time t.”; The rightmost layer of Fig 4 reads on “first layer”, and each node of the rightmost layer of fig 4 reads on “input nodes”. In addition, fig 4 and fig 5 read on “the input nodes receiving values of a current time frame of the input”. Note that Boulanger-Lewandowski teaches “input”.), and 

a plurality of intermediate layers, each node in each intermediate layer forwarding a value to a node in a subsequent or shared layer ([figs 4-5]; [sec 2.2] “Formally, we define the DyBM-T as the Boltzmann machine having T layers from −T + 1 to 0, where T is a positive integer or infinity. Let x ≡ (x[t])−T<t<=0, where x[t] is the values of the units in the t-th layer, which we consider as the values at time t.”; The layers other than the rightmost layer of Fig 4 read on “intermediate layers”. In addition, fig 4 and fig 5 read on “each node in each intermediate layer forwarding a value to a node in a subsequent or shared layer”.), and

a plurality of weight values among the plurality of first parameters, each weight value to be applied to each value in the corresponding node to obtain a value propagating from a pre-synaptic node to a post-synaptic node ([figs 4-5] “Spikes traveling from a pre-synaptic neuron (i) to a post-synaptic neuron (j) and eligibility traces.”; [sec 2.2] “Formally, we define the DyBM-T as the Boltzmann machine having T layers from −T + 1 to 0, where T is a positive integer or infinity. Let x ≡ (x[t])−T<t<=0, where x[t] is the values of the units in the t-th layer, which we consider as the values at time t. … For δ ≥ 1, let W[δ] be the matrix whose (i, j) element, Wi,j[δ], denotes the weight between the i-th unit at time −δ and the j-th unit at time 0 for any t.”; “Wij[δ]” of fig 4 reads on “weight values”. Note that Boulanger-Lewandowski teaches “first parameters”. In addition, “Wi,j[δ], denotes the weight between the i-th unit at time −δ and the j-th unit at time 0 for any t” reads on “each weight value to be applied to each value in the corresponding node”.).

Boulanger-Lewandowski and Osogami are both in the same field of endeavor of processing input signals with the Boltzmann machine and are analogous. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the Boltzmann machine system of Boulanger-Lewandowski with the multiple layers of Osogami. Doing so would lead to significantly simplifying the learning rule for the DyBM (Dynamic Boltzmann machine) and exhibiting various characteristics of STDP that have been observed in biological neural networks when the DyBM has an infinite number of layers and particularly structured parameters (Osogami, sec 1).
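For illustration only, the layered, time-indexed structure quoted from Osogami (layers holding the values at times −1, −2, …, with a weight Wi,j[δ] between unit i at time −δ and unit j at time 0) can be sketched as follows. This is a minimal sketch and is not code from Osogami. It shows only the indexing of past frames and delay-specific weights, not the DyBM energy function or its learning rule. The function names, the FIFO representation, and the assumption of a non-empty weight list are hypothetical.

```python
from collections import deque

def push_frame(history, frame):
    # Newest frame goes to the front; a deque with maxlen drops the oldest frame.
    history.appendleft(list(frame))

def delayed_input(history, W):
    """Sum the weighted contributions of past frames to the units at time 0.

    history[d-1] holds x[-d], the values of the layer at time -d.
    W[d-1][i][j] is the weight from unit i at time -d to unit j at time 0.
    Assumes W is non-empty and rectangular.
    """
    n_out = len(W[0][0])
    total = [0.0] * n_out
    for frame, W_d in zip(history, W):
        for i, x_i in enumerate(frame):
            for j in range(n_out):
                total[j] += W_d[i][j] * x_i
    return total

# Usage: two past layers (delays 1 and 2), two units per layer.
history = deque(maxlen=2)
push_frame(history, [1.0, 0.0])  # becomes x[-2] after the next push
push_frame(history, [0.0, 1.0])  # x[-1]
W = [[[1.0, 2.0], [3.0, 4.0]],      # delay 1
     [[10.0, 20.0], [30.0, 40.0]]]  # delay 2
contribution = delayed_input(history, W)  # [13.0, 24.0]
```

Each push shifts every stored frame one layer further into the past, which is one way to read "each layer sequentially forwarding values of a time frame of the input" as mapped above.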

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SEHWAN KIM whose telephone number is (571)270-7409.  The examiner can normally be reached on Mon - Thu 7:00 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael J Huntley, can be reached at (303) 297-4307.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/S.K./Examiner, Art Unit 2129       


/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129