DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The applicant’s submission of the Information Disclosure Statement(s) (IDS), received 08/04/2017, in compliance with 37 CFR 1.97 and 37 CFR 1.98 is acknowledged by the examiner. The examiner has considered the cited references in examination of the application and attached signed and dated copies to the Office action.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA 35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art. The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is invoked.
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function.
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function.
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. Such claim limitation(s) is/are:
“obtaining section configured to obtain an action and observation sequence including a plurality of time frames, each time frame including action values and observation values” in claim 19;
“input section configured to: input at least some of the observation values of each time frame of the action and observation sequence sequentially into a first neural network including a plurality of first parameters; and input the action values of each time frame of the action and observation sequence and output values from the first neural network corresponding to the at least some of the observation values of each time frame of the action and observation sequence sequentially into a second neural network including a plurality of second parameters” in claim 19; 
“approximating section configured to approximate an action-value function using the second neural network” in claim 19; and
“updating section configured to update the plurality of first parameters using backpropagation” in claim 19.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, applicant may: (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph.


Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
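A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
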
Claim(s) 1-7 and 9-20 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Claessens et al., U.S. Patent Application Publication 2019/0019080 (hereinafter Claessens).

Regarding claims 1, 17, and 19, taking claim 1 as exemplary, Claessens teaches a computer program product comprising a computer readable storage medium having program instructions embodied therewith, wherein the computer readable storage medium is not a transitory signal per se, the program instructions executable by a processor to cause the processor to perform a method comprising: 
obtaining an action and observation sequence including a plurality of time frames, each time frame including action values and observation values [The neural network architecture 20 receives grids 12 (i.e. observations) and action u. Claessens at paragraph 183; See also paragraph 103; FIGS. 1 & 4]; 
inputting at least some of the observation values of each time frame of the action and observation sequence sequentially into a first neural network including a plurality of first parameters [The observations are input into convolutional neural network 14 (i.e. first neural network), which has learnable weights (i.e. parameters). Claessens at paragraphs 87, 183; FIGS. 1 & 4]; 
inputting the action values of each time frame of the action and observation sequence and output values from the first neural network corresponding to the at least some of the observation values of each time frame of the action and observation sequence sequentially into a second neural network including a plurality of second parameters [The actions u (after being mapped into an intermediate representation) and the output from neural network 14 are input to neural network 15 (i.e. second neural network). Claessens at paragraph 183; FIGS. 1 & 4]; 
approximating an action-value function using the second neural network [The neural network architecture 20, using neural network 15 (i.e. second neural network), approximates Q*, an action value function. Claessens at paragraphs 86, 183; FIGS. 1 & 4]; and 
updating the plurality of first parameters using backpropagation [Neural network architecture 20, which includes neural network 14 (i.e. the first neural network), is trained (i.e. parameters are updated) using the rmsprop algorithm (i.e. backpropagation). Claessens at paragraph 187].
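
The data flow recited in claim 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the applicant's or Claessens' implementation: the layer shapes, the tanh activation, the squared-error objective, the fixed target, and all names (first_net, second_net, update_first_params) are hypothetical.

```python
import math

def first_net(obs, W1):
    # Hypothetical first network: one tanh layer over the observation values.
    return [math.tanh(sum(w * o for w, o in zip(row, obs))) for row in W1]

def second_net(features, action, w2):
    # Hypothetical second network: linear action-value estimate over
    # the first network's outputs concatenated with the action values.
    x = features + action
    return sum(w * xi for w, xi in zip(w2, x))

def update_first_params(obs, action, target, W1, w2, lr=0.1):
    # One gradient step on 0.5 * (Q - target)^2 with respect to W1 only.
    h = first_net(obs, W1)
    q = second_net(h, action, w2)
    err = q - target
    new_W1 = []
    for j, row in enumerate(W1):
        # Chain rule: dQ/dh_j = w2[j]; dh_j/dW1[j][i] = (1 - h_j^2) * obs[i].
        back = err * w2[j] * (1.0 - h[j] ** 2)
        new_W1.append([w - lr * back * o for w, o in zip(row, obs)])
    return new_W1, q

# Each time frame holds observation values and action values (made-up data).
sequence = [([0.5, -0.2], [1.0]), ([0.1, 0.4], [0.0]), ([-0.3, 0.8], [1.0])]
W1 = [[0.1, 0.2], [-0.1, 0.3]]   # first parameters
w2 = [0.5, -0.4, 0.2]            # second parameters (2 features + 1 action)
for obs, action in sequence:
    # Frames are fed sequentially; the error reaches W1 through w2.
    W1, q = update_first_params(obs, action, target=1.0, W1=W1, w2=w2)
```

The sketch updates only the first parameters so that the backpropagation path from the second network's output to the first network is visible in isolation.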

Regarding claims 2, 18, and 20, taking claim 2 as exemplary, Claessens teaches the computer program product according to claim 1, wherein the updating of the plurality of first parameters uses an error based on the approximated action-value function and a reward [Gradient descent is used for updating parameters (Claessens at paragraphs 140-141), which uses the error based on the output (i.e. approximated action value function and reward)].

Regarding claim 3, Claessens teaches the computer program product of claim 2, wherein the updating of the plurality of first parameters is based on backpropagation of a gradient of the plurality of first parameters with respect to an error generated by the second neural network [Gradient descent is used for updating parameters, which uses backpropagation of a gradient of parameters with respect to errors generated by neural network 15. See Claessens at paragraphs 140-141].

Regarding claim 4, Claessens teaches the computer program product of claim 2, wherein approximating the action-value function further comprises: determining a current action-value from an evaluation of the action-value function in consideration of an actual reward [The observation/state includes the actual reward. Claessens at paragraphs 169-170]; and caching a previous action-value determined for a previous time frame from the action-value function [All observations are used for subsequent time steps. See Claessens at paragraphs 103, 180; FIGS. 1 & 4].

Regarding claim 5, Claessens teaches the computer program product of claim 4, wherein the action-value function is determined with respect to nodes of the second neural network associated with actions of the action and observation sequence [Neural network 15 (i.e. second neural network) determines the approximated Q*, an action value function. Claessens at paragraphs 86, 183; FIGS. 1 & 4].

Regarding claim 6, Claessens teaches the computer program product of claim 5, wherein the error is a temporal difference error [It is temporal difference learning and, therefore, the error is a temporal difference error. Claessens at paragraph 136]; and wherein approximating the action-value function further comprises calculating the temporal difference error based on the previous action-value, the current action-value, and the plurality of second parameters [Temporal difference learning is used and, therefore, the error is a temporal difference error that is calculated based on the previous action-value, current action-value, and the neural network parameters (e.g. weights). See Claessens at paragraphs 135-139].

Regarding claim 7, Claessens teaches the computer program product of claim 6, wherein approximating the action-value function further comprises updating the plurality of second parameters based on the temporal difference error and a learning rate [Temporal difference learning is used and, therefore, the learning/training of the parameters is based on the temporal difference error and a learning rate. See Claessens at paragraphs 135-139].
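
The temporal difference error and learning-rate update addressed in claims 6 and 7 can be illustrated with the following sketch. It assumes a linear action-value function and a SARSA-style error with a discount factor gamma; the discount factor, the feature vectors, and the names q_value and td_update are assumptions for illustration and are not drawn from the claims or from Claessens.

```python
def q_value(theta, x):
    # Linear action-value estimate: dot product of parameters and features.
    return sum(t * xi for t, xi in zip(theta, x))

def td_update(theta, x_prev, x_curr, reward, lr=0.1, gamma=0.9):
    # Previous action-value (what a cache would hold) and current action-value.
    q_prev = q_value(theta, x_prev)
    q_curr = q_value(theta, x_curr)
    # Temporal difference error from the reward and the two action-values.
    td_error = reward + gamma * q_curr - q_prev
    # Parameter step scaled by the learning rate; for a linear function the
    # gradient of q_prev with respect to theta is x_prev.
    new_theta = [t + lr * td_error * xi for t, xi in zip(theta, x_prev)]
    return new_theta, td_error

theta = [0.0, 0.0]  # hypothetical second parameters
theta, delta = td_update(theta, x_prev=[1.0, 0.0], x_curr=[0.0, 1.0], reward=1.0)
```

With all parameters at zero, both action-values are zero, so the error equals the reward and only the parameter tied to the previous frame's active feature moves.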

Regarding claim 9, Claessens teaches the computer program product of claim 1, wherein the action-value function is an energy function of the second neural network [The state-action value function Q* is an energy function. See Claessens at paragraph 178].

Regarding claim 10, Claessens teaches the computer program product of claim 1, wherein the action-value function is a linear function [The state-action value function Q* is computed by linear output layer 19 and, therefore, Q* is a linear function. Claessens at paragraph 183].

Regarding claim 11, Claessens teaches the computer program product according to claim 1, wherein inputting to the second neural network further comprises inputting remaining observation values into the second neural network [All observations (i.e. including remaining ones) are input into neural network 15 (i.e. second neural network) through convolutional neural network 14 (i.e. first neural network). See Claessens at paragraph 183; FIGS. 1 & 4].

Regarding claim 12, Claessens teaches the computer program product of claim 1, wherein the second neural network comprises: 
[Neural network 15 comprises multiple layers of nodes forwarding inputs to a subsequent layer. See Claessens at paragraph 183; FIGS. 1 & 4], the plurality of layers of nodes comprising: 
an input layer including the plurality of input nodes among the plurality of nodes, the input nodes receiving input values representing an action and an observation of a current time frame of the action and observation sequence [Neural network 15 comprises an input layer receiving action and observations for a time step. See Claessens at paragraphs 183 and 187; FIGS. 1 & 4]; and 
a plurality of intermediate layers, each node in each intermediate layer forwarding a value representing an action or an observation to a node in a subsequent or shared layer [Neural network 15 comprises two intermediate layers with nodes forwarding action or observation values to a subsequent layer. See Claessens at paragraphs 89, 183, and 187; FIGS. 1 & 4]; and 
a plurality of weight values among the plurality of second parameters of the second neural network, each weight value to be applied to each value in a corresponding node to obtain a value propagating from a pre-synaptic node to a post-synaptic node [The neural networks, including neural network 15, apply weights (i.e. the second parameters) to input values from one node to another (i.e. pre-synaptic node to post-synaptic node). See Claessens at paragraph 213].

Regarding claim 13, Claessens teaches the computer program product of claim 1, wherein obtaining an action and observation sequence further comprises: 
selecting an action, using the second neural network, with which to proceed from a current time frame of the action and observation sequence to a subsequent time frame of the action and observation sequence [An action is obtained/selected based on the computed action value function Q*, which is computed by neural network 15 (i.e. second neural network). Claessens at paragraphs 179, 183]; causing the selected action to be performed [The obtained/selected action is performed. Claessens at paragraph 184]; and obtaining an observation of the subsequent time frame of the action and observation [All observations are used for subsequent time steps. See Claessens at paragraphs 103, 180; FIGS. 1 & 4].

Regarding claim 14, Claessens teaches the computer program product of claim 13, wherein the observation obtained includes an actual reward [The observation/state includes the actual reward. Claessens at paragraphs 169-170].

Regarding claim 15, Claessens teaches the computer program product of claim 13, wherein selecting an action includes evaluating each reward probability of a plurality of possible actions according to a probability function based on the action-value function [Selected/obtained action is based on a probability function (Claessens at paragraph 181) based on the action value function. Claessens at paragraphs 179-180]; and wherein the selected action among the plurality of possible actions yields a largest reward probability from the probability function [The selected/obtained action yields the largest reward probability of the function. See Claessens at paragraph 181].
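
The action selection addressed in claim 15 can be sketched as below. The sketch assumes a Boltzmann (softmax) probability function over the action-values with a temperature parameter; neither the claim language nor the cited paragraphs, as quoted in this action, specify that form, and the names action_probabilities and select_action are hypothetical.

```python
import math

def action_probabilities(q_values, temperature=1.0):
    # Boltzmann distribution over candidate actions (assumed form).
    # Subtracting the maximum keeps the exponentials numerically stable.
    m = max(q_values)
    exps = [math.exp((q - m) / temperature) for q in q_values]
    total = sum(exps)
    return [e / total for e in exps]

def select_action(q_values, temperature=1.0):
    # Pick the index of the action with the largest probability.
    probs = action_probabilities(q_values, temperature)
    return max(range(len(probs)), key=probs.__getitem__)

# Made-up action-values for three possible actions.
chosen = select_action([0.2, 1.5, -0.3])
```

Because the softmax is monotonic in the action-values, the action with the largest probability is also the action with the largest action-value.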

Regarding claim 16, Claessens teaches the computer program product according to claim 1, wherein the first neural network comprises: 
a plurality of layers of nodes among a plurality of nodes, each layer forwarding input values to a subsequent layer among the plurality of layers [Neural network 14 is a convolutional neural network with multiple layers, each having nodes. Claessens at paragraphs 87-88, 183, 187; FIGS. 1 & 4], the plurality of layers of nodes comprising: 
an input layer including the plurality of input nodes among the plurality of nodes, the input nodes receiving input values representing an observation of a current time frame of the action and observation sequence [Neural network 14 has an input layer that receives the observations of a time step. Claessens at paragraphs 103, 187; FIGS. 1 & 4]; 
at least one intermediate layer, each node in the at least one intermediate layer forwarding a value representing an observation to a node in a subsequent layer; and an output layer, each node in the output layer converting a value representing an observation to a value between 0 and 1 [Neural network 14 is a convolutional neural network, which has multiple internal/intermediate layers and an output layer that outputs class scores/feature map (i.e. value between 0 and 1). Claessens at paragraphs 88, 90-91, and 187; FIGS. 1 & 4]; and 
a plurality of weight values among the plurality of first parameters of the first neural network, each weight value to be applied to each value in a corresponding node to obtain a value propagating from a pre-synaptic node to a post-synaptic node [The neural networks, including neural network 14, apply weights (i.e. the first parameters) to input values from one node to another (i.e. pre-synaptic node to post-synaptic node). See Claessens at paragraph 213].

Allowable Subject Matter
Claim 8 is objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter: the prior art of record does not teach or suggest “wherein updating the plurality of second parameters further comprises updating a plurality of eligibility traces and a plurality of first-in-first-out (FIFO) queues” as recited in claim 8.
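
For orientation only, one way an eligibility trace and a FIFO queue might be maintained together is sketched below. The claim as quoted does not specify how the two interact, so the arrangement shown (a fixed-length queue that delays values, feeding an exponentially decaying trace) is an assumption, as are the class name TracedConnection and its parameters delay and decay.

```python
from collections import deque

class TracedConnection:
    """Hypothetical pairing of a FIFO queue with an eligibility trace."""

    def __init__(self, delay, decay):
        # FIFO queue holding the most recent `delay` values, oldest first.
        self.queue = deque([0.0] * delay)
        self.decay = decay
        self.trace = 0.0

    def step(self, value):
        # Pop the oldest value, push the newest (first in, first out).
        oldest = self.queue.popleft()
        self.queue.append(value)
        # The trace decays each step and accumulates the value leaving the queue.
        self.trace = self.decay * self.trace + oldest
        return self.trace

conn = TracedConnection(delay=2, decay=0.5)
traces = [conn.step(v) for v in [1.0, 0.0, 0.0, 0.0]]
```

In this arrangement a value entering the queue affects the trace only after `delay` steps, and its contribution then shrinks by the factor `decay` on every later step.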

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Hausknecht and Stone, “Deep Recurrent Q-Learning for Partially Observable MDPs”, teaches an architecture for partially observable MDPs that combines a Long Short-Term Memory with a Deep Q-Network.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BENJAMIN P GEIB whose telephone number is (571)272-8628.  The examiner can normally be reached on Monday - Friday 8:30 AM - 5:00 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/BENJAMIN P GEIB/Primary Examiner, Art Unit 2123