DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION. —The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

	Claims 1, 11, and 17 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
	Claims 1, 11, and 17 recite the limitation "the highest occupation measures". There is insufficient antecedent basis for this limitation in the claims.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.




Step 1: This part of the eligibility analysis evaluates whether the claim falls within any statutory category. MPEP 2106.03:
Claims 1 and 6 recite a method/process, claims 8 and 13 recite a system, and claims 15 and 20 recite a computer program product; therefore, each falls into one of the statutory categories of invention.

Analysis of Claim 1
Step 2A Prong One: This part of the eligibility analysis evaluates whether the claim recites a judicial exception. As explained in MPEP 2106.04(II) and the 2019 PEG, a claim “recites” a judicial exception when the judicial exception is “set forth” or “described” in the claim. 
The claim recites a judicial exception (i.e., an abstract idea) without significantly more. For example, the claim limitations, under their broadest reasonable interpretation, cover activities classified as an abstract idea. Abstract ideas classified under mathematical concepts include mathematical relationships, mathematical formulas or equations, and mathematical calculations, see MPEP 2106.04(a)(2), as highlighted in the claim analysis below.
The claim recites, inter alia: 
automatically identifying features that drive a reinforcement learning model to recommend an action of interest (mental process)

Step 2A Prong Two: This part of the eligibility analysis evaluates whether the claim recites additional elements that integrate the judicial exception into a practical application. 
Based on the determination in Step 2A Prong One of the analysis that the claims are directed to a judicial exception, it must be determined whether the claim contains any element or combination of elements sufficient to ensure that the claim amounts to significantly more than the judicial exception.
The claim recites, inter alia: 
operating at least one hardware processor (mere instructions to apply the exception using a generic computer component), see MPEP 2106.05(f)
said identifying is based on occupation measures of state-action pairs associated with the reinforcement learning model (mere instructions), see MPEP 2106.05(f)
In this case, after considering those additional elements individually and in combination, it is determined that those additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
Step 2B: This part of the eligibility analysis evaluates whether the claim as a whole amounts to significantly more than the recited exception, i.e., whether any additional 
As discussed above with respect to integration of the abstract idea into practical application, the additional elements below do not add significantly more to the exception when considered separately and in combination.
The claim recites, inter alia: 
operating at least one hardware processor (mere instructions to apply the exception using a generic computer component), see MPEP 2106.05(f)
said identifying is based on occupation measures of state-action pairs associated with the reinforcement learning model (mere instructions), see MPEP 2106.05(f)
Those additional elements do not amount to an inventive concept to the claim because they are mere instructions or insignificant extra-solution activity. 
In summary, the claim recites an abstract idea without integrating it into a practical application, and does not provide additional elements that would amount to significantly more. As such, taken as a whole, the claim is ineligible under 35 U.S.C. 101.
	
	Similarly, claims 8 and 15 are rejected under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without adding significantly more than the judicial exception.

Analysis of Claim 6

	
	Similarly, claims 13 and 20 are rejected under 35 U.S.C. 101, mutatis mutandis, as reciting an abstract idea without adding significantly more than the judicial exception.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-3, 6-10, 13-17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Claessens et al. (US 20190019080 A1, hereinafter Claessens) in view of Tsitkin et al. (US 9836697 B2, hereinafter Tsitkin).

Regarding claim 1,
	Claessens teaches: A method comprising: operating at least one hardware processor for automatically identifying features that drive a reinforcement learning model to recommend an action of interest ([0215] e.g., “The implementation of any of the methods of the present invention can be performed by logic circuits, electronic hardware, processors or circuitry” [0031-0035]: extracting/identifying features that drive Reinforcement Learning (RL) to recommend an action. 
 [0031] e.g., “Reinforcement Learning (RL)” [0034] e.g., “inputting at least the extracted local convolutional features to a first neural network, the first neural network outputting at least an approximation of a state-action value function which provides values for the at least one cluster associated with each combination of the at least one cluster being in a state and taking an action” [0036] e.g., “Limiting the first fully connected neural network to be a second neural network that takes over a ready processed output of the convolutional neural network also reduces the computational intensity required and also shortens training times.” Examiner notes that [0034] the extracted features are automatically identified from [0034] [0036] a ready processed output of the convolution neural network.),
 	wherein said identifying is based on [occupation] measures of state-action pairs associated with the reinforcement learning model ([0031-0035]: the extracting/ identifying features is based on an approximation of a state-action value function associated with Reinforcement Learning (RL).  [0031] e.g., “Reinforcement Learning (RL)” [0034] e.g., “inputting at least the extracted local convolutional features to a first neural network, the first neural network outputting at least an approximation of a state-action value function which provides values for the at least one cluster associated with each combination of the at least one cluster being in a state and taking an action”).
	Claessens does not explicitly teach: occupation measures of state-action pairs
	However, Tsitkin teaches: occupation measures of state-action pairs ([Col. 7 ln. 18-20] e.g., “An occupation measure of each state and action pair is upper bounded by one according to the definition”).
 In view of the teachings of Tsitkin it would have been obvious for a person of ordinary skill in the art to apply the teachings of Tsitkin to Claessens before the effective filing date of the claimed invention in order to find an operating policy which will minimize the cost (cf. Tsitkin [Col. 7 Ln. 67 - Col. 8 ln. 3] e.g., “The cost as a result of operating such a plant may include the costs of electricity, sludge, gas, chemicals and the like. The problem may be to find an operating policy which will minimize the cost.”).
 
Regarding claim 2,
	Claessens in view of Tsitkin teaches: The method according to claim 1.
	Claessens teaches: further comprising operating the at least one hardware processor ([0215] e.g., “processors”) for: fitting the reinforcement learning model, to generate a policy ([0031] e.g., “Reinforcement Learning (RL)” [0163] e.g., “constructing a policy h”); 
	wherein said identifying comprises identifying the features from the states of the selected state-action pairs ([0061] e.g., “Preferably, before merging exogenous state information and the control action with the extracted convolutional local features of the convolutional neural network, a separate feature extraction is performed, wherein the exogenous state information and the control action is first fed into the second neural network, this second neural network mapping the exogenous state information and the control action into a learnt internal representation that is combined with the extracted convolutional local features in a next hidden layer.” [0065] e.g., “Preferably the features that are learnt represent changes in state values that occur over multiple time steps.”).
	Claessens does not explicitly teach: based on the policy, calculating probabilities of the state-action pairs; 
	based on the probabilities, calculating the occupation measures for the state-action pairs; 
	receiving a selection of the action of interest; and 
	selecting those of the state-action pairs which: comprise the action of interest, and have occupation measures that comply with a predefined threshold; 
	However, Tsitkin teaches: based on the policy, calculating probabilities of the state-action pairs ([Col. 1 ln. 29-31 and 40-43] e.g., “There is provided, in accordance with an embodiment, a method for determining a variable near-optimal policy… (c) a transition probabilities matrix determining transition probabilities between states of the finite set of states, once actions of the set of actions are performed;”); 
	based on the probabilities, calculating the occupation measures for the state-action pairs ([Col. 7 ln. 18-20] e.g., “An occupation measure of each state and action pair is upper bounded by one according to the definition”); 
	receiving a selection of the action of interest; and selecting those of the state-action pairs which: comprise the action of interest, and have occupation measures that comply with a predefined threshold ([Col. 3 ln. 43-46] e.g., “updating the near-optimal policy during its execution, such as limitation of number of changes of the value of certain action entries during the entire process (i.e., limited action entries).” [Col. 7 ln. 16-20] e.g., “This is from the reason that maximum value that may be obtained for any time is max_{s,u} c(s, u). An occupation measure of each state and action pair is upper bounded by one according to the definition” [Col. 4 ln. 4-8] e.g., “This, in addition to satisfaction of the requirement of specific (i.e., predefined) number of changes of the value of one or more action entries during the process.” Examiner notes that a selection of the action of interest is received by limiting the number of changes of the value of certain action entries, which is based on the calculation of the state-action pairs associated with the received near-optimal policy. Examiner further notes that the selected state-action pairs are those whose occupation measure satisfies the requirement of a specific (i.e., predefined) number.); 
The motivation to combine Claessens with Tsitkin is the same rationale as set forth above with respect to claim 1.
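
For illustration only, the claim 2 limitations addressed above (probabilities of state-action pairs under a policy, occupation measures derived from those probabilities, and selection of pairs containing an action of interest whose occupation measure meets a threshold) can be pictured with the following minimal sketch. It assumes a small tabular Markov decision process and the normalized discounted definition of the occupation measure, which is one common convention and is consistent with the "upper bounded by one" language quoted from Tsitkin. The function names, the discount factor, the finite horizon, and the toy numbers are hypothetical and are not taken from the application, Claessens, or Tsitkin.

```python
def occupation_measures(P, policy, mu0, gamma=0.9, horizon=200):
    """Approximate discounted occupation measures rho[s][a].

    P[s][a][s2]  : transition probability from state s to s2 under action a
    policy[s][a] : probability that the policy picks action a in state s
    mu0[s]       : initial state distribution
    Assumed definition: rho(s, a) = (1 - gamma) * sum_t gamma^t * Pr(s_t = s, a_t = a),
    truncated at `horizon` steps.
    """
    n_states, n_actions = len(P), len(P[0])
    rho = [[0.0] * n_actions for _ in range(n_states)]
    dist = list(mu0)          # state distribution at the current time step
    weight = 1.0 - gamma      # (1 - gamma) * gamma^t
    for _ in range(horizon):
        nxt = [0.0] * n_states
        for s in range(n_states):
            for a in range(n_actions):
                p_sa = dist[s] * policy[s][a]   # probability of the pair (s, a) at this step
                rho[s][a] += weight * p_sa
                for s2 in range(n_states):
                    nxt[s2] += p_sa * P[s][a][s2]
        dist = nxt
        weight *= gamma
    return rho


def select_pairs(rho, action, threshold):
    """Keep pairs that contain the action of interest and meet the threshold."""
    return [(s, action) for s in range(len(rho)) if rho[s][action] >= threshold]


# Toy example: action 0 stays in place, action 1 switches state.
P = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [1.0, 0.0]],
]
policy = [[0.0, 1.0], [1.0, 0.0]]   # state 0 always switches, state 1 always stays
rho = occupation_measures(P, policy, mu0=[1.0, 0.0], gamma=0.5)
selected = select_pairs(rho, action=0, threshold=0.25)
```

In this toy case the pair (state 1, action 0) is the only pair with action 0 that carries appreciable occupation measure, so it is the only one selected.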

Regarding claim 3,

	Claessens does not explicitly teach: wherein the predefined threshold is a predefined number of state-action pairs which have the highest occupation measures.
	However, Tsitkin teaches: wherein the predefined threshold is a predefined number of state-action pairs which have the highest occupation measures ([Col. 3 ln. 43-46] e.g., “updating the near-optimal policy during its execution, such as limitation of number of changes of the value of certain action entries during the entire process (i.e., limited action entries).” [Col. 7 ln. 16-20] e.g., “This is from the reason that maximum value that may be obtained for any time is max_{s,u} c(s, u). An occupation measure of each state and action pair is upper bounded by one according to the definition” [Col. 4 ln. 4-8] e.g., “This, in addition to satisfaction of the requirement of specific (i.e., predefined) number of changes of the value of one or more action entries during the process.” Examiner notes that the predefined threshold is the limitation of number of changes of the value of certain action entries of state-action pairs which have the maximum value of the occupation measures.).
The motivation to combine Claessens with Tsitkin is the same rationale as set forth above with respect to claim 1.
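
As a purely illustrative aid, the claim 3 limitation (a threshold expressed as a predefined number of state-action pairs having the highest occupation measures) amounts to a top-k selection. The sketch below assumes occupation measures are already available as a table indexed by state and action; the function name `top_k_pairs` and the sample values are hypothetical and do not come from the application or the cited references.

```python
import heapq


def top_k_pairs(rho, action, k):
    """Return the k state-action pairs with the given action and the largest rho."""
    candidates = [((s, action), rho[s][action]) for s in range(len(rho))]
    best = heapq.nlargest(k, candidates, key=lambda item: item[1])
    return [pair for pair, _ in best]


# Hypothetical occupation measures for 3 states and 2 actions.
rho_table = [
    [0.10, 0.00],
    [0.40, 0.05],
    [0.25, 0.20],
]
top_two = top_k_pairs(rho_table, action=0, k=2)
```

Here the threshold is the count k rather than a numeric cutoff, so the number of selected pairs is fixed in advance regardless of how the occupation measures are distributed.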

Regarding claim 6,
	Claessens in view of Tsitkin teaches: The method according to claim 1.
	Claessens further teaches: wherein the reinforcement learning model is a deep reinforcement learning model (Fig. 1 and [0076] e.g., “Embodiments of the present invention use deep approximation architectures” [0077] e.g., “Embodiments of the present invention provide a model-free control technique mainly in the form of Reinforcement Learning (RL)” [Claessens Fig. 1 reproduced; image not included]).

Regarding claim 7,
	Claessens in view of Tsitkin teaches: The method according to claim 1.
	Claessens teaches: further comprising operating the at least one hardware processor for: issuing an indication of the identified features ([0040] e.g., “The control action and exogenous state information is preferably input to a second neural network which is connected as an input to the first neural network. The method also can include merging exogenous state information and the control action with the extracted convolutional local features of the convolutional neural network.”), 
	based on the indication, performing at least one of: (a) an action to affect a physical system in which the reinforcement learning model operates, and (b) an ([0040] e.g., “The method also can include merging exogenous state information and the control action with the extracted convolutional local features of the convolutional neural network. This is advantageous because not only does the exogenous information include relevant values such as an outside temperature and time of day, but it also includes the control action. This allows the controller to learn an approximation of a Q function as an output.” [0031] e.g., “determining the amount of the physical product to be distributed to the constrained cluster elements during a next control step using a control technique in the form of Reinforcement Learning (RL)”).

Regarding claim 8, 
	Claessens teaches: A system comprising: (a) at least one hardware processor; and (b) a non-transitory computer-readable storage medium having program code embodied therewith, the program code executable by said at least one hardware processor ([0075] e.g., “The present invention also provides a computer program product comprising code which when executed on a processing engine is adapted to carry out any of the methods of the invention. A non-transitory machine readable signal storage means can store the computer program product.” [0250] e.g., “Any of the above software may be implemented as a computer program product which has been compiled for a processing engine in any of the servers or nodes of the network. The computer program product may be stored on a non-transitory signal storage medium”) to: claim 1, and is similarly analyzed. 

Regarding claim 9, 
	 the claim recites the system of claim 2, and is similarly analyzed.

Regarding claim 10, 
	the claim recites the system of claim 3, and is similarly analyzed.

Regarding claim 13, 
	the claim recites the system of claim 6, and is similarly analyzed.

Regarding claim 14, 
	the claim recites the system of claim 7, and is similarly analyzed.

Regarding claim 15, 
	Claessens teaches: A computer program product comprising a non-transitory computer-readable storage medium having program code embodied therewith, the program code executable by at least one hardware processor ([0250] e.g., “Any of the above software may be implemented as a computer program product which has been compiled for a processing engine in any of the servers or nodes of the network. The computer program product may be stored on a non-transitory signal storage medium”) to: perform the method of claim 1; the remainder of the claim is similarly analyzed.

Regarding claim 16, 
	the claim recites the computer program product of claim 2, and is similarly analyzed.

Regarding claim 17, 
	the claim recites the computer program product of claim 3, and is similarly analyzed.

Regarding claim 20, 
	the claim recites the computer program product of claim 6, and is similarly analyzed.

Claims 4-5, 11-12, and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Claessens in view of Tsitkin, further in view of Xu et al. (US 20180365975 A1, hereinafter Xu).

Regarding claim 4,
	Claessens in view of Tsitkin teaches: The method according to claim 2.
	Claessens further teaches: wherein: each of the states of the state-action pairs comprises a feature vector; the identified features are from the feature vectors of the states of the selected state-action pairs ([0078] e.g., “Embodiments of the present invention remedy this by adding a full information vector in the input of the state comprising not just the state value for a current time “t” as measured directly but also the previously measured states, at times “t−2T”, “t−T”, . . . t” etc.” Examiner notes that the full information vector contains the “feature vectors” of the identified features from the states selected.).
	Claessens in view of Tsitkin does not explicitly teach: the method further comprises operating the at least one hardware processor for reducing dimensionality of the feature vectors of the states of the selected state-action pairs according to a desired dimensionality level, such that the identified features are the most substantial features out of the feature vectors of the states of the selected state-action pairs.
	However, Xu teaches: the method further comprises operating the at least one hardware processor for reducing dimensionality of the feature vectors of the states of the selected state-action pairs according to a desired dimensionality level, such that the identified features are the most substantial features out of the feature vectors of the states of the selected state-action pairs ([0187] e.g., “For each sample channel state information collected from an unknown event, it first goes through the same preprocessing adopted in the training phase to reduce or eliminate the phase offset and the initial phase distortion and then a feature vector is generated by concatenating the real and the imaginary parts of the channel state information. Afterwards, the learned principal component analysis orthogonal transformation is applied to the feature vector to reduce the number of dimensions. The new feature vector is classified and labeled using the trained linear support vector machine.” Examiner notes that “state-action pairs” is taught by Claessens [0034]. Examiner further notes that Xu teaches reducing dimensionality of the feature vectors of the preprocessed channel states of the sample channel state information collected such that the new identified features are the principal component/the most substantial features.).
	In view of the teachings of Xu it would have been obvious for a person of ordinary skill in the art to apply the teachings of Xu to Claessens before the effective filing date of the claimed invention in order to reduce the dimension of features (cf. Xu [0184] e.g., “the first 200 largest components after transformation are selected as the new feature, and thus the dimension of features is been reduced.”).

Regarding claim 5,
	Claessens in view of Tsitkin and Xu teaches: The method according to claim 4.
	Claessens in view of Tsitkin does not explicitly teach: wherein said reduction of dimensionality comprises performing principal component analysis (PCA) to identify a number of principal components which corresponds to the desired dimensionality level.
	However, Xu teaches: wherein said reduction of dimensionality comprises performing principal component analysis (PCA) to identify a number of principal components which corresponds to the desired dimensionality level ([0184] e.g., “The principal component analysis is learned using the newly generated feature vectors from the previous step for all the events. In some implementations, the first 200 largest components after transformation are selected as the new feature, and thus the dimension of features is been reduced.”).
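
By way of illustration only, the claim 5 limitation (performing PCA to keep a number of principal components matching a desired dimensionality level) can be sketched as follows. The sketch is dependency-free and uses power iteration with deflation on the sample covariance matrix, which is one textbook way to obtain the leading components; it is not asserted to be the procedure of Xu or of the application. The function name `pca_reduce`, the starting vector, the iteration count, and the sample data are hypothetical.

```python
import math


def pca_reduce(rows, k, iters=200):
    """Project feature vectors onto their k leading principal components."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered feature vectors.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    components = []
    for _ in range(k):
        # Assumed start vector; a start orthogonal to the target would need a retry.
        v = [1.0 / (j + 1) for j in range(d)]
        for _step in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm < 1e-12:      # no variance left to explain
                break
            v = [x / norm for x in w]
        eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
        components.append(v)
        # Deflation: remove the variance captured by this component.
        cov = [[cov[i][j] - eigval * v[i] * v[j] for j in range(d)] for i in range(d)]
    reduced = [[sum(r[j] * c[j] for j in range(d)) for c in components]
               for r in centered]
    return reduced, components


# Hypothetical 3-dimensional state feature vectors that vary along one direction.
features = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [3.0, 6.0, 0.0], [4.0, 8.0, 0.0]]
reduced, components = pca_reduce(features, k=1)
```

With k set to the desired dimensionality level, each three-dimensional feature vector is replaced by its coordinate along the single dominant direction, here proportional to (1, 2, 0).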


Regarding claim 11, 
	the claim recites the system of claim 4, and is similarly analyzed.

Regarding claim 12, 
	the claim recites the system of claim 5, and is similarly analyzed.

Regarding claim 18, 
	the claim recites the computer program product of claim 4, and is similarly analyzed.

Regarding claim 19, 
	the claim recites the computer program product of claim 5, and is similarly analyzed.

Prior Art
The prior art made of record and not relied upon, which is considered pertinent to applicant's disclosure, is listed below:
Tunyasuvunakool et al. (US 20190126472 A1): teaches reinforcement and imitation learning for a task based on rewards calculated from control data output by the control system.
Dorai et al. (US 20150019458 A1): teaches an optimal multi-stage asset management policy providing a plurality of state transition probabilities between states.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAEYONG J PARK whose telephone number is (571) 272-3898. The examiner can normally be reached on M-F 9:00 a.m. - 6:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Michael Huntley, can be reached at (303) 297-4307. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO 

/JAEYONG J PARK/Examiner, Art Unit 2129
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129