Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 06/15/2018 was filed before the mailing date of the first office action. The submission is in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statement is being considered by the examiner.
Specification
The disclosure is objected to because of the following informalities: in Figure 3, block 360, and in paragraph [0054] of the specification, the term “e” in {s, a, r, s’, e} is not defined. Appropriate correction is required.
Claim Objections
Claim 1 is objected to because of the following informalities: The phrase “another experience having another reward that less than or equal to the first threshold” should be corrected to “another experience having another reward that is less than or equal to the first threshold”. Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
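(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.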

The following is a quotation of the first paragraph of pre-AIA 35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 1-19 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claims contain subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA 35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention.
The 112(a) rejection is based on the specification not disclosing sufficient details on how the critical events and the corresponding threshold are defined. Claim 1 states that an experience whose reward exceeds a threshold is stored in the experience buffer, and that, for an experience whose reward is less than or equal to the threshold, the experience buffer is searched for a similar experience that has a reward exceeding the threshold and that similar experience is copied into the event buffer. Claim 9 states that the threshold is used to identify critical events, that any of the plurality of experiences unrelated to the critical events (i.e. wherein the value of the reward exceeds the threshold) are stored in the experience buffer, and that any of the plurality of experiences related to the critical events (i.e. wherein the value of the reward does not exceed the threshold) are stored in the event buffer.
Looking to the specification, the Examiner finds that paragraphs [0019] and [0045] likewise contradict each other. Paragraph [0019] states that “the term "critical events" refer to respective sets of steps that directly or indirectly result in a reward that is well below (e.g., by a threshold amount) an average award amount. In an embodiment, steps that result in the agent dying can be considered critical events”. However, paragraph [0045] states that “The condition of the agent dying can be considered a first threshold on the reward for the given experience, where if the agent has died, then the first threshold is considered to be exceeded”. A reward that is well below a threshold amount cannot be considered to have exceeded a threshold.
The guidelines published in the Federal Register, Examining Computer-Implemented Functional Claim Limitations for Compliance With 35 U.S.C. 112, state that: “Even if a claim is not construed as a means-plus-function limitation under 35 U.S.C. 112(f), computer-implemented functional claim language must still be evaluated for sufficient disclosure under the written description and enablement requirements of 35 U.S.C. 112(a). . . . a specification must describe the claimed invention in sufficient detail (e.g., by disclosure of an algorithm) to establish that the applicant had possession of the claimed invention as of the application filing date.” Here, it cannot be determined whether a reward that exceeds a threshold or a reward that falls below a threshold results in a critical event; therefore, the applicant has not established possession of the claimed invention.
Dependent claims 2-4, 6-8, and 10-11 are also rejected because they fail to correct the deficiencies of independent claim 1 on which they depend.
Dependent claims 13-15 and 17-19 are also rejected because they fail to correct the deficiencies of independent claim 12 on which they depend.
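For illustration only, the following minimal Python sketch shows one possible reading of the buffer logic recited in claim 1, the step cutoff recited in claim 3, and the two inconsistent definitions of a critical event found in paragraphs [0019] and [0045]. All function names, the exact-match test standing in for a “similar state,” and the numeric values are hypothetical assumptions made for this illustration; they are not taken from the specification and do not represent the applicant's implementation.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Experience:
    # Hypothetical container for the (state, action, reward) of claim 1.
    state: tuple
    action: int
    reward: float


def is_critical_0019(reward: float, average: float, margin: float) -> bool:
    # Reading of para. [0019]: reward well below (by `margin`) the average.
    return reward < average - margin


def is_critical_0045(reward: float, threshold: float) -> bool:
    # Reading of para. [0045]: the first threshold is exceeded.
    return reward > threshold


def process(exp, threshold, experience_buffer, event_buffer):
    """One reading of the storing, searching and copying steps of claim 1."""
    if exp.reward > threshold:
        # Reward exceeds the first threshold: store in the experience buffer.
        experience_buffer.append(exp)
        return
    # Reward <= first threshold: search the experience buffer for a candidate
    # with a "similar" state (assumed here to mean an identical state) and
    # copy the first match into the event buffer.
    for candidate in experience_buffer:
        if candidate.state == exp.state:
            event_buffer.append(candidate)
            break


def select_action(event_buffer, n_actions, p_event, step, max_event_steps, rng):
    """Exploration step; the step cutoff is one reading of claim 3."""
    # Before the cutoff, take an action from the event buffer with the
    # predetermined probability p_event; otherwise explore at random.
    if step < max_event_steps and event_buffer and rng.random() < p_event:
        return rng.choice(event_buffer).action
    return rng.randrange(n_actions)
```

With an average reward of 0.0, a margin of 5.0, and a first threshold of -5.0, a reward of -10.0 is a critical event under the paragraph [0019] reading but not under the paragraph [0045] reading, which is the inconsistency identified in the rejection above.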

Claim 10 is rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim contains subject matter which was not described in the specification in such a way as to enable one skilled in the art to which it pertains, or with which it is most nearly connected, to make and/or use the invention. The written description does not disclose how the reinforcement learning model is trained, nor does it describe how to probe the reinforcement learning model during exploitation to learn a reason why a particular action was performed. For purposes of prior art examination, the Examiner interprets claim 1 as describing the training phase for the reinforcement learning model using the plurality of experiences. The Examiner also interprets learning why a particular action was performed to mean referring to a set of experiences and comparing their respective 

The following is a quotation of 35 U.S.C. 112(b):
(b) CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 2, 4, 13, and 15 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 2 recites the limitation "said storing step". There is insufficient antecedent basis for this limitation in the claim. Claim 1, on which claim 2 depends, recites “storing the given experience” and “copying the candidate experience” but does not recite a specific “storing step” for “the experience”. It is unclear whether the “storing step” of claim 2 refers to “the given experience” or to “another experience” of claim 1.
For purposes of prior art examination, the Examiner interprets claim 2 as referring to paragraph [0049] of the specification and block 335 of figure 3, wherein, after the given experience from claim 1 is stored in the experience buffer, the event buffer is searched for a similar experience having a same state as the given experience.
Claim 13 is a non-transitory medium claim corresponding to claim 2 and is rejected for the same reasons. 
Claim 4 recites the limitation "performing random exploration responsive to said stopping step".  There is insufficient antecedent basis for this limitation in the claim. Claim 1 on which claim 4 depends does not have stopping conditions or a “stopping step”. For purposes of prior art examination, Examiner interprets that the “stopping step” in claim 4 refers to “stopping the selecting of the action from the event buffer after a pre-defined number of steps during a training stage of the reinforcement learning” from claim 3.
Claim 15 is a non-transitory medium claim corresponding to claim 4 and is rejected for the same reasons. 

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101. Claims 1-11 are directed to a method, claims 12-19 are directed to a non-transitory computer-readable medium, and claim 20 is directed to a system; therefore, claims 1-20 fall within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter). However, claims 1-20 fall within the judicial exception of an abstract idea, specifically the abstract ideas of “Mental Processes” 
	Claim 1:
Step 1: Claim 1 is directed to a method; therefore the claim does fall within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Step 2A, Prong 1: Claim 1 recites the following abstract ideas:
A computer-implemented method for reinforcement learning performed by a processor, the method comprising:
obtaining, from an environment, a given experience that includes an action, a state and a reward (mental process directed to observation);
responsive to obtaining another experience having another reward that less than or equal to the first threshold, searching the experience buffer for a candidate experience with a similar state to the other experience (mental process directed to observation, evaluation); and
during exploration, selecting an action to be taken to the environment from the event buffer with a predetermined probability (mental process directed to evaluation).
Step 2A, Prong 2: Claim 1 recites the following additional elements:
a processor, storing the given experience in an experience buffer responsive to a value of the reward included in the given experience exceeding a first threshold, and copying the candidate experience into an event buffer. These are interpreted as generic computer components and transmitting and receiving data, which do not integrate the abstract idea into a practical application.
Step 2B: Claim 1 recites the following additional elements:


Claim 12 is a non-transitory computer readable storage medium claim, and its limitations correspond to those of claim 1. Claim 12 is rejected for the same reasons as claim 1. Claim 12 recites the following additional elements: a computer program product and program instructions executable by a computer having the processor to cause the computer to perform a method. These are interpreted as generic computer components, which do not integrate the abstract idea into a practical application and do not amount to significantly more (see MPEP 2106.05(f)).
	Claim 20 is a system claim, and its limitations correspond to those of claim 1. Claim 20 is rejected for the same reasons as claim 1. Claim 20 recites the following additional elements: a computer processing system, a memory for storing program code, and a processor, operatively coupled to the memory, for running the program code. These are interpreted as generic computer components, which do not integrate the abstract idea into a practical application and do not amount to significantly more (see MPEP 2106.05(f)).
	The independent claims are not patent eligible.

Dependent claims 2-11 and 13-19, when analyzed as a whole, are held to be patent ineligible under 35 U.S.C. 101 because the additional recited limitations fail to establish that the claims are not directed to an abstract idea.
Claim 2:
Step 1: Claim 2 is directed to a method; therefore the claim does fall within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Step 2A, Prong 1: Claim 2 recites the following abstract ideas:
searching the event buffer for a similar experience having a same state as the experience (mental process directed to observation, evaluation).
Step 2A, Prong 2: Claim 2 recites the following additional elements:
storing the similar experience into the event buffer responsive to any of the action and the reward of the similar experience being different from those of the experience. This is interpreted as transmitting and receiving data, which does not integrate the abstract idea into a practical application.
Step 2B: Claim 2 recites the following additional elements:
storing the similar experience into the event buffer responsive to any of the action and the reward of the similar experience being different from those of the experience. This is interpreted as transmitting and receiving data, which does not amount to significantly more (see MPEP 2106.05(d)(II)).
Claim 3:
Step 1: Claim 3 is directed to a method; therefore the claim does fall within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Step 2A, Prong 1: Claim 3 recites the following abstract ideas:

Step 2A, Prong 2: Claim 3 does not recite any additional elements and therefore does not integrate the abstract idea into a practical application.
Step 2B: Claim 3 does not recite any additional elements and therefore does not amount to significantly more.
Claim 4:
Step 1: Claim 4 is directed to a method; therefore the claim does fall within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Step 2A, Prong 1: Claim 4 recites the following abstract ideas:
performing random exploration responsive to said stopping step (mental process directed to evaluation).
Step 2A, Prong 2: Claim 4 does not recite any additional elements and therefore does not integrate the abstract idea into a practical application.
Step 2B: Claim 4 does not recite any additional elements and therefore does not amount to significantly more.
Claim 5:
Step 1: Claim 5 is directed to a method; therefore the claim does fall within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Step 2A, Prong 1: Claim 5 recites the following abstract ideas:

Step 2A, Prong 2: Claim 5 does not recite any additional elements and therefore does not integrate the abstract idea into a practical application.
Step 2B: Claim 5 does not recite any additional elements and therefore does not amount to significantly more.
	Claim 6:
Step 1: Claim 6 is directed to a method; therefore the claim does fall within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Step 2A, Prong 1: Claim 6 recites the abstract ideas from claim 1 on which it depends.
	
Step 2A, Prong 2: Claim 6 recites the following additional elements:
storing in the experience buffer any experiences previously observed except for the experiences that resulted in a corresponding reward that fails to exceed the first threshold. This is interpreted as transmitting and receiving data, which does not integrate the abstract idea into a practical application.
Step 2B: Claim 6 recites the following additional elements:
storing in the experience buffer any experiences previously observed except for the experiences that resulted in a corresponding reward that fails to exceed the first threshold. This is interpreted as transmitting and receiving data, which does not amount to significantly more (see MPEP 2106.05(d)(II)).
	Claim 7:
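Step 1: Claim 7 is directed to a method; therefore the claim does fall within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).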
Step 2A, Prong 1: Claim 7 recites the following abstract ideas:
the state represents a local state in a low-dimensional space (mathematical representation).
Step 2A, Prong 2: Claim 7 does not recite any additional elements and therefore does not integrate the abstract idea into a practical application.
Step 2B: Claim 7 does not recite any additional elements and therefore does not amount to significantly more.
	Claim 8:
Step 1: Claim 8 is directed to a method; therefore the claim does fall within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Step 2A, Prong 1: Claim 8 recites the following abstract ideas:
the first threshold represents a value that is below an average reward value for a plurality of experiences (mathematical concept).
Step 2A, Prong 2: Claim 8 does not recite any additional elements and therefore does not integrate the abstract idea into a practical application.
Step 2B: Claim 8 does not recite any additional elements and therefore does not amount to significantly more.
	Claim 9:
Step 1: Claim 9 is directed to a method; therefore the claim does fall within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Step 2A, Prong 1: Claim 9 recites the following abstract ideas:
the first threshold is used to identify critical events (mental process directed to observation, evaluation).
Step 2A, Prong 2: Claim 9 recites the following additional elements:
any of the plurality of experiences unrelated to the critical events are stored in the experience buffer and any of the plurality of experiences related to the critical events are stored in the event buffer. These are interpreted as transmitting and receiving data, which do not integrate the abstract idea into a practical application.
Step 2B: Claim 9 recites the following additional elements:
any of the plurality of experiences unrelated to the critical events are stored in the experience buffer and any of the plurality of experiences related to the critical events are stored in the event buffer. These are interpreted as transmitting and receiving data, which do not amount to significantly more (see MPEP 2106.05(d)(II)).
	Claim 10:
Step 1: Claim 10 is directed to a method; therefore the claim does fall within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Step 2A, Prong 1: Claim 10 recites the following abstract ideas:
probing the RL model during exploitation to learn a reason why a particular action was performed (mental process directed to evaluation).
Step 2A, Prong 2: Claim 10 recites the following additional elements:
training a Reinforcement Learning (RL) model based on a plurality of experiences that include critical events and non-critical events. This is interpreted as instructions to apply an abstract idea using a computer, which does not integrate the abstract idea into a practical application (see MPEP 2106.05(f)).
Step 2B: Claim 10 recites the following additional elements:
training a Reinforcement Learning (RL) model based on a plurality of experiences that include critical events and non-critical events. This is interpreted as instructions to apply an abstract idea using a computer, which does not amount to significantly more (see MPEP 2106.05(f)).
	Claim 11:
Step 1: Claim 11 is directed to a method; therefore the claim does fall within one of the four statutory categories (i.e., process, machine, manufacture, or composition of matter).
Step 2A, Prong 1: Claim 11 recites the abstract ideas from claim 1 on which it depends.
	
Step 2A, Prong 2: Claim 11 recites the following additional elements:
plotting, on a display device, a plurality of experiences in a visualization, each of the plurality of experiences having a respective reward to form a plurality of rewards across the plurality of experiences, wherein said plotting step uses the plurality of rewards as weights for the visualization. These are interpreted as generic computer components and transmitting and receiving data, which do not integrate the abstract idea into a practical application.
Step 2B: Claim 11 recites the following additional elements:
plotting, on a display device, a plurality of experiences in a visualization, each of the plurality of experiences having a respective reward to form a plurality of rewards across the plurality of experiences, wherein said plotting step uses the plurality of rewards as weights for the visualization. These are interpreted as generic computer components and transmitting and receiving data, which do not amount to significantly more.
Claim 13 is a non-transitory computer readable storage medium claim, and its limitations correspond to those of claim 2. Claim 13 is rejected for the same reasons as claim 2.
	Claim 14 is a non-transitory computer readable storage medium claim, and its limitations correspond to those of claim 3. Claim 14 is rejected for the same reasons as claim 3.
	Claim 15 is a non-transitory computer readable storage medium claim, and its limitations correspond to those of claim 4. Claim 15 is rejected for the same reasons as claim 4.
Claim 16 is a non-transitory computer readable storage medium claim, and its limitations correspond to those of claim 5. Claim 16 is rejected for the same reasons as claim 5.
Claim 17 is a non-transitory computer readable storage medium claim, and its limitations correspond to those of claim 6. Claim 17 is rejected for the same reasons as claim 6.
Claim 18 is a non-transitory computer readable storage medium claim, and its limitations correspond to those of claim 7. Claim 18 is rejected for the same reasons as claim 7.
Claim 19 is a non-transitory computer readable storage medium claim, and its limitations correspond to those of claim 8. Claim 19 is rejected for the same reasons as claim 8.
Viewed as a whole, these additional claim elements do not provide meaningful limitations to transform the abstract idea into a patent eligible application of the abstract idea such that the claims amount to significantly more than the abstract idea itself. Therefore, the claims are rejected under 35 U.S.C. 101 as being directed to non-statutory subject matter.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-10 and 12-20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Yoshiike et al. (US 20100318478 A1, hereinafter Yoshiike).
Regarding claim 1, Yoshiike teaches a computer-implemented method (para. [0009] recites an information processing method according to an embodiment of the present invention) for reinforcement learning (para. [0091] recites that learning, recognition of situations, and planning of actions (determination of actions) that the agent performs can be applied to a problem that can be formulated with the framework of a Markov decision process (MDP) that is commonly taken as a reinforcement learning problem) performed by a processor (para. [0827] recites that the program may be processed by a single computer (processor), or may be processed by decentralized processing by multiple computers), the method comprising:
obtaining, from an environment, a given experience that includes an action, a state and a reward (fig. 17 and para. [0392] recite the state transition probability aij(Um) (i.e. an experience) regarding each of the state S (i.e. a state), in the i-axis direction of the three-dimensional state transition probability table A made up of the i axis, j axis, and action axis for each of the action Um (i.e. an action). Para. [0735] recites that the action determining unit 24 obtains the sum of state transition probabilities arrayed in the j-axial direction (horizontal direction) on the state transition probability plane for each action Um, as the action suitability (i.e. a reward));
storing the given experience in an experience buffer responsive to a value of the reward included in the given experience exceeding a first threshold (fig. 47 step S343 and para. [0736] recite selecting actions Um of which the action suitability is at or above the threshold as candidates for the next action to be performed following the first strategy. Fig. 4 and para. [0106] recite that the series of the observation values (observation value series) and the series of the actions (action series) are stored in the history storage unit 14 (i.e. the experience buffer));
responsive to obtaining another experience having another reward that less than or equal to the first threshold (fig. 47 step S343 and para. [0736] recite that the action determining unit 24 sets the action suitability obtained regarding an action Um of which the action suitability is below a threshold to 0.0, thereby eliminating actions Um of which the action suitability is below a threshold from candidates for the next action to be performed following the first strategy with regard to the state series of interest (i.e. the experience has a reward below a threshold). Para. [0743] recites that if the agent is desired to return to a known location, or if the agent is desired to develop an unknown location, an action where the agent wanders through the action environment is far from desirable. Thus, the action determining unit 24 is arranged so as to be able to determine the next action based on, in addition to the first strategy, a second and third strategy which are described below (i.e. when the first strategy does not result in a desirable action, the action determining unit turns to a second or third strategy)), searching the experience buffer for a candidate experience with a similar state to the other experience (fig. 49 and para. [0747] recite the second strategy, wherein, when there is no state which immediately precedes the last state, the action determining unit 24 refers to the expanded HMM (or the state transition probability thereof) stored in the model storage unit 22 to obtain states for which the last state can serve as a transition destination of state transition (i.e. searching for a similar experience). Para. [0749] recites that the action determining unit 24 sets the action suitability for actions other than the action regarding which the action suitability is the greatest to 0.0, consequently selecting the action with the greatest action suitability as a candidate for the next action to be performed) and copying the candidate experience into an event buffer (the state transition probability of the HMM (Hidden Markov Model) is expanded to state transition probability for each action performed by the agent, and the HMM of which the state transition probability is thus expanded (hereafter, also referred to as "expanded HMM") is employed as a learning object by the learning unit 21. Para. [0114] recites that the model storage unit 22 stores (the state transition probability, observation probability, and the like that are model parameters stipulating) the expanded HMM (i.e. the event buffer));
and during exploration, selecting an action to be taken to the environment from the event buffer with a predetermined probability (fig. 47 step S345 and para. [0738] recite that the action determining unit 24 determines the next action from the candidates for the next action (i.e. selects an action with a predetermined probability from the event buffer), based on the action suitability regarding the actions Um obtained for each of the one or more current state series candidates from the state recognizing unit 23).
Regarding claim 2, Yoshiike teaches the method according to claim 1, wherein said storing step comprises:
searching the event buffer for a similar experience having a same state as the experience (fig. 48 shows a method to “search for situation similar to current situation from own structured knowledge” (i.e. searching the event buffer for a similar experience). Para. [0744] recites that in fig. 48, the action determining unit 24 determines, as the next action, an action wherein there is generated state transition from the last state s_t, of one or more current state series candidates from the state recognizing unit 23, to an immediately preceding state s_{t-1} immediately before the last state s_t); and storing the similar experience into the event buffer responsive to any of the action and the reward of the similar experience being different from those of the experience (para. [0115] recites that the state recognizing unit 23 recognizes the current situation of the agent based on the expanded HMM stored in the model storage unit 22 (i.e. the similar experience found by searching the event buffer) using the action series and the observation value series stored in the history storage unit 14, and obtains (recognizes) the current state that is the state of the expanded HMM corresponding to the current situation thereof (i.e. stores the similar experience back into the event buffer). Para. [0741] recites that in the event of determining an action following the first strategy, the agent performs an action which the agent has performed under a known situation similar to the current situation).
Regarding claim 3, Yoshiike teaches the method according to claim 1, further comprising stopping the selecting of the action from the event buffer after a pre-defined number of steps during a training stage of the reinforcement learning (Fig. 5 and para. [0146] recite that in the case that determination is made in step S17 that the agent has performed an action by already specified number of times, i.e., in the case that the point-in-time t is equal to the already specified number of times, the processing in the reflective action mode ends (i.e. stopping the selection of the action after a pre-defined number of steps)).
Regarding claim 4, Yoshiike teaches the method according to claim 1, further comprising performing random exploration responsive to said stopping step (fig. 4 and para. [0129] recite that the random target generating unit 35 selects one state out of the states of the expanded HMM stored in the model storage unit 22 at random as a random target, and supplies the random target thereof to the target selecting unit 31 as the internal target serving as the target state).
Regarding claim 5, Yoshiike teaches the method according to claim 1, wherein an event, corresponding to the value of the reward included in the given experience exceeding the first threshold, represents a critical event (fig. 47 step S343 and para. [0736] recite that the action determining unit 24 takes, out of the M (types of) actions U1 through UM regarding which action suitability has been obtained, the action suitability obtained regarding an action Um of which the action suitability is below a threshold, to be 0.0 (i.e. events with action suitability below a threshold are critical events)).
Regarding claim 6, Yoshiike teaches the method according to claim 1, further comprising storing in the experience buffer any experiences previously observed (fig. 4 and para. [0106] recite that the series of the observation values (observation value series), and the series of the actions (action series) are stored in the history storage unit 14 (i.e. the experience buffer)) except for the experiences that resulted in a corresponding reward that fails to exceed the first threshold (para. [0736] recites the action determining unit 24 sets the action suitability obtained regarding an action Um of which the action suitability is below a threshold to 0.0 (i.e. experiences where the corresponding reward fails to exceed a first threshold), thereby eliminating actions Um of which the action suitability is below a threshold from candidates for the next action to be performed).
Regarding claim 7, Yoshiike teaches the method according to claim 1, wherein the state represents a local state in a low-dimensional space (para. [0153] recites that the state transition probability of a common HMM can be represented by a two-dimensional table (i.e. a low-dimensional space) where the state transition probability aij of the state transition from the state Si to the state Sj is disposed at the i'th from the top and the j'th from the left).
Regarding claim 8, Yoshiike teaches the method according to claim 1, wherein the first threshold represents a value that is below an average reward value for a plurality of experiences (para. [0371] recites that the open-edge detecting unit 37 performs threshold processing for detecting the observation probability B equal to or greater than a threshold with the threshold as 0.5 or the like, for example (i.e. the value for the first threshold)).
Regarding claim 9, Yoshiike teaches the method according to claim 1, wherein the method is applied to the plurality of experiences, wherein the first threshold is used to identify critical events (para. [0365] recites comparing the state transition probability of a certain state with the state transition probability of another state (i.e. a plurality of experiences) to which an observation probability for observing the same observation value as that state is assigned (a value other than, i.e. not regarded as, 0.0); such a state is equivalent to an open edge in which, although it is understood that a state transition to the next state can be performed when a certain action is performed, that action has not been performed in this state, and accordingly no state transition probability has been assigned thereto (it is deemed to be 0.0) and the state transition is incapable of being performed (i.e. a critical event)), and wherein any of the plurality of experiences unrelated to the critical events are stored in the experience buffer and any of the plurality of experiences related to the critical events are stored in the event buffer (para. [0115] recites that the state recognizing unit 23 recognizes the current situation of the agent based on the expanded HMM stored in the model storage unit 22 (i.e. the event buffer) using the action series and the observation value series stored in the history storage unit 14 (i.e. the experience buffer), and obtains (recognizes) the current state that is the state of the expanded HMM corresponding to the current situation thereof).
Regarding claim 10, Yoshiike teaches the method according to claim 1, further comprising training a Reinforcement Learning (RL) model based on a plurality of experiences that include critical events and non-critical events (para. [0091] recites that learning, recognition of situations, and planning of actions (determination of actions) that the agent performs can be applied to a problem that can be formulated within the framework of a Markov decision process (MDP), which is commonly treated as a reinforcement learning problem), and probing the RL model during exploitation to learn a reason why a particular action was performed (fig. 53 and para. [0812] recite a process for selecting a strategy to follow for determining an action (i.e. a reason why a particular action was performed)).
Claim 12 is a non-transitory computer readable storage medium claim, and its limitations are included in claim 1. The only difference is that claim 12 requires a non-transitory computer readable storage medium (para. [0823] recites that, upon a command being input by an input unit 107 operated by the user or the like via the input/output interface 110, the CPU 102 executes a program stored in ROM (Read Only Memory) 103, or loads a program stored in the hard disk 105 into RAM (Random Access Memory) 104 and executes the program (i.e. ROM and RAM are examples of non-transitory computer readable storage media)). Therefore, claim 12 is rejected for the same reasons as claim 1.
Claim 13 is a non-transitory computer readable storage medium claim, and its limitations are included in claim 2. Claim 13 is rejected for the same reasons as claim 2.
Claim 14 is a non-transitory computer readable storage medium claim, and its limitations are included in claim 3. Claim 14 is rejected for the same reasons as claim 3.
Claim 15 is a non-transitory computer readable storage medium claim, and its limitations are included in claim 4. Claim 15 is rejected for the same reasons as claim 4.
Claim 16 is a non-transitory computer readable storage medium claim, and its limitations are included in claim 5. Claim 16 is rejected for the same reasons as claim 5.
Claim 17 is a non-transitory computer readable storage medium claim, and its limitations are included in claim 6. Claim 17 is rejected for the same reasons as claim 6.
Claim 18 is a non-transitory computer readable storage medium claim, and its limitations are included in claim 7. Claim 18 is rejected for the same reasons as claim 7.
Claim 19 is a non-transitory computer readable storage medium claim, and its limitations are included in claim 8. Claim 19 is rejected for the same reasons as claim 8.
Claim 20 is a system claim, and its limitations are included in claim 1. The only difference is that claim 20 requires a system (fig. 4 and para. [0089] recite a configuration example of an embodiment of the agent to which the information processing device (i.e. a system) according to the present invention is applied). Therefore, claim 20 is rejected for the same reasons as claim 1.
	
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Yoshiike et al. (US 20100318478 A1, herein Yoshiike) in view of Mnih et al. (US 20150100530 A1, herein Mnih).
Regarding claim 11, Yoshiike teaches the computer-implemented method of claim 1 (para. [0009] recites an information processing method according to an embodiment of the present invention. Para. [0091] recites that learning, recognition of situations, and planning of actions (determination of actions) that the agent performs can be applied to a problem that can be formulated within the framework of a Markov decision process (MDP), which is commonly treated as a reinforcement learning problem).

Mnih teaches plotting, on a display device, a plurality of experiences in a visualization, each of the plurality of experiences having a respective reward to form a plurality of rewards across the plurality of experiences, wherein said plotting step uses the plurality of rewards as weights for the visualization (para. [0088] describes a series of figures: figs. 6a and 6c recite visualizations of average reward per episode during training (i.e. plotting each experience in a visualization using the respective reward as a weight for the visualization), whereas figs. 6b and 6d recite visualizations of the average maximum predicted action-value of a set of states).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine these teachings by using the visualization method of Mnih to plot the experiences of Yoshiike based on their respective rewards. The specification of Yoshiike includes a number of visualizations showing how the rewards of a given experience determine which path the agent follows, but does not describe whether these visualizations are provided to the user. Using the method of Mnih to plot the experiences of Yoshiike would provide the user with additional context for understanding why the agent follows a specific strategy or whether the agent is repeatedly making similar mistakes, which would allow one of ordinary skill to correct errors or improve overall performance.


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
US 8494980 B2 (Hans et al) teaches using a safety function and a feedback rule to ensure no actions occur which could lead to the system being damaged or to a defective operating state (i.e. a critical event), but does not store information regarding critical events separately from non-critical events and does not teach storing replacement experiences in place of the critical events.
US 20070220303 A1 (Kimura et al) teaches identification of a failure event, selecting a countermeasure information item from a table of previous countermeasures by frequency of the countermeasure’s usage, and repeating the process with less frequently used countermeasures until the failure event has been resolved. Kimura does not teach reinforcement learning, and does not replace failed event information with the countermeasure event information in event storage.
US 20100094786 A1 (Gupta et al) teaches a reinforcement learning process which delays a reinforcement step until uncertainty in a state estimate is reduced, but does not teach storing information regarding uncertain events separately from certain events and does not teach storing replacement experiences in place of the uncertain events.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEAH M FEITL whose telephone number is (571)272-8350. The examiner can normally be reached on M-F 0800-1700.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen can be reached on (571) 272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


	/L.M.F./             Examiner, Art Unit 2121   



/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121