DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Election/Restrictions
2.	Claims 8-12, 19, and 20 are withdrawn from further consideration pursuant to 37 CFR 1.142(b), as being drawn to a nonelected subcombination II, there being no allowable generic or linking claim. Applicant timely traversed the restriction (election) requirement in the reply filed on 08/24/2022.

3.	Applicant's election with traverse of subcombination I in the reply filed on 08/24/2022 is acknowledged. The traversal is on the following ground(s): FIG. 12 "illustrates an operation of adaptively determining a weight kernel" which is directly applicable to the utility of "subcomination I." More specifically, as is clearly indicated in, e.g., in claims 1, 7, and 8 (particularly in view of the above) (see also, e.g., FIG. 6A and the supporting text), the features of, e.g., claim 8 are directly related to "adaptively" setting the weight kernel used in the convolution operation… In view of the above, Applicants note that the Examiner's rejection is improper for at least the reason that it is reliant on a misinterpretation and/or mischaracterization of the present disclosure and/or for the reason that it does not show how "the subcombination can be shown to have utility either by itself or in another materially different combination." This is not found persuasive because subcombination I is directed to a method/device for selecting an action using a neural network. It does not require the specific method of subcombination II regarding adaptively setting (learning) the weight kernel. In fact, except for claims 7 and 18, subcombination I does not even require a convolution operation that uses a weight kernel; it can use other types of neural networks. Even if subcombination I uses a convolution operation that uses a weight kernel, it does not require the specific method of adaptively setting (learning) the weight kernel as in subcombination II. For example, it can use a preset weight kernel. Alternatively or additionally, subcombination II has a separate utility, such as setting a weight kernel for use in an application different from subcombination I. Claims 8-12, 19, and 20 are directed to a feature of subcombination II that is usable together with subcombination I.
When claims of subcombination I are allowable, claims 8-10, 19, and 20 depending on the allowable claims will be rejoined for consideration.
The requirement is still deemed proper and is therefore made FINAL.

Claim Objections
4.	Claim 6 is objected to because of the following informalities:  
In claim 6, line 7, “the selected option” should be --the selected action-- to correct a typo.
Appropriate correction is required.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


5.	Claims 1-3, 6, 13-15, and 17 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Gendron-Bellemare et al. (US 20190332923 A1; hereinafter “Gendron-Bellemare”).

	Regarding claim 1, Gendron-Bellemare teaches a method of selecting an action based on deep learning, executed by a device including a neural network device (i.e., “reinforcement learning system 100”; see [0033] and FIG. 1; “a method for selecting an action to be performed by a reinforcement learning agent… The distributional Q network is a deep neural network”; see [0006]), the method comprising:
receiving, by the neural network device, a current state as an input (i.e., “receiving a current observation characterizing a current state of the environment… processed using a distributional Q network”; see [0006]);
calculating, by the neural network device, a value distribution corresponding to each of a plurality of actions to be performed on the current state (i.e., “For each action in a set of multiple actions that can be performed by the agent… generate a network output that defines a probability distribution over possible Q returns for the action-current observation pair”; see [0006]); and 
selecting, by the neural network device, an action from among the plurality of actions based on the value distribution (i.e., “An action is selected to be performed by the agent in response to the current observation using the measures of central tendency for the actions”; see [0006]), 
wherein the value distribution includes at least one Gaussian graph following a Gaussian distribution (i.e., “the output of the distributional Q network 112 may include respective output values defining a mean and a standard deviation of a Normal distribution over the set of possible Q returns”; see [0042] and FIG. 1).

	Regarding claim 2, Gendron-Bellemare further teaches: wherein
the calculating of the value distribution includes calculating the at least one Gaussian graph by using a value distribution network (i.e., “the output of the distributional Q network 112 may include respective output values defining a mean and a standard deviation of a Normal distribution over the set of possible Q returns”; see [0042] and FIG. 1), 
the value distribution network includes a distributional neural network configured to output a plurality of network parameters defining a probability distribution of a value return possible for each current state-action pair (i.e., “the output of the distributional Q network 112 may include respective output values defining a mean and a standard deviation of a Normal distribution over the set of possible Q returns”; see [0042] and FIG. 1; “The distributional Q network is a deep neural network that is configured to process the action and the current observation in accordance with current values of the network parameters to generate a network output that defines a probability distribution over possible Q returns for the action-current observation pair”; see [0006]), and
the value return includes an estimation value of a value obtained as a result of each action performed on the current state (i.e., “Each possible Q return is an estimate of a return that would result from the agent performing the action in response to the current observation”; see [0006]).

	Regarding claim 3, Gendron-Bellemare further teaches:
wherein the plurality of network parameters includes, of each of the at least one Gaussian graph, at least one of a probability weight, a value mean, and a value standard deviation (i.e., “the output of the distributional Q network 112 includes respective output values that define a parametric probability distribution over the set of possible Q returns. For example, the output of the distributional Q network 112 may include respective output values defining a mean and a standard deviation of a Normal distribution over the set of possible Q returns”; see [0042] and FIG. 1).

	Regarding claim 6, Gendron-Bellemare further teaches: wherein the selecting of the action includes:
calculating, by the neural network device, an average value of each of the value distributions respectively corresponding to the plurality of actions (i.e., “the system 100 determines a corresponding measure of central tendency 122 (i.e., a central or typical value) of the set of possible Q returns with respect to the probability distribution defined by the output of the distributional Q network 112 for the action-current observation pair. For example, as will be described further with reference to FIG. 2, the measure of central tendency may be a mean, a median, or a mode”; see [0046]); and
determining, by the neural network device, an action, corresponding to the value distribution where the average value is largest, as an optimal action, selecting the optimal action as the selected option (i.e., “the system 100 selects an action having a highest corresponding measure of central tendency 122 from amongst all the actions in the set of actions that can be performed by the agent 104”; see [0047]).
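For illustration only, the selection step mapped above (a measure of central tendency per action, then the action with the highest measure) can be sketched as follows. The class name, function name, action labels, and numeric values are hypothetical and are not taken from Gendron-Bellemare or from the claims; the sketch assumes each value distribution is a single Normal distribution summarized by its mean and standard deviation.

```python
from dataclasses import dataclass


@dataclass
class NormalValueDistribution:
    """Parametric value distribution for one action (hypothetical container)."""
    mean: float  # mean of the possible returns for the action
    std: float   # standard deviation of the possible returns


def select_action(distributions):
    """Return the action whose value distribution has the largest mean."""
    return max(distributions, key=lambda action: distributions[action].mean)


# Hypothetical network outputs for three actions in the current state.
outputs = {
    "left": NormalValueDistribution(mean=0.2, std=1.0),
    "stay": NormalValueDistribution(mean=0.9, std=0.3),
    "right": NormalValueDistribution(mean=0.5, std=2.0),
}
selected = select_action(outputs)
```

In this sketch the standard deviation plays no role in the selection; only the mean of each distribution is compared, which corresponds to the "average value is largest" language of claim 6.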

	Regarding claim 13, the claim recites the same substantive limitations as claim 1 and is rejected using the same teachings. Note that Gendron-Bellemare teaches the “processing circuitry” (i.e., “The reinforcement learning system 100 is an example of a system implemented as computer programs on one or more computers in one or more locations in which the systems, components, and techniques described below are implemented”; see [0033]).

	Regarding claim 14, the claim recites the same substantive further limitations as claim 2 and is rejected using the same teachings.

	Regarding claim 15, the claim recites the same substantive further limitations as claim 3 and is rejected using the same teachings.

	Regarding claim 17, the claim recites the same substantive further limitations as claim 6 and is rejected using the same teachings.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


6.	Claims 4, 5, 7, 16, and 18 are rejected under 35 U.S.C. 103 as being obvious over Gendron-Bellemare in view of Choi et al. (“Distributional Deep Reinforcement Learning with a Mixture of Gaussians” 2019 International Conference on Robotics and Automation (ICRA) Palais des congres de Montreal, Montreal, Canada, May 20-24, 2019; cited in IDS; hereinafter “Choi”).

	Regarding claim 4, Gendron-Bellemare further teaches:
wherein the value distribution includes a graph of overlapping a first Gaussian graph, 
calculating, by the neural network device, 
	Gendron-Bellemare does not explicitly disclose (see the underlined):
wherein the value distribution includes a graph of overlapping a first Gaussian graph, a second Gaussian graph, and a third Gaussian graph, the calculating of the value distribution includes 
calculating, by the neural network device, a first probability weight, a first value mean, and a first value standard deviation of the first Gaussian graph by using a value distribution network;
calculating, by the neural network device, a second probability weight, a second value mean, and a second value standard deviation of the second Gaussian graph by using the value distribution network;
calculating, by the neural network device, a third probability weight, a third value mean, and a third value standard deviation of the third Gaussian graph by using the value distribution network; and
generating, by the neural network device, the value distribution by allowing the first Gaussian graph, the second Gaussian graph, and the third Gaussian graph to overlap one another based on the results of the calculations.
	But Choi teaches:
wherein a value distribution includes a graph of overlapping a first Gaussian graph, a second Gaussian graph, and a third Gaussian graph (i.e., “Gaussian mixture model (GMM), is well-known for its expressiveness. It is shown to be able to approximate any given density function to arbitrary accuracy by adjusting the number of mixtures… the state-action value distribution can be characterized using a GMM parametric model”; see p. 9793, col. 2, the last paragraph through p. 9794, col. 1, ¶ 1; see also FIG. 3(b) showing three Gaussian graphs overlapped to form the value distribution), the calculating of the value distribution includes 
calculating a probability weight, a mean, and a variance of each of the three Gaussian graphs; and generating, by the neural network device, the value distribution by allowing the three Gaussian graphs to overlap one another based on the results of the calculations (i.e., “value distribution can be characterized using a GMM parametric model… where π_j(x, a), μ_j(x, a), and σ_j(x, a)^2 are the j-th mixture weight, mean, and variance function, respectively”; see p. 9794, col. 1, ¶¶ 1-2).
	Also, it is well known that variance is the square of standard deviation, so either is readily obtained from the other.
	It would have been obvious to one of ordinary skill in the art at the time the application was filed to modify Gendron-Bellemare in view of Choi to incorporate the GMM technique for the value distribution, such that the value distribution includes a graph of overlapping a first Gaussian graph, a second Gaussian graph, and a third Gaussian graph, the calculating of the value distribution includes calculating, by the neural network device, a first probability weight, a first value mean, and a first value standard deviation of the first Gaussian graph by using a value distribution network; calculating, by the neural network device, a second probability weight, a second value mean, and a second value standard deviation of the second Gaussian graph by using the value distribution network; calculating, by the neural network device, a third probability weight, a third value mean, and a third value standard deviation of the third Gaussian graph by using the value distribution network; and generating, by the neural network device, the value distribution by allowing the first Gaussian graph, the second Gaussian graph, and the third Gaussian graph to overlap one another based on the results of the calculations, as claimed. The motivation would be to set the number (such as three) of Gaussian mixtures for better accuracy in approximating the value distribution (see Choi, p. 9793, col. 2, the last paragraph).
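For illustration only, a value distribution formed by overlapping three Gaussian graphs can be sketched as a weighted sum of three Normal densities. All function names and numeric values below are hypothetical; the sketch assumes each component is supplied as a (weight, mean, variance) triple and converts each variance to a standard deviation by taking the square root.

```python
import math


def normal_pdf(x, mean, std):
    """Density of a Normal distribution with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """Overlap the Gaussian graphs: weighted sum of the component densities."""
    return sum(w * normal_pdf(x, m, s) for w, m, s in components)


def mixture_mean(components):
    """Mean of the mixture: weighted sum of the component means."""
    return sum(w * m for w, m, _ in components)


# Hypothetical (probability weight, value mean, variance) triples for one action.
raw = [(0.5, 0.0, 1.0), (0.3, 2.0, 0.25), (0.2, -1.0, 4.0)]

# Convert each variance to a standard deviation (std = sqrt(variance)).
components = [(w, m, math.sqrt(var)) for w, m, var in raw]

overall_mean = mixture_mean(components)
```

Because the three weights sum to one, the overlapped graph remains a valid probability density.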

	Regarding claim 5, Gendron-Bellemare does not explicitly disclose:
wherein the calculating of the value distribution includes:
receiving, by the neural network device, a number of Gaussian graphs for generating the value distribution;
calculating, by the neural network device, a plurality of Gaussian graphs by using a value distribution network based on the number of Gaussian graphs; and
generating, by the neural network device, the value distribution by overlapping the calculated plurality of Gaussian graphs.
	But Choi teaches:
wherein calculating of a value distribution includes:
selecting a number of Gaussian graphs for generating the value distribution (i.e., “adjusting the number of mixtures… K is the number of mixtures”; see p. 9793, col. 2, the last paragraph through p. 9794, col. 1, ¶ 1);
calculating a plurality of Gaussian graphs by using a value distribution network based on the number of Gaussian graphs; and generating the value distribution by overlapping the calculated plurality of Gaussian graphs (i.e., “distribution can be characterized using a GMM parametric model… where K is the number of mixtures… a neural network whose output consists of parameters of a GMM as seen in Figure 3-(b)”; see p. 9794, col. 1, ¶¶ 1-2 and FIG. 3(b)).
	It would have been obvious to one of ordinary skill in the art at the time the application was filed to modify Gendron-Bellemare in view of Choi to incorporate the GMM technique for the value distribution, such that the calculating of the value distribution includes: receiving, by the neural network device, a number of Gaussian graphs for generating the value distribution; calculating, by the neural network device, a plurality of Gaussian graphs by using a value distribution network based on the number of Gaussian graphs; and generating, by the neural network device, the value distribution by overlapping the calculated plurality of Gaussian graphs, as claimed. The motivation would be to allow adjusting the number of Gaussian mixtures for better accuracy in approximating the value distribution (see Choi, p. 9793, col. 2, the last paragraph).
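For illustration only, the role of the number of Gaussian graphs K as an input can be sketched as follows. The function name and the parameterization (softmax for the weights, exponential for the standard deviations) are assumptions commonly used for mixture density networks; they are not asserted to be the parameterization disclosed in Choi or recited in the claims.

```python
import math


def gmm_from_outputs(raw_outputs, num_gaussians):
    """Split 3*K raw network outputs into K (weight, mean, std) triples.

    Assumed layout: K weight logits, then K means, then K log standard deviations.
    """
    k = num_gaussians
    if len(raw_outputs) != 3 * k:
        raise ValueError("expected 3 * num_gaussians raw outputs")
    logits = raw_outputs[:k]
    means = raw_outputs[k:2 * k]
    log_stds = raw_outputs[2 * k:]

    # Softmax keeps the weights positive and summing to one.
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    # Exponential keeps each standard deviation positive.
    stds = [math.exp(v) for v in log_stds]
    return list(zip(weights, means, stds))


# Hypothetical raw outputs for K = 2 Gaussian graphs.
params = gmm_from_outputs([0.0, 0.0, 1.0, -1.0, 0.0, 0.0], num_gaussians=2)
```

Changing `num_gaussians` changes only the length of the expected output vector, which is one way the number of mixtures could be adjusted without altering the rest of the computation.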

	Regarding claim 7, Gendron-Bellemare does not explicitly disclose:
wherein the calculating of the value distribution includes:
performing, by the neural network device, a convolution operation on an input feature map corresponding to the current state by using a weight kernel; and
generating, by the neural network device, a plurality of Gaussian graphs based on a full connection between each of the plurality of actions and elements of an output feature map generated by a result of the convolution operation.
	But Choi teaches:
wherein calculating of a value distribution includes:
performing, by the neural network device, a convolution operation on an input feature map corresponding to the current state; and
generating, by the neural network device, a plurality of Gaussian graphs based on a full connection between each of the plurality of actions and elements of an output feature map generated by a result of the convolution operation (i.e., “the state-action value distribution can be characterized using a GMM parametric model… we construct a mixture density network (MDN)… a neural network whose output consists of parameters of a GMM as seen in Figure 3-(b)”; see p. 9793, col. 2, the last paragraph through p. 9794, col. 1, ¶ 2; see also FIG. 3(b)).
	Also, it is well known that a convolutional neural network uses a weight kernel.
	It would have been obvious to one of ordinary skill in the art at the time the application was filed to modify Gendron-Bellemare in view of Choi to incorporate the GMM technique for the value distribution, such that the calculating of the value distribution includes: performing, by the neural network device, a convolution operation on an input feature map corresponding to the current state by using a weight kernel; and generating, by the neural network device, a plurality of Gaussian graphs based on a full connection between each of the plurality of actions and elements of an output feature map generated by a result of the convolution operation, as claimed. The motivation would be to process input that is suitable for a convolutional neural network (see Choi, FIG. 3, and Gendron-Bellemare, [0043], suggesting a convolutional neural network).

	Regarding claim 16, the claim recites the same substantive further limitations as claim 5 and is rejected using the same teachings. 
	
	Regarding claim 18, the claim recites the same substantive further limitations as claim 7 and is rejected using the same teachings.

7.	Claims 7 and 18 are alternatively rejected under 35 U.S.C. 103 as being obvious over Gendron-Bellemare in view of Lee et al. (“Deep Reinforcement Learning in Continuous Action Spaces: a Case Study in the Game of Simulated Curling” Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018; hereinafter “Lee”).

	Regarding claim 7, Gendron-Bellemare does not explicitly disclose:
wherein the calculating of the value distribution includes:
performing, by the neural network device, a convolution operation on an input feature map corresponding to the current state by using a weight kernel; and
generating, by the neural network device, a plurality of Gaussian graphs based on a full connection between each of the plurality of actions and elements of an output feature map generated by a result of the convolution operation.
	But Lee teaches:
wherein calculating of a value distribution includes:
performing a convolution operation on an input feature map corresponding to the current state by using a weight kernel (i.e., “As input, a feature map… convolutional operations”; see FIG. 1 and associated text; see also FIG. 2 and associated text indicating a 3x3 filter as a weight kernel); and
generating a plurality of Gaussian graphs based on a full connection between each of the plurality of actions and elements of an output feature map generated by a result of the convolution operation (i.e., “The output of the policy head is the probability distribution of each action”; see FIG. 1; “The policy head has two more convolutional layers, while the value head has two fully connected layers on top of a convolutional layer”; see FIG. 2 and associated text).
	It would have been obvious to one of ordinary skill in the art at the time the application was filed to modify Gendron-Bellemare in view of Lee to incorporate the deep CNN policy value technique, such that the calculating of the value distribution includes: performing, by the neural network device, a convolution operation on an input feature map corresponding to the current state by using a weight kernel; and generating, by the neural network device, a plurality of Gaussian graphs based on a full connection between each of the plurality of actions and elements of an output feature map generated by a result of the convolution operation, as claimed. The motivation would be to process the input that is suitable for a convolutional neural network (see Gendron-Bellemare [0043] suggesting a convolutional neural network).
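For illustration only, the two recited steps (a convolution of the input feature map with a weight kernel, followed by a full connection from the output feature map to per-action Gaussian parameters) can be sketched as follows. The function names, the 4x4 state, the 3x3 kernel values, and the fully connected weights are all hypothetical; the sketch assumes one Gaussian graph per action, parameterized by a mean and a log standard deviation, and uses the sliding-window product (cross-correlation) that convolutional layers conventionally compute.

```python
import math


def conv2d_valid(feature_map, kernel):
    """Slide the weight kernel over the input feature map (no padding, stride 1)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(feature_map) - kh + 1
    out_w = len(feature_map[0]) - kw + 1
    return [
        [
            sum(feature_map[r + i][c + j] * kernel[i][j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def fully_connected(inputs, weights, biases):
    """Connect every input element to every output unit."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


# Hypothetical 4x4 input feature map for the current state and a 3x3 weight kernel.
state = [[1, 0, 2, 1], [0, 1, 3, 0], [2, 1, 0, 1], [1, 0, 1, 2]]
kernel = [[1, 0, -1], [1, 0, -1], [1, 0, -1]]

features = conv2d_valid(state, kernel)          # 2x2 output feature map
flat = [v for row in features for v in row]     # elements of the output feature map

# Hypothetical full connection: two actions, each with (mean, log std) outputs.
fc_weights = [[1, 1, 0, 0], [0, 0, 1, -1], [0, 1, 1, 0], [1, 0, -2, 0]]
fc_biases = [0.5, 0.0, 0.0, 0.0]
outputs = fully_connected(flat, fc_weights, fc_biases)

# One (mean, std) pair per action defines that action's Gaussian graph.
gaussians = [(outputs[2 * a], math.exp(outputs[2 * a + 1])) for a in range(2)]
```

Every element of the output feature map contributes to every action's parameters, which is the sense of "full connection" assumed in this sketch.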

	Regarding claim 18, the claim recites the same substantive further limitations as claim 7 and is rejected using the same teachings.

Prior Art
8.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
	Blundell et al. (US 10438114 B1) teaches a system for content recommendation using neural networks, involving receiving context information for an action recommendation; processing the context information using a neural network that comprises one or more Bayesian neural network layers to generate, for each of the actions, one or more parameters of a distribution over possible action scores for the action; and selecting an action from a plurality of possible actions using the parameters of the distributions over the possible action scores for the action.
	Schaul et al. (US 20210089908 A1) teaches a method for controlling an agent, involving an action selection neural network which is a distributional neural network; and a policy output includes, for each action, data characterizing a probability distribution over possible returns that would result from the agent performing the action, i.e., over possible action scores for the agent.
	Palanisamy (US 20200293041 A1) teaches a method for determining a vehicle action to be carried out by an autonomous vehicle based on a composite behavior policy, involving a policy layer that determines a vehicle action (or distribution of vehicle actions) based on the observed vehicle state and a value layer that determines feedback (e.g., a value or reward, or distribution of values or rewards) based on the observed vehicle state and the vehicle action that was carried out.
	Van de Wiele et al. (US 20210357731 A1) teaches a system for training a neural network system used to control an agent interacting with an environment, involving configuring the neural network system to process an input that includes a current observation characterizing a current state of an environment to generate a set of Q values for a proper subset of the actions in a set of possible actions that can be performed by the agent. The system uses the Q values to control the agent, i.e., to select the action to be performed by the agent at the current time step by selecting the action with the highest Q value.
	Min et al. (“Deep Distributional Reinforcement Learning Based High-Level Driving Policy Determination” IEEE Transactions on Intelligent Vehicles, vol. 4, no. 3, September 2019) teaches a supervisor agent that can enhance the driver assistant systems by using deep distributional reinforcement learning, involving expressing an expected future reward of the state and action as a multi-modal distribution.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHN C KUAN whose telephone number is (571)270-7066. The examiner can normally be reached M-F: 9:00AM-5:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Schechter, can be reached at (571)272-2302. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JOHN C KUAN/Primary Examiner, Art Unit 2857