DETAILED ACTION
Notice of Pre-AIA or AIA Status
1.	The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


2.	Claims 1-7 and 13-18 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. 
MPEP 2106 outlines a two-part analysis for Subject Matter Eligibility as shown in the chart below.
	
[Chart: MPEP 2106 subject matter eligibility analysis; image not reproduced]


Step 1, the claimed invention must be to one of the four statutory categories. 35 U.S.C. 101 defines the four categories of invention that Congress deemed to be the appropriate subject matter of a patent: processes, machines, manufactures and compositions of matter. 
Step 2, the claimed invention also must qualify as patent-eligible subject matter, i.e., the claim must not be directed to a judicial exception unless the claim as a whole includes additional limitations amounting to significantly more than the exception.
Step 2A is a two-prong inquiry, as shown in the chart below.

[Chart: Step 2A two-prong inquiry; image not reproduced]

Prong One asks: does the claim recite an abstract idea, law of nature, or natural phenomenon? In Prong One, examiners evaluate whether the claim recites a judicial exception, i.e., whether a law of nature, natural phenomenon, or abstract idea is set forth or described in the claim. If the claim recites a judicial exception (i.e., an abstract idea enumerated in MPEP § 2106.04(a), a law of nature, or a natural phenomenon), the claim requires further analysis in Prong Two. If the claim does not recite a judicial exception (a law of nature, natural phenomenon, or abstract idea), then the claim cannot be directed to a judicial exception (Step 2A: NO), and thus the claim is eligible at Pathway B without further analysis. Abstract ideas can be grouped as, e.g., mathematical concepts, certain methods of organizing human activity, and mental processes.
Prong Two asks: does the claim recite additional elements that integrate the judicial exception into a practical application? If the additional elements in the claim integrate the recited exception into a practical application of the exception, then the claim is not directed to the judicial exception (Step 2A: NO) and thus is eligible at Pathway B. This concludes the eligibility analysis. If, however, the additional elements do not integrate the exception into a practical application, then the claim is directed to the recited judicial exception (Step 2A: YES), and requires further analysis under Step 2B.
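For illustration only, the order of the inquiries described above can be sketched as a small decision function. The `ClaimFindings` container, its field names, and the returned strings are hypothetical and are not part of MPEP 2106; the sketch only mirrors the sequence Step 1, Step 2A Prong One, Step 2A Prong Two, Step 2B.

```python
from dataclasses import dataclass


@dataclass
class ClaimFindings:
    """Hypothetical container for the findings made on a single claim."""
    statutory_category: bool      # Step 1: process, machine, manufacture, or composition
    recites_exception: bool       # Step 2A, Prong One
    practical_application: bool   # Step 2A, Prong Two
    significantly_more: bool      # Step 2B


def eligibility(f: ClaimFindings) -> str:
    """Walk the inquiries in the order described in the text above."""
    if not f.statutory_category:
        return "ineligible (Step 1: NO)"
    if not f.recites_exception:
        return "eligible (Step 2A: NO, Prong One)"
    if f.practical_application:
        return "eligible (Step 2A: NO, Prong Two)"
    if f.significantly_more:
        return "eligible (Step 2B: YES)"
    return "ineligible (Step 2B: NO)"


# Findings of the kind made for claim 1 in this action:
# statutory category, recites an exception, no practical application, not significantly more.
claim_1 = ClaimFindings(True, True, False, False)
print(eligibility(claim_1))  # prints "ineligible (Step 2B: NO)"
```

The early returns reflect that a "NO" at either prong of Step 2A ends the analysis without reaching Step 2B.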

Regarding claim 1, Step 1: Is the claim to a process, machine, manufacture or composition of matter? Yes.
Step 2A: Is the claim directed to a law of nature, a natural phenomenon, or an abstract idea (judicially recognized exceptions)? Yes (see analysis below).
Prong one: Whether the claim recites a judicial exception? (Yes). The claim recites: 
1. A method of selecting an action based on deep learning, executed by a device including a neural network device, the method comprising: 
receiving, by the neural network device, a current state as an input; 
calculating, by the neural network device, a value distribution corresponding to each of a plurality of actions to be performed on the current state; and 
selecting, by the neural network device, an action from among the plurality of actions based on the value distribution, wherein the value distribution includes at least two overlapping Gaussian graphs following Gaussian distributions.

The above underlined limitations are about mathematical concepts – mathematical relationships, mathematical formulas or equations, mathematical calculations; and/or a mental process (i.e., “selecting”). Therefore, it is directed to an abstract idea.
Prong two: Whether the claim recites additional elements that integrate the exception into a practical application of that exception? (No). The claim recites additional elements as indicated in bold-face above. However, the “device” can be a conventional computer. The “neural network device” can be a conventional computer configured for performing calculation using a trained neural network, which is a mathematical expression/relationship. The “receiving” step is an “insignificant extra-solution activity to the judicial exception” to collect the data for the abstract idea. Accordingly, the claim has not been integrated into a practical application.
Step 2B: Does the claim recite additional elements (other than the judicial exception) that amount to significantly more than the judicial exception? No (see analysis below).
The device, the neural network device, and the receiving step merely utilize a conventional computer to receive data and process the received data according to the abstract idea. The claim does not include additional elements that are sufficient to make the claim significantly more than the judicial exception.

Claim 13 is rejected by analogy to claim 1.

Dependent claims 2-7 and 14-18, each analyzed as a whole, are held to be patent ineligible under 35 U.S.C. 101 because they either extend (or add more details to) the abstract idea, or their additional recited limitation(s) (if any) fail to establish that the claim(s) are not directed to an abstract idea. Specifically, no additional element in the dependent claims adds a meaningful limitation to the abstract idea that would make the claim significantly more than the judicial exception (abstract idea). The additional element(s) (if any) are recited at a high level of generality and are well-understood, routine, or conventional, serving only to facilitate the application of the abstract idea.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


3.	Claims 1-6 and 13-17 are rejected under 35 U.S.C. 103 as being unpatentable over Gendron-Bellemare et al. (US 20190332923 A1; cited previously; hereinafter “Gendron-Bellemare”) in view of Bishop (“Mixture Density Networks” Neural Computing Research Group Report: NCRG/94/004, Previously issued as NCRG/94/4288, 1994).

	Regarding claim 1, Gendron-Bellemare teaches a method of selecting an action based on deep learning, executed by a device including a neural network device (i.e., “reinforcement learning system 100”; see [0033] and FIG. 1; “a method for selecting an action to be performed by a reinforcement learning agent… The distributional Q network is a deep neural network”; see [0006]), the method comprising: 
receiving, by the neural network device, a current state as an input (i.e., “receiving a current observation characterizing a current state of the environment… processed using a distributional Q network”; see [0006]); 
calculating, by the neural network device, a value distribution corresponding to each of a plurality of actions to be performed on the current state (i.e., “For each action in a set of multiple actions that can be performed by the agent… generate a network output that defines a probability distribution over possible Q returns for the action-current observation pair”; see [0006]); and 
selecting, by the neural network device, an action from among the plurality of actions based on the value distribution (i.e., “An action is selected to be performed by the agent in response to the current observation using the measures of central tendency for the actions”; see [0006]), 
wherein the value distribution includes 
	Gendron-Bellemare does not explicitly disclose the portion of this limitation omitted above, recited here in full:
wherein the value distribution includes at least two overlapping Gaussian graphs following Gaussian distributions.
	But Bishop teaches:
a linear combination of Gaussian kernel functions for modeling a more general distribution than a single Gaussian distribution (see p. 6, ¶ 3).
	It would have been obvious to one of ordinary skill in the art at the time the application was filed to modify Gendron-Bellemare in view of Bishop, to replace each of the Gaussian graphs with a linear combination of Gaussian distribution graphs, such that the value distribution includes at least two overlapping Gaussian graphs following Gaussian distributions, as claimed. The motivation would be to model the value distribution with a more complete distribution by using a mixture of Gaussian distributions (see Bishop, p. 6, ¶ 3).
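For illustration only, a value distribution formed as a weighted sum (linear combination) of Gaussian kernels can be sketched as follows. The function names and the example numbers are hypothetical; the sketch is neither the code of Gendron-Bellemare or Bishop nor the applicant's implementation. It assumes each component is given as a (weight, mean, standard deviation) triple with weights summing to 1.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a single Gaussian (Normal) distribution at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_pdf(x, components):
    """Density of the mixture: a weighted sum of overlapping Gaussian kernels.

    components: list of (weight, mean, std) triples; weights are assumed to sum to 1.
    """
    return sum(w * gaussian_pdf(x, m, s) for w, m, s in components)


def mixture_mean(components):
    """Mean of the mixture, i.e. the weight-averaged component means."""
    return sum(w * m for w, m, _ in components)


# Two overlapping Gaussian components (illustrative numbers only).
components = [(0.6, 1.0, 0.5), (0.4, 2.0, 1.0)]
print(mixture_pdf(1.5, components))
print(mixture_mean(components))
```

With two or more components, the resulting density can be asymmetric or multi-modal, which a single Gaussian cannot represent.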

	Regarding claim 2, as a result of the modification applied to claim 1 above, Gendron-Bellemare in view of Bishop further teaches:
wherein
the calculating of the value distribution includes calculating the at least two overlapping Gaussian graphs by using a value distribution network (i.e., “the output of the distributional Q network 112 may include respective output values defining a mean and a standard deviation of a Normal distribution over the set of possible Q returns”; see Gendron-Bellemare, [0042] and FIG. 1 and Bishop, p. 6, ¶ 3), 
the value distribution network includes a distributional neural network configured to output a plurality of network parameters defining a probability distribution of a value return possible for each current state-action pair (i.e., “the output of the distributional Q network 112 may include respective output values defining a mean and a standard deviation of a Normal distribution over the set of possible Q returns”; see Gendron-Bellemare, [0042] and FIG. 1; “The distributional Q network is a deep neural network that is configured to process the action and the current observation in accordance with current values of the network parameters to generate a network output that defines a probability distribution over possible Q returns for the action-current observation pair”; see Gendron-Bellemare, [0006]), and
the value return includes an estimation value of a value obtained as a result of each action performed on the current state (i.e., “Each possible Q return is an estimate of a return that would result from the agent performing the action in response to the current observation”; see Gendron-Bellemare, [0006]).

	Regarding claim 3, as a result of the modification applied to claim 2 above, Gendron-Bellemare in view of Bishop further teaches:
wherein the plurality of network parameters includes, of each of the at least two overlapping Gaussian graphs, at least one of a probability weight, a value mean, and a value standard deviation (i.e., “the output of the distributional Q network 112 includes respective output values that define a parametric probability distribution over the set of possible Q returns. For example, the output of the distributional Q network 112 may include respective output values defining a mean and a standard deviation of a Normal distribution over the set of possible Q returns”; see Gendron-Bellemare, [0042] and FIG. 1 and Bishop, p. 6, ¶ 3).

	Regarding claim 4, Gendron-Bellemare further teaches:
wherein the value distribution includes a graph of 
calculating, by the neural network device, 
	Gendron-Bellemare does not explicitly disclose the portions of these limitations omitted above, recited here in full:
wherein the value distribution includes a graph of overlapping a first Gaussian graph, a second Gaussian graph, and a third Gaussian graph, the calculating of the value distribution includes 
calculating, by the neural network device, a first probability weight, a first value mean, and a first value standard deviation of the first Gaussian graph by using a value distribution network;
calculating, by the neural network device, a second probability weight, a second value mean, and a second value standard deviation of the second Gaussian graph by using the value distribution network;
calculating, by the neural network device, a third probability weight, a third value mean, and a third value standard deviation of the third Gaussian graph by using the value distribution network; and
generating, by the neural network device, the value distribution by allowing the first Gaussian graph, the second Gaussian graph, and the third Gaussian graph to overlap one another based on results of the calculations.
	However, as a result of modification applied to claim 1 above, at least two overlapping Gaussian graphs can be used to replace the single Gaussian graph by a weighted sum using probability weights (i.e., the mixing coefficients; see Bishop, p. 7, ¶ 2).
	It would have been obvious to one of ordinary skill in the art at the time the application was filed to use at least three weighted Gaussian graphs, such that the value distribution includes a graph of overlapping a first Gaussian graph, a second Gaussian graph, and a third Gaussian graph, and the calculating of the value distribution includes calculating, by the neural network device, a first probability weight, a first value mean, and a first value standard deviation of the first Gaussian graph by using a value distribution network; calculating, by the neural network device, a second probability weight, a second value mean, and a second value standard deviation of the second Gaussian graph by using the value distribution network; calculating, by the neural network device, a third probability weight, a third value mean, and a third value standard deviation of the third Gaussian graph by using the value distribution network; and generating, by the neural network device, the value distribution by allowing the first Gaussian graph, the second Gaussian graph, and the third Gaussian graph to overlap one another based on results of the calculations, as claimed. The motivation would be to model the value distribution with a more complete distribution by using a mixture of Gaussian distributions.
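For illustration only, the following sketch shows one way a network's raw outputs could be converted into a probability weight, a value mean, and a value standard deviation for each of three Gaussian components. The layout of `raw_outputs` (weights first, then means, then log standard deviations) and the function name `mixture_params` are assumptions made for this example. The softmax and exponential mappings are one common parameterization; neither the claims nor this action specifies them.

```python
import math


def mixture_params(raw_outputs, num_components=3):
    """Map 3*K raw outputs to K (weight, mean, std) triples.

    Assumed layout (hypothetical): K weight logits, then K means, then K log-stds.
    """
    k = num_components
    if len(raw_outputs) != 3 * k:
        raise ValueError("expected 3 * num_components raw outputs")
    logits = raw_outputs[:k]
    means = raw_outputs[k:2 * k]
    log_stds = raw_outputs[2 * k:]

    # Softmax keeps the probability weights positive and summing to 1.
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    total = sum(exps)
    weights = [e / total for e in exps]

    # The exponential keeps every standard deviation strictly positive.
    stds = [math.exp(v) for v in log_stds]
    return list(zip(weights, means, stds))


# Nine illustrative raw outputs for a first, second, and third Gaussian component.
raw = [0.2, -1.0, 0.5, 1.0, 2.5, 4.0, -0.7, 0.0, 0.3]
for weight, mean, std in mixture_params(raw):
    print(f"weight={weight:.3f} mean={mean:.3f} std={std:.3f}")
```

Passing a different `num_components` gives a mixture with a different number of Gaussian components from the same function.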

	Regarding claim 5, the prior art applied to the preceding linking claim(s) teaches the features of the linking claim(s). 
	Gendron-Bellemare does not explicitly disclose:
wherein the calculating of the value distribution includes:
receiving, by the neural network device, a number of Gaussian graphs for generating the value distribution;
calculating, by the neural network device, a plurality of Gaussian graphs by using a value distribution network based on the number of Gaussian graphs; and
generating, by the neural network device, the value distribution by overlapping the calculated plurality of Gaussian graphs.
	But Bishop teaches:
a linear combination of a number of Gaussian graphs (i.e., “The probability density of the target data is then represented as a linear combination of kernel functions… m is the number of components in the mixture”; see p. 6, ¶ 3). 
	As a result of modification applied to claim 1 above and the teaching of Bishop, it would have been obvious to one of ordinary skill in the art at the time the application was filed to adapt the step of calculating the value distribution, by linear combination, including: receiving, by the neural network device, a number of Gaussian graphs for generating the value distribution; calculating, by the neural network device, a plurality of Gaussian graphs by using a value distribution network based on the number of Gaussian graphs; and generating, by the neural network device, the value distribution by overlapping the calculated plurality of Gaussian graphs, as claimed. The motivation would be to flexibly determine the number of Gaussian kernels for the Gaussian mixture. 

	Regarding claim 6, Gendron-Bellemare further teaches: wherein the selecting of the action includes:
calculating, by the neural network device, an average value of each of the value distributions respectively corresponding to the plurality of actions (i.e., “the system 100 determines a corresponding measure of central tendency 122 (i.e., a central or typical value) of the set of possible Q returns with respect to the probability distribution defined by the output of the distributional Q network 112 for the action-current observation pair. For example, as will be described further with reference to FIG. 2, the measure of central tendency may be a mean, a median, or a mode”; see [0046]); and
determining, by the neural network device, an action, corresponding to the value distribution where the average value is largest, as an optimal action, selecting the optimal action as the selected action (i.e., “the system 100 selects an action having a highest corresponding measure of central tendency 122 from amongst all the actions in the set of actions that can be performed by the agent 104”; see [0047]).
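For illustration only, selecting the action whose value distribution has the largest average value can be sketched as follows. The action names, the mixture parameters, and the function names are hypothetical, and each value distribution is assumed to be a list of (weight, mean, standard deviation) triples.

```python
def mixture_mean(components):
    """Average value of a Gaussian mixture: the weight-averaged component means."""
    return sum(w * m for w, m, _ in components)


def select_action(value_distributions):
    """Return the action whose value distribution has the largest average value."""
    return max(value_distributions,
               key=lambda action: mixture_mean(value_distributions[action]))


# One value distribution per candidate action (illustrative numbers only).
value_distributions = {
    "left": [(0.5, 0.0, 1.0), (0.5, 2.0, 0.5)],    # average value 1.0
    "right": [(0.7, 1.5, 0.3), (0.3, 3.0, 1.0)],   # average value 1.95
    "stay": [(1.0, 0.8, 0.2)],                     # average value 0.8
}
print(select_action(value_distributions))  # prints "right"
```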

	Regarding claim 13, the claim recites the same substantive limitations as claim 1 and is rejected using the same teachings. Note that Gendron-Bellemare teaches the “processing circuitry” (i.e., “The reinforcement learning system 100 is an example of a system implemented as computer programs on one or more computers in one or more locations in which the systems, components, and techniques described below are implemented”; see [0033]).

	Regarding claim 14, the claim recites the same substantive further limitations as claim 2 and is rejected using the same teachings.

	Regarding claim 15, the claim recites the same substantive further limitations as claim 3 and is rejected using the same teachings.

	Regarding claim 16, the claim recites the same substantive further limitations as claim 5 and is rejected using the same teachings. 
	
	Regarding claim 17, the claim recites the same substantive further limitations as claim 6 and is rejected using the same teachings.

4.	Claims 7 and 18 are rejected under 35 U.S.C. 103 as being obvious over Gendron-Bellemare in view of Bishop and Lee et al. (“Deep Reinforcement Learning in Continuous Action Spaces: a Case Study in the Game of Simulated Curling” Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, PMLR 80, 2018; cited previously; hereinafter “Lee”).

	Regarding claim 7, Gendron-Bellemare does not explicitly disclose:
wherein the calculating of the value distribution includes:
performing, by the neural network device, a convolution operation on an input feature map corresponding to the current state by using a weight kernel; and
generating, by the neural network device, a plurality of Gaussian graphs based on a full connection between each of the plurality of actions and elements of an output feature map generated by a result of the convolution operation.
	But Lee teaches:
wherein calculating of a value distribution includes:
performing a convolution operation on an input feature map corresponding to the current state by using a weight kernel (i.e., “As input, a feature map… convolutional operations”; see FIG. 1 and associated text; see also FIG. 2 and associated text indicating a 3x3 filter as a weight kernel); and
generating a plurality of Gaussian graphs based on a full connection between each of the plurality of actions and elements of an output feature map generated by a result of the convolution operation (i.e., “The output of the policy head is the probability distribution of each action”; see FIG. 1; “The policy head has two more convolutional layers, while the value head has two fully connected layers on top of a convolutional layer”; see FIG. 2 and associated text).
	It would have been obvious to one of ordinary skill in the art at the time the application was filed to modify Gendron-Bellemare in view of Bishop, further in view of Lee to incorporate the deep CNN policy value technique, such that the calculating of the value distribution includes: performing, by the neural network device, a convolution operation on an input feature map corresponding to the current state by using a weight kernel; and generating, by the neural network device, a plurality of Gaussian graphs based on a full connection between each of the plurality of actions and elements of an output feature map generated by a result of the convolution operation, as claimed. The motivation would be to process the input that is suitable for a convolutional neural network (see Gendron-Bellemare [0043] suggesting a convolutional neural network).
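For illustration only, the combination discussed above (a convolution over an input feature map, followed by a full connection from the output feature map to per-action Gaussian parameters) can be sketched in plain Python. All sizes, names, and the random weights are hypothetical. The sketch is not the architecture of Lee or Gendron-Bellemare and omits training entirely. It assumes the same softmax and exponential mappings for weights and standard deviations as a design choice of this example.

```python
import math
import random


def conv2d_valid(feature_map, kernel):
    """Slide the weight kernel over the input feature map (no padding, stride 1).

    As is usual in deep learning, the kernel is applied without flipping.
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(feature_map) - kh + 1
    out_w = len(feature_map[0]) - kw + 1
    return [[sum(feature_map[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def to_mixture(raw, k):
    """Convert 3*k raw values into k (weight, mean, std) triples (assumed layout)."""
    top = max(raw[:k])
    exps = [math.exp(v - top) for v in raw[:k]]
    total = sum(exps)
    weights = [e / total for e in exps]
    means = raw[k:2 * k]
    stds = [math.exp(v) for v in raw[2 * k:3 * k]]
    return list(zip(weights, means, stds))


def value_distributions(feature_map, kernel, fc_weights, fc_biases, num_actions, k):
    """Convolution, then a full connection from every output element to every action."""
    conv_out = conv2d_valid(feature_map, kernel)
    flat = [v for row in conv_out for v in row]
    raw = [sum(w * x for w, x in zip(row, flat)) + b
           for row, b in zip(fc_weights, fc_biases)]
    per_action = 3 * k
    return [to_mixture(raw[a * per_action:(a + 1) * per_action], k)
            for a in range(num_actions)]


random.seed(0)
num_actions, k = 2, 2
feature_map = [[random.random() for _ in range(5)] for _ in range(5)]   # 5x5 input
kernel = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)]  # 3x3 weight kernel
fc_weights = [[random.uniform(-0.5, 0.5) for _ in range(9)]             # 3x3 output, flattened
              for _ in range(num_actions * 3 * k)]
fc_biases = [0.0] * (num_actions * 3 * k)

dists = value_distributions(feature_map, kernel, fc_weights, fc_biases, num_actions, k)
for action, components in enumerate(dists):
    print(action, components)
```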

	Regarding claim 18, the claim recites the same substantive further limitations as claim 7 and is rejected using the same teachings.

Response to Arguments
5.	The objections to the claims have been withdrawn in view of the amendment.

6.	Regarding the rejections under 35 U.S.C. 102/103, Applicant’s arguments regarding the validity of the Choi reference have been fully considered and are persuasive. Therefore, the rejection has been withdrawn. However, upon further consideration, new grounds of rejection are made in view of Bishop (see the rejections above).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHN C KUAN whose telephone number is (571)270-7066. The examiner can normally be reached M-F: 9:00AM-5:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Andrew Schechter, can be reached at (571)272-2302. The fax number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JOHN C KUAN/ Primary Examiner, Art Unit 2857