Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

DETAILED ACTION
1.       This Office Action is in response to the communication filed on September 9, 2020, which paper has been placed of record in the file.
2.           Claims 1-15 are pending in this application. 



Information Disclosure Statement
3.        The information disclosure statement (IDS) submitted on September 9, 2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.



Claim Rejections - 35 USC § 101
4.        35 U.S.C. 101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


            Note: Examiner points Applicant to the 2019 Revised Patent Subject Matter Eligibility Guidance (2019 PEG).

5.      Claims 1-15 are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., law of nature, natural phenomenon, or abstract idea) without significantly more.
             Independent claim 1, which is illustrative of all the independent claims, is analyzed as follows:
         Step 1: Statutory Category? (Is the claim directed to a process, machine, manufacture, or composition of matter?) Yes. The claim recites a method and, therefore, is a process.
           Step 2A - Prong 1: Judicial Exception Recited? (Does the claim recite a judicial exception, i.e., an abstract idea enumerated in the 2019 PEG, a law of nature, or a natural phenomenon?) Yes. The claim recites the following limitations: generating an observation of an environment…, sending a first message that includes the observation…, receiving a second message that includes a goal…, evaluating a plurality of actions to determine a selected action, and applying the selected action to the environment…. These limitations describe a method of organizing human activity (managing personal behavior or relationships or interactions between people, including social activities, teaching, and following rules or instructions) and therefore fall within the “Organizing human activity” grouping of abstract ideas. Moreover, the limitations of generating an observation of an environment…, evaluating a plurality of actions to determine a selected action, and applying the selected action to the environment…, as drafted, recite a process that, under its broadest reasonable interpretation, covers performance of the limitations in the mind but for the recitation of generic computer components. That is, other than reciting “a computer”, nothing in the claim elements precludes the steps from practically being performed in the mind. The mere nominal recitation of a generic computing device does not take the claim limitations out of the mental processes grouping. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind, then it falls within the “Mental Processes” grouping of abstract ideas. Accordingly, the claim recites an abstract idea.
             Step 2A - Prong 2: Integrated into a Practical Application? (Does the claim recite additional elements that integrate the exception into a practical application of the exception?) No. This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements of a processor and a memory, and uses the processor to perform the generating, sending, receiving, evaluating, and applying steps. The processor is recited at a high level of generality (i.e., as a generic computing device performing the generic computer functions of generating, sending, receiving, evaluating, and applying) such that it amounts to no more than mere instructions to apply the exception using generic computer components. Moreover, the claim recites the additional limitations “sending a first message that includes the observation by the first agent process, receiving a second message that includes a goal by the first agent process”, which are recited at a high level of generality (i.e., as a general means of receiving and transmitting data) and are a form of insignificant extra-solution activity. Each of the additional limitations, alone and in combination, is no more than mere instructions to apply the exception using a generic computer component (the computer). Accordingly, even in combination, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. Accordingly, the claim is directed to an abstract idea.
           The Berkheimer Memorandum mandates that an additional element (or combination of elements) is not well-understood, routine or conventional unless the examiner finds, and expressly supports a rejection in writing with, one or more of the following: 
           (1) a citation to an express statement in the specification or to a statement made by an applicant during prosecution that demonstrates the well-understood, routine, conventional nature of the additional element(s); 
           (2) a citation to one or more of the court decisions discussed in MPEP § 2106.05(d)(II) as noting the well-understood, routine, conventional nature of the additional element(s); 
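           (3) a citation to a publication that demonstrates the well-understood, routine, conventional nature of the additional element(s); or 
           (4) a statement that the examiner is taking official notice of the well-understood, routine, conventional nature of the additional element(s).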

            In this case, the present Specification describes, in Figure 4 and paragraphs [0049]-[0050], using a general-purpose computer and commercially available products to perform the method. Thus, the finding is supported under option (1): an express statement in the specification that demonstrates the well-understood, routine, conventional nature of the additional elements.
	Step 2B: Claim provides an Inventive Concept? (Does the claim recite additional elements that amount to an inventive concept, i.e., “significantly more” than the recited judicial exception?) No. As discussed with respect to Step 2A Prong Two, the additional elements in the claim amount to no more than mere instructions to apply the exception using a generic computer component. The same analysis applies here in Step 2B, i.e., mere instructions to apply an exception on a generic computer cannot integrate a judicial exception into a practical application at Step 2A or provide an inventive concept in Step 2B.
           Under the 2019 PEG, a conclusion that an additional element is insignificant extra-solution activity in Step 2A should be re-evaluated in Step 2B. Here, the limitations “sending a first message that includes the observation by the first agent process, receiving a second message that includes a goal by the first agent process” were considered to be extra-solution activity in Step 2A, and thus they are re-evaluated in Step 2B to determine whether they are more than what is well-understood, routine, and conventional. The Symantec, TLI, and OIP Techs. court decisions cited in MPEP 2106.05(d)(II) indicate that mere collection or receipt of data over a network is a well-understood, routine, and conventional function when it is claimed in a merely generic manner (as it is here). Accordingly, a conclusion that receiving and transmitting data is well-understood, routine, conventional activity is supported under Berkheimer Option 2. Moreover, the limitations “sending a first message that includes the observation by the first agent process, receiving a second message that includes a goal by the first agent process” do not provide any improvement to computer functionality, to the network, or to the user device; they are merely used as a general means for collecting and transmitting information and do not amount to an inventive concept. For these reasons there is no inventive concept in the claim, and thus the claim is not patent eligible.
         Under Berkheimer Option 2, the courts have recognized the following computer functions as well-understood, routine, and conventional functions when they are claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity.
          Computer functions recited at a high level of generality:
          i. Receiving or transmitting data over a network (Symantec, TLI Communications, OIP Techs, buySAFE).
          ii. Performing repetitive calculations (Flook, Bancorp).
          iii. Electronic recordkeeping (Alice Corp, Ultramercial).
          iv. Storing and retrieving information in memory (Versata Dev. Group, Inc., OIP Techs).
          v. Electronically scanning or extracting data from a physical document (Content Extraction and Transmission, LLC).
           Accordingly, the conclusion that the limitations “sending a first message that includes the observation by the first agent process, receiving a second message that includes a goal by the first agent process” are well-understood, routine, conventional activity is supported under Berkheimer Option 2. Moreover, these limitations do not amount to significantly more than the abstract idea: they do not provide any improvement to another technology or technical field, to the functioning of the computer, to the network, or to the user device; they are merely used as a general means for collecting and transmitting data; and they are well-understood, routine, and conventional functions when claimed in a merely generic manner (e.g., at a high level of generality) or as insignificant extra-solution activity. They are similar to other concepts that have been identified by the courts, such as receiving or transmitting data over a network (Symantec, TLI Communications, OIP Techs, buySAFE). Therefore, the claims do not amount to significantly more than the abstract idea. For these reasons there is no inventive concept in the claim, and thus the claim is not patent eligible.
         The dependent claims do not add limitations that meaningfully limit the abstract idea. For example, Claim 2 recites wherein the meta-agent process executes on a higher hierarchical level…; Claim 8 recites wherein the evaluation includes determining a predicted result and associated reward…; Claim 9 recites wherein the evaluation is performed by using a controller process…; Claim 10 recites wherein the environment is physical hardware being monitored and controlled…; Claim 11 recites wherein the environment is one of a computer system…. Therefore, the dependent claims do not impart patent eligibility to the abstract idea of the independent claim. Rather, the dependent claims further narrow the abstract idea, and the narrower scope does not change the outcome of the two-part Mayo test. Narrowing the scope of the claims is not enough to impart eligibility because the result is still an abstract idea, albeit a narrower one. Therefore, none of the dependent claims, alone or as an ordered combination, adds limitations that qualify as significantly more than the abstract idea.
          Regarding independent claims 14 and 15, Alice Corp. establishes that the same analysis should be used for all categories of claims. Therefore, independent claim 14, directed to a system, and independent claim 15, directed to a medium, are also rejected as ineligible subject matter under 35 U.S.C. 101 for substantially the same reasons as independent method claim 1.
          Accordingly, claims 1-15 are not drawn to eligible subject matter, as they are directed to an abstract idea without significantly more, and are rejected under 35 USC § 101 as being directed to non-statutory subject matter.


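           In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.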



  Claim Rejections - 35 USC § 102
6. 	The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.


7.      Claims 1-15 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Xiong et al. (hereinafter Xiong, US 2019/0130312).
            Regarding claim 1, Xiong discloses a method executed by one or more data processing systems, comprising: 
            generating an observation of an environment by a first agent process (para [0034], For a reinforcement learning episode, the agent interacts with the environment in discrete time steps. At each time, the agent observes the environment's state and picks an action based on a policy. The next time step, the agent receives a reward signal and a new observation. The value function is updated using the reward. This continues until a terminal state is reached);
            sending a first message that includes the observation, by the first agent process and to a meta-agent process (para [0034], For a reinforcement learning episode, the agent interacts with the environment in discrete time steps. At each time, the agent observes the environment's state and picks an action based on a policy. The next time step, the agent receives a reward signal and a new observation. The value function is updated using the reward. This continues until a terminal state is reached);
            receiving a second message that includes a goal, by the first agent process (para [0034], A reward function defines the goal for an agent. It takes in a state, or a state and the action taken at that state, and gives back a number called the reward, which tells the agent how good it is to be in that state);
           evaluating a plurality of actions, by the first agent process and based on the goal, to determine a selected action (para [0046], Reinforcement learning (e.g., the REINFORCE algorithm) is used to train the agent on a progression of task sets, beginning with a terminal task set and continuing with an intermediate task set and with a top task set, according to one implementation. The terminal task set is formulated by selecting a set of primitive actions from a library of primitive actions. The intermediate task set is formulated by making available the formulated terminal task set as the base task set of the intermediate task set); and 
           applying the selected action to the environment by the first agent process (para [0045], Augmented flat policy (AFP) classifier 172 is trained to process the hidden representation when switch policy classifier 182 determines that the current task is to be executed by performing the primitive action, and select the primitive action from the library of primitive actions. Augmented flat policy classifier 172 outputs π_aug(a|s, g) through AFP FC network 258 and AFP softmax activation layer 268. Action processor 192 implements one or more primitive actions 295 of the selected previously-learned task or the selected primitive action, based on the determination of switch policy classifier 182).
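           For illustration only, the sequence of steps recited in claim 1 (observe, report to a meta-agent, receive a goal, evaluate candidate actions, apply the selected action) can be sketched as below. This is a minimal sketch under stated assumptions, not the applicant's implementation and not the system of Xiong: the class names, the temperature-based environment, the dictionary messages, the fixed goal, and the distance-based scoring rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Environment:
    """Hypothetical environment: a single temperature reading."""
    temperature: float = 15.0


class MetaAgent:
    """Hypothetical meta-agent that answers an observation with a goal."""

    def handle(self, message: dict) -> dict:
        # Assumed global policy: always aim for 20 degrees.
        return {"goal": 20.0}


class Agent:
    """Hypothetical first agent that walks through the five recited steps."""

    ACTIONS = (-1.0, 0.0, 1.0)  # assumed candidate temperature adjustments

    def __init__(self, env: Environment, meta: MetaAgent) -> None:
        self.env = env
        self.meta = meta

    def step(self) -> float:
        # Generate an observation of the environment.
        observation = self.env.temperature
        # Send the first message (observation); receive the second (goal).
        reply = self.meta.handle({"observation": observation})
        goal = reply["goal"]
        # Evaluate every candidate action against the goal
        # (assumed rule: the action that lands closest to the goal wins).
        selected = min(self.ACTIONS, key=lambda a: abs(observation + a - goal))
        # Apply the selected action to the environment.
        self.env.temperature += selected
        return selected


env = Environment()
agent = Agent(env, MetaAgent())
for _ in range(6):
    agent.step()
print(env.temperature)
```

           In this sketch the agent raises the temperature one degree per step until the assumed goal is reached and then selects the zero adjustment; a real system would replace the fixed goal and the scoring rule with learned policies.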
           Regarding claim 2, Xiong discloses the method of claim 1, wherein the meta-agent process executes on a higher hierarchical level than the first agent process and is configured to communicate with and direct a plurality of agent processes including the first agent process (para [0029], The disclosed novel framework for efficient multi-task reinforcement learning trains software agents to employ hierarchical policies that decide when to use a previously learned policy and when to learn a new skill. This enables agents to continually acquire new skills during different stages of training).
           Regarding claim 3, Xiong discloses the method of claim 1, wherein the observation is a partial observation that is associated with only a portion of the environment (para [0037], Visual encoder 132 utilizes a convolutional neural network (CNN) trained to extract feature maps from an image 230 of an environment view of the agent, and encode the features maps in a visual representation).
           Regarding claim 4, Xiong discloses the method of claim 1, wherein the first agent process is a reinforcement-learning agent (para [0034], reinforcement learning includes teaching a software agent how to behave in an environment by telling it how well it is doing. The reinforcement learning system includes a policy, a reward function, and a value function).
            Regarding claim 5, Xiong discloses the method of claim 1, wherein the meta-agent process is a reinforcement-learning agent (para [0034], For a reinforcement learning episode, the agent interacts with the environment in discrete time steps. At each time, the agent observes the environment's state and picks an action based on a policy. The next time step, the agent receives a reward signal and a new observation. The value function is updated using the reward. This continues until a terminal state is reached).
            Regarding claim 6, Xiong discloses the method of claim 1, wherein the meta-agent process defines the goal based on the observation and a global policy (para [0031], global policy engine 122 also includes plan composer 152 for composing plans for complex tasks based on simpler ones which have human descriptions. Plan composer 152 includes instruction policy classifier 162 that manages communication between global policy and base policy, augmented flat policy classifier 172 which allows the global policy to directly execute actions, and switch policy classifier 182 that decides whether the global policy will primarily rely on the base policy or the augmented flat policy).
            Regarding claim 7, Xiong discloses the method of claim 1, wherein the evaluation is also based on one or more local policies (para [0009], This two-layer hierarchical policy design significantly improves the ability to discover complex policies which cannot be learned by flat policies. However, two-layer hierarchical policy design also makes some strict assumptions that limit its flexibility: a task's global policy cannot use a simpler task's policy as part of its base policies; and a global policy is assumed to be executable by only using local policies over specific options).
            Regarding claim 8, Xiong discloses the method of claim 1, wherein the evaluation includes determining a predicted result and associated reward for each of the plurality of actions, and the selected action is the action with the greatest associated reward (para [0034], A value function tells an agent how much reward it will get following a specific policy starting from a specific state. It represents how desirable it is to be in a certain state. Since the value function isn't given to the agent directly, it needs to come up with a good estimate based on the reward it has received so far).
            Regarding claim 9, Xiong discloses the method of claim 1, wherein the evaluation is performed by using a controller process to formulate the plurality of actions and using a critic process to identify a reward value associated with each of the plurality of actions (para [0034], A reward function defines the goal for an agent. It takes in a state, or a state and the action taken at that state, and gives back a number called the reward, which tells the agent how good it is to be in that state. The agent's job is to get the biggest amount of reward it possibly can in the long run. If an action yields a low reward, the agent will probably take a better action in the future).
           Regarding claim 10, Xiong discloses the method of claim 1, wherein the environment is physical hardware being monitored and controlled by at least the first agent process (para [0012], The disclosed technology reveals a hierarchical policy network, for use by a software agent running on a processor, to accomplish an objective that requires execution of multiple tasks, including a terminal policy learned by training the agent on a terminal task set, an intermediate policy learned by training the agent on an intermediate task set, and a top policy learned by training the agent on a top task set).
            Regarding claim 11, Xiong discloses the method of claim 1, wherein the environment is one of a computer system, an electrical, plumbing, or air system, a heating, ventilation, and air conditioning system, a manufacturing system, a mail processing system, or a product transportation, sorting, or processing system (para [0030], FIG. 1 shows architecture 100 of a hierarchical task processing system for use by an agent to accomplish an objective that requires execution of multiple tasks. Hierarchical task processing system 112 includes a global policy engine 122 that learns the language grounding for both visual knowledge and policies).
            Regarding claim 12, Xiong discloses the method of claim 1, wherein the first agent process is one of a plurality of agent processes each configured to communicate with and be assigned goals by the meta-agent process (para [0034], reinforcement learning includes teaching a software agent how to behave in an environment by telling it how well it is doing. The reinforcement learning system includes a policy, a reward function, and a value function. A policy tells the agent what to do in a certain situation. A reward function defines the goal for an agent).
            Regarding claim 13, Xiong discloses the method of claim 1, wherein the first agent process is one of a plurality of agent processes each configured to communicate with the meta-agent process and each of the other agent processes (para [0029], The disclosed novel framework for efficient multi-task reinforcement learning trains software agents to employ hierarchical policies that decide when to use a previously learned policy and when to learn a new skill. This enables agents to continually acquire new skills during different stages of training. Each learned task corresponds to a human language description. Because agents can only access previously learned skills through these descriptions, the agent can provide a human-interpretable description of its choices).
            Regarding claim 14, Xiong discloses a data processing system comprising at least a processor and accessible memory (figure 9 and para [0081], a hierarchical policy network, running on numerous parallel processors coupled to memory, for use by an agent running on a processor to accomplish an objective that requires execution of multiple tasks), configured to perform a method as in claim 1 above. Claim 14 is therefore rejected under the same rationale.
            Regarding claim 15, Xiong discloses a non-transitory computer-readable medium encoded with executable instructions that, when executed, cause a data processing system to perform a method as in claim 1. Claim 15 is therefore rejected under the same rationale.


          
                                                            Conclusion
8.        Claims 1-15 are rejected.
9.     The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
             Nagaraja et al. (US 2018/0260692) disclose a reinforcement learning processor specifically configured to train reinforcement learning agents in the AI systems by the way of implementing an application-specific instruction set.
            Van Seijen et al. (US 2018/0165603) disclose machine learning techniques, including decomposing single-agent reinforcement learning problems into simpler problems addressed by multiple agents.
             McCord et al. (US 2018/0121766) disclose a system and method for enhanced human/machine workforce management using reinforcement learning, comprising a reinforcement learning server that produces a partially-observable Markov chain model, and an optimization server that uses the partially-observable Markov chain model to select work items and assign them to contact center resources.

            Ring et al. (US 2016/0012338) disclose building a forecast for an autonomous agent. The building at least comprises assigning a selected parameter of the autonomous agent to a state value, adding a new policy to a set of policies where the new policy maps actions of the autonomous agent for optimizing the state value, and adding a new forecast to a set of forecasts where the forecast at least comprises a prediction of a next state of the autonomous agent following execution of the new policy.

10.       Any inquiry concerning this communication or earlier communications from the examiner should be directed to examiner NGA B NGUYEN whose telephone number is (571) 272-6796.  The examiner can normally be reached on Monday-Friday 7AM-5PM.
          Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Eric Stamber, can be reached at (571) 272-6724.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
            Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.



/NGA B NGUYEN/Primary Examiner, Art Unit 3683                                                                                                                                                                                                        
September 9, 2021