DETAILED ACTION
Claims 1-15 are pending and have been examined.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statements (IDS) submitted on 03/26/2020, 07/02/2020 and 02/02/2021 are in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statements are being considered by the examiner.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-15 are rejected under 35 U.S.C. 103 as being unpatentable over Tuncali ("Reasoning about Safety of Learning-Enabled Components in Autonomous Cyber-physical Systems") in view of Zhu ("Applying Formal Methods to Reinforcement Learning").

In regard to claims 1, 6 and 11, Tuncali teaches: A system for controlling a mobile platform, the system comprising: a mobile platform; and (Tuncali, p.1 abstract "This approach is demonstrated on a case study in which a Dubins car model of an autonomous vehicle [a mobile platform] is controlled by a neural network to follow a given path.")
one or more processors and a non-transitory computer-readable medium having executable instructions encoded thereon such that when executed, the one or more processors perform an operation of: (Tuncali, p.1 "We present a simulation-based approach for generating barrier certificate functions for safety verification of cyber-physical systems (CPS) that contain neural network-based controllers… Self-driving cars, unmanned aerial vehicles, and certain kinds of robots are examples of autonomous cyber-physical systems (ACPS), that is, physical systems controlled by software that are envisioned to have no human operator"; p. 4 "The implementation of the NN is done in MATLAB®."; cyber-physical system, software, MATLAB inherently teaches processors, memory, instructions etc.)

training, based on a current state of the mobile platform, a neural network π that runs on the mobile platform; (Tuncali, p. 5 "In the next section, we present an example that demonstrates the above method to prove safety for an ACPS with an NN controller [NN running on the car]."; p. 8 "Considering the NN controller as a function, h, mapping its inputs d_err and θ_err [e.g. state] to its output u, where u is the input to the plant, the closed loop system dynamics can be defined as follows, where x denotes the system state vector [e.g. state]... To learn an NN controller... By starting with a random set of NN parameters, we performed direct policy search variant of reinforcement learning using a CMA-ES algorithm [8, 10] to find an optimal set of parameters (weights and biases) for the NN controller. For the direct policy search, we used the blue (piecewise-linear) path shown in Figure 4 as the target path on the x-y plane. The CMA-ES algorithm is used to optimize the NN parameters with the goal of minimizing the path following error [training NN]... cost function...")
… following training on the plurality of examples of states, selecting an action to be performed by the mobile platform in its environment; and (Tuncali, p. 6 "An overview of the closed-loop system is provided in Figure 2… u is the turn rate control, which we will refer to as steering control."; p. 8 "Considering the NN controller as a function, h... where u is the input to the plant, the closed loop system dynamics can be defined as follows... The NN takes distance and angle errors (d_err and θ_err) as inputs, and it outputs steering control u [a selected action]."; See Fig. 2 u is the steering control provided to the car model.) 

causing the mobile platform to perform the selected action in its environment. (Tuncali, p. 2 "We consider a plant model described as follows... ẋ = fp(x, u) Eq(1)... u... is the input to the plant... Eq(4)... Equation (4) represents a closed-loop model of the system, in the sense that it is a synchronous composition of the dynamical systems representing the plant model with the controller model to obtain an autonomous system model (i.e., a dynamical system with no exogenous inputs)."; in a closed-loop model, the controller causes the plant (the car) to perform an action via u, which is the input to the plant (an autonomous car).)
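For illustration only, the closed-loop relationship relied upon above (a controller h producing the plant input u for a Dubins car) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Tuncali: the function `controller` is a hypothetical stand-in for the trained NN, its gains are arbitrary, the target path is assumed to be the x-axis, and forward-Euler integration is assumed.

```python
import math

def controller(d_err, theta_err):
    """Hypothetical stand-in for the NN controller h: maps errors to turn rate u."""
    # Assumed gains; a trained network would replace this function.
    return math.tanh(-0.8 * d_err - 1.5 * theta_err)

def plant(state, u, v=1.0):
    """Dubins car dynamics x_dot = f_p(x, u): constant speed v, turn rate u."""
    _, _, theta = state
    return (v * math.cos(theta), v * math.sin(theta), u)

def simulate(state, steps=400, dt=0.05):
    """Forward-Euler rollout of the closed loop x_dot = f_p(x, h(x))."""
    for _ in range(steps):
        x, y, theta = state
        # Target path assumed to be the x-axis, so d_err = y and theta_err = theta.
        u = controller(y, theta)
        dx, dy, dtheta = plant(state, u)
        state = (x + dt * dx, y + dt * dy, theta + dt * dtheta)
    return state

# Start one unit off the path, heading parallel to it.
final_state = simulate((0.0, 1.0, 0.0))
```

In this sketch the only input the plant receives is u, so the controller output is what causes the vehicle to act; no exogenous input enters the loop.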


Tuncali does not teach, but Zhu teaches: periodically querying a Satisfiability Modulo Theories (SMT) solver (Zhu, p. 6, Algorithm 3; p. 5 "Safety Verification. Formally, we want to prove that the following formula is unsatisfiable... Neural Network Encoding Neural Network Encoding. We now show how to encode Cf (x,a) so that the above formula is solvable via... SMT solvers."; p. 1 "we use a constraint encoding of the neural network policy model and an SMT solver to compute states that are nearby states visited by an expert trajectory"; in Algorithm 3, the SMT solver is queried in the foreach loop [periodically querying], for safety verification (if the query is UNSAT/unsatisfiable), in order to obtain states which are included in the trajectory D for training the neural network.)

capable of reasoning over non-linear activation functions (Zhu, p. 6 "Neural Network Encoding… We describe the encoding of input, fully-connected, ReLU and output layers… In a ReLU layer, xji = max {x, 0} … which can be encoded using constraints Ci = .... "; the solver is capable of reasoning over ReLU layers involving non-linear ReLU functions.) to obtain a plurality of examples of states satisfying specified constraints of the mobile platform; (Zhu, p. 6 Algorithm 3; p. 1 "we use a constraint encoding of the neural network policy model"; p. 5, "The core idea is that we encode a neural net policy f(x) = a, which chooses action a when given input state x, as constraints Cf (x; a)... Our work is based on recent advance on constraint-based encoding of neural networks that enables proofs of neural networks safety in our desire..."; p. 5, "The starting point of our verification algorithm is based on D, obtained from expert trajectories generation by Algorithm 2. For a given state x (a point in a vector space) from D..."; in Algorithm 3, D is the obtained examples/expert trajectories.)
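For illustration only, one common way a constraint solver can reason over a ReLU is by a case split on the sign of the input. The sketch below uses that textbook case split; it is an assumption and is not asserted to be the constraints Ci of Zhu, which are elided in the quotation above. Brute-force enumeration over a small integer grid stands in for the solver, and the names `relu_constraint`, `grid` and `solutions` are hypothetical.

```python
def relu_constraint(x, y):
    """True when (x, y) satisfies the case-split encoding of y = max(x, 0)."""
    active = x >= 0 and y == x      # unit is active: output copies input
    inactive = x < 0 and y == 0     # unit is inactive: output clamped to zero
    return active or inactive

# Brute-force stand-in for a solver: enumerate a small integer grid and keep
# every (x, y) pair that satisfies the constraint.
grid = range(-3, 4)
solutions = [(x, y) for x in grid for y in grid if relu_constraint(x, y)]
```

Each branch of the split is linear, which is why a solver that handles linear arithmetic plus disjunction can treat the non-linear activation exactly.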

training the neural network π on the plurality of examples of states; (Zhu, p.6 "Therefore Algorithm 3 can be considered as another variant of DAGGER as we train an improved policy on aggregated expert trajectories dataset."; Algorithm 3 "Train policy f on D;" p. 1 abstract "In particular, we looked at closed-loop control systems that incorporate neural network based reinforcement learning components... Our solution to this problem is inspired by Imitation Learning, a learning from demonstrations framework, in which an agent learns a control policy by directly mimicking demonstrations [e.g. examples] provided by an expert."; p. 2 "In this paper, we explore how to combine learning from demonstrations with classic reinforcement learning. We would like to ensure that trained agents behave in a way that best mimics expert trajectories. In our context, expert trajectories focus on safety only...")
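For illustration only, the pattern described above (query for a violating state, add the corrected example to D, retrain on the aggregated D) can be sketched as a loop. This is a toy sketch under stated assumptions and does not reproduce Zhu's Algorithm 3: the state space is an assumed finite 1-D grid, `expert` is an assumed safe action rule, `train` is a nearest-neighbour lookup standing in for neural-network training, and `query` is brute-force enumeration standing in for an SMT solver call. All names are hypothetical.

```python
STATES = range(-5, 6)  # assumed finite 1-D state space

def expert(state):
    """Assumed safe action: step toward the origin (-1, 0 or +1)."""
    return (state < 0) - (state > 0)

def train(dataset):
    """Toy 'training': a nearest-neighbour policy over the labelled states."""
    table = dict(dataset)
    def policy(state):
        nearest = min(table, key=lambda s: (abs(s - state), s))
        return table[nearest]
    return policy

def query(policy):
    """Stand-in for the solver query: return a violating state, or None (UNSAT)."""
    for state in STATES:
        if policy(state) != expert(state):
            return state
    return None

def improve(dataset, max_rounds=20):
    """Aggregate counterexamples into the dataset and retrain until none remain."""
    policy = train(dataset)
    for _ in range(max_rounds):
        counterexample = query(policy)
        if counterexample is None:
            break  # no violating state found on the grid
        dataset[counterexample] = expert(counterexample)
        policy = train(dataset)
    return policy, dataset

policy, D = improve({4: -1})
```

The loop stops once the stand-in query finds no violating state, at which point the policy trained on the aggregated D agrees with the assumed expert on every grid state.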

It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have modified Tuncali to include imitation learning. Doing so would allow an agent to learn a control policy by directly mimicking demonstrations provided by an expert. (Zhu, p. 1 abstract "Our solution to this problem is inspired by Imitation Learning, a learning from demonstrations framework, in which an agent learns a control policy by directly mimicking demonstrations provided by an expert.")

Claims 6 and 11 recite substantially the same limitations as claim 1; therefore, the rejection applied to claim 1 also applies to claims 6 and 11. In addition, Tuncali teaches: (claim 11) A computer program product for controlling a mobile platform, the computer program product comprising:
computer-readable instructions stored on a non-transitory computer-readable medium that are executable by a computer having one or more processors for causing the processor to perform operations of: (Tuncali, p.1 "We present a simulation-based approach for generating barrier certificate functions for safety verification of cyber-physical systems (CPS) that contain neural network-based controllers… Self-driving cars, unmanned aerial vehicles, and certain kinds of robots are examples of autonomous cyber-physical systems (ACPS), that is, physical systems controlled by software that are envisioned to have no human operator"; p. 4 "The implementation of the NN is done in MATLAB®."; cyber-physical system, software, MATLAB inherently teaches processors, memory, instructions etc.)

In regard to claims 2, 7 and 12, reference is made to the rejection of claims 1, 6 and 11 respectively, and further, Tuncali does not teach, but Zhu teaches: wherein the SMT solver is queried according to a query schedule. (Zhu, p. 6, Algorithm 3; p. 5 "Safety Verification. Formally, we want to prove that the following formula is unsatisfiable... Neural Network Encoding Neural Network Encoding. We now show how to encode Cf (x,a) so that the above formula is solvable via... SMT solvers."; in Algorithm 3, the SMT solver is queried in the foreach loop [querying schedule], for safety verification (if the query is UNSAT/unsatisfiable), in order to obtain states which are included in the trajectory D for training the neural network.)
The rationale for combining the teachings of Tuncali and Zhu is the same as set forth in the rejection of claim 1.

In regard to claims 3, 8 and 13, reference is made to the rejection of claims 1, 6 and 11 respectively, and further, Tuncali does not teach, but Zhu teaches: wherein the one or more processors further perform an operation of generating the plurality of examples of states utilizing the SMT solver by implementing a state space quantization algorithm. (Zhu, p.1 abstract "Unlike traditional methods, which eagerly expand a Monte Carlo search tree [a state space quantization] and draw samples on tree nodes, our technique is lazy and counterexample driven… For each violation of safety properties found by the simulator, we traverse the Monte Carlo search tree induced by the failed trajectory. This search procedure allows us to collect expert trajectories that satisfy our safety properties by... "; p. 3 "The goals of policy diagnosis are: (i) Automatically synthesize expert trajectories... Our formal methods based approach to policy diagnosis is essentially a variant of Monte Carlo method, which considers multiple trajectories of a game to find expert trajectories."; p. 4 "Algorithm 2: Policy diagnosis on discrete control program."; p. 5 "Algorithm 2 discovers states in which the policy must take certain actions to avoid failures")
The rationale for combining the teachings of Tuncali and Zhu is the same as set forth in the rejection of claim 1.

In regard to claims 4, 9 and 14, reference is made to the rejection of claims 3, 8 and 13 respectively, and further, Tuncali does not teach, but Zhu teaches: wherein the one or more processors further perform an operation of applying at least one query constraint when generating the plurality of examples of states. (Zhu, p.1 "we use a constraint encoding of the neural network policy model and an SMT solver to compute states that are nearby states visited by an expert trajectory, but for which the policy chooses significantly different actions")
The rationale for combining the teachings of Tuncali and Zhu is the same as set forth in the rejection of claim 1.

In regard to claims 5, 10 and 15, reference is made to the rejection of claims 1, 6 and 11 respectively, and further, Tuncali does not teach, but Zhu teaches: wherein the one or more processors further perform operations of: applying a processing algorithm to the plurality of examples of states, resulting in a set of processed examples of states; and training the neural network π on the set of processed examples of states. (Zhu, p. 6, "Algorithm 3: Using Policy Verification to improve safety of a policy f."; Algorithm 3 is a processing algorithm, and D is a set of processed examples of states; "Train policy f on D" is training the NN on the set of processed examples.)
The rationale for combining the teachings of Tuncali and Zhu is the same as set forth in the rejection of claim 1.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SU-TING CHUANG whose telephone number is (408)918-7519.  The examiner can normally be reached on Monday - Thursday 8-5 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571)272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/S.C./Examiner, Art Unit 2122                 

/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122