DETAILED ACTION
This action is in response to the claims filed 11/03/2021 for application 16/404,733 filed 05/06/2019. Claims 1, 9, 11, 19, and 20 are amended. Claims 1-20 are currently pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 10/25/2021 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 2, 8, 9, 11, 12, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over Nagabandi et al. ("Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning", hereinafter "Nagabandi") in view of Viswanathan ("US 20200111011 A1", hereinafter "Viswa").

Regarding claim 1, Nagabandi teaches A computer-implemented method of training dynamic models, comprising: 
receiving a first set of training data from a training data source (“We collect training data by sampling starting configurations s0 ∼ p(s0), executing random actions at each timestep, and recording the resulting trajectories τ = (s0, a0, · · · , sT −2, aT −2, sT −1) of length T… ” [pg. 3, § Collecting training data, ¶1; training data source would correspond to the robots disclosed by Nagabandi.]), 
training a dynamic model based on the first set of training data for the first set of features (“Training the model: We train the dynamics model fθ(st, at) by minimizing the error… While training on the training dataset D, we also calculate the mean squared error in Eqn. 2 on a validation set Dval, composed of trajectories not stored in the training dataset” [pg. 3, § Training the model, ¶3; Training set D would correspond to a first set of training data.]); 
and for each of the second set of features, retrieving a second set of training data associated with the corresponding feature of the second set of features (“First, random trajectories are collected and added to dataset DRAND, which is used to train fθ by performing gradient descent on Eqn. 2. Then, the model-based MPC controller (Sec. IV-C) gathers T new on-policy datapoints and adds these datapoints to a separate dataset DRL.” [pg. 4, § D. Improving Model-Based Control with Reinforcement Learning, ¶2; Examiner is interpreting DRL to be equivalent to a second set of training data.]), and 
retraining the dynamic model using the second set of training data (“To improve the performance of our model-based learning algorithm, we gather additional on-policy data by alternating between gathering data with our current model and retraining our model using the aggregated data.” [pg. 4, § D. Improving Model-Based Control with Reinforcement Learning, ¶1]).
However, Nagabandi fails to explicitly teach for autonomous driving vehicles (ADVs);
the first set of training data representing driving statistics for a first set of features;
determining a second set of features as a subset of the first set of features based on comparing an actual future state of each feature of the first set of features from the dynamic model and an expected future state of the feature from the dynamic model, each of the second set of features representing a feature whose performance score is below a predetermined threshold.
Viswa teaches for autonomous driving vehicles (ADVs) (“As discussed above, the various embodiments described herein relate broadly to autonomous driving, and specifically to vehicle positioning using sensor data” [¶0031]);
the first set of training data representing driving statistics for a first set of features (“For example, the set of input features is extracted from sensor data subsequently collected from a geographic location for which the predicted sensor error for a target sensor is to be calculated (e.g., as described with respect to FIG. 6 below).” [¶0048; sensor inputs would be equivalent to driving statistics.]);
determining a second set of features as a subset of the first set of features based on comparing an actual future state of each feature of the first set of features from the dynamic model and an expected future state of the feature from the dynamic model, each of the second set of features representing a feature whose performance score is below a predetermined threshold (“In step 403, the mapping platform 117 trains the machine learning model 115 using the ground truth sensor data to calculate a predicted sensor error from a set of input features. For example, the set of input features is extracted from sensor data subsequently collected from a geographic location for which the predicted sensor error for a target sensor is to be calculated (e.g., as described with respect to FIG. 6 below). In one embodiment, the training module 303 can train the machine learning model 115 (e.g., a neural network, support vector machine, or equivalent) by obtaining a feature vector or matrix comprising the selected training features from the feature extraction module 301. During the training process, the training module 303 feeds the feature vectors or matrices of the training data set (e.g., the ground truth data) into the machine learning model 115 to compute a predicted sensor error. The training module 303 then compares the predicted sensor error to the ground truth sensor error values of the ground truth training data set. Based on this comparison, the training module 303 computes an accuracy of the predictions or classifications for the initial set of model parameters. If the accuracy or level of performance does not meet a threshold or configured level, the training module 303 incrementally adjusts the model parameters until the machine learning model 115 generates predictions at the desired level of accuracy with respect to the predicted sensor error. 
In other words, the “trained” machine learning model 115 is a model whose parameters are adjusted to make accurate predictions with respect to the ground truth data. The trained machine learning model 115 can then be used as according to the embodiments described below in FIG. 6.” [¶0048; Examiner is interpreting the predicted sensor error to be equivalent to “an actual future state of each feature” and ground truth training data set to be equivalent to “an expected future state of the feature”.]);
Nagabandi and Viswa are both in the same field of endeavor of training machine learning models and thus are analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s dynamic model to further implement training an autonomous driving vehicle as taught by Viswa. Nagabandi discusses deploying this dynamic model training method on real-world robotic systems as future work, and thus one would have been motivated to make this modification in order to train an autonomous driving vehicle to achieve better performance based on evaluating a dynamic model. [§ VII. Discussion, Nagabandi]
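
For illustration only, the per-feature comparison and threshold test recited in claim 1 can be sketched as follows. This sketch is not drawn from the application, Nagabandi, or Viswa: the function name, the feature names, and the performance score (assumed here to be 1 / (1 + mean squared error)) are all hypothetical.

```python
# Illustrative sketch only. The scoring rule and all names are assumptions,
# not taken from the claims or the cited references.

def low_scoring_features(predicted, expected, threshold):
    """Return the features whose performance score is below the threshold.

    predicted and expected each map a feature name to a list of
    future-state values. The score is 1 / (1 + MSE), an assumed metric.
    """
    selected = []
    for feature, pred_values in predicted.items():
        exp_values = expected[feature]
        mse = sum((p - e) ** 2 for p, e in zip(pred_values, exp_values)) / len(pred_values)
        score = 1.0 / (1.0 + mse)
        if score < threshold:
            selected.append(feature)
    return selected


# Hypothetical data: "speed" is predicted exactly, "steering_angle" is not.
predicted = {"speed": [10.0, 11.0], "steering_angle": [0.5, 0.9]}
expected = {"speed": [10.0, 11.0], "steering_angle": [0.1, 0.1]}
second_set = low_scoring_features(predicted, expected, threshold=0.9)
```

Under these assumed values, only the feature with the larger prediction error falls below the threshold and would be placed in the second set of features.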

Regarding claim 2, Nagabandi/Viswa teaches The method of claim 1, where Nagabandi further teaches further comprising iteratively performing retrieving the second set of training data and retraining the dynamic model, until the corresponding performance score is above the predetermined threshold or a number of iterations reaches a predetermined iteration value (“Note that during retraining, the neural network dynamics function’s weights are warm-started with the weights from the previous iteration. The algorithm continues alternating between training the model and gathering additional data until a predefined maximum iteration is reached. We evaluate design decisions related to data aggregation in our experiments” [pg. 4, § D. Improving Model-Based Control with Reinforcement Learning, ¶2]).
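
For illustration only, the two stopping conditions recited in claim 2 (performance score above the threshold, or a predetermined number of iterations reached) can be sketched as a loop. This is a minimal sketch under stated assumptions: `score_fn` and `retrain_fn` are hypothetical callables standing in for model evaluation and retraining, and the toy model below is not from Nagabandi or the application.

```python
# Illustrative sketch only. score_fn and retrain_fn are hypothetical
# stand-ins for evaluating and retraining a dynamic model.

def retrain_until(score_fn, retrain_fn, threshold, max_iterations):
    """Retrain until the score exceeds the threshold or the cap is hit.

    Returns the final score and the number of retraining iterations run.
    """
    iterations = 0
    score = score_fn()
    while score <= threshold and iterations < max_iterations:
        retrain_fn()
        iterations += 1
        score = score_fn()
    return score, iterations


# Toy model whose score improves by a fixed step on each retraining pass.
toy_model = {"score": 0.25}

def toy_score():
    return toy_model["score"]

def toy_retrain():
    toy_model["score"] += 0.25

final_score, iterations_used = retrain_until(toy_score, toy_retrain, threshold=0.9, max_iterations=10)
```

With these assumed values the loop stops on the score condition; lowering `max_iterations` below the number of passes needed would instead stop it on the iteration cap.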

Regarding claim 8, Nagabandi/Viswa teaches The method of claim 1, where Nagabandi further teaches wherein the dynamic model is one of a plurality of dynamic models trained using the first set of training data from the training data source (“We described a number of important design decisions for effectively and efficiently training neural network dynamics models, and we presented detailed experiments that evaluated these design parameters. Our method quickly discovered a dynamics model that led to an effective gait.” [pg. 7, § VII. Discussion, ¶2]), and wherein the dynamic model is a model that receives a highest score based on inference performance (“We first evaluate various design decisions for model-based reinforcement learning with neural networks using empirical evaluations with our model-based approach (Sec. IV). We explored these design decisions on the swimmer and half-cheetah agents on the locomotion task of running forward as quickly as possible. After each design decision was evaluated, we used the best outcome of that evaluation for the remainder of the evaluations.” [pg. 5, § A. Evaluating Design Decisions for Model-Based Reinforcement Learning, ¶1; see further pg. 2, § III Preliminaries, ¶2: “In model-based reinforcement learning, a model of the dynamics is used to make predictions,” which implies inference performance.]).

Regarding claim 9, Nagabandi/Viswa teaches The method of claim 8, where Nagabandi further teaches wherein the dynamic model is a neural network model represented by one of a linear regression, a multilayer perceptron (MLP), or a recurrent neural network (RNN) (“In this work, we demonstrate that multi-layer neural network models can in fact achieve excellent sample complexity in a model-based reinforcement learning algorithm, when combined with a few important design decisions such as data aggregation.” [pg. 1, § I. Introduction, ¶2; this would correspond to a multilayer perceptron.]).

Regarding claim 11, Nagabandi teaches A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, causing the processor to perform operations of training dynamic models (“From the experiments shown in this paper, our method has shown applicability for systems with high-dimensional state spaces, systems with contact-rich environment dynamics, under-observed systems, and systems with complex nonlinear dynamics that provide a considerable modelling challenge. In addition to taking communication delays and computational limitations into account” [pg. 7, § VII. Discussion, ¶5; implies use of processors and memory.]), the operations comprising: 
receiving a first set of training data from a training data source (“We collect training data by sampling starting configurations s0 ∼ p(s0), executing random actions at each timestep, and recording the resulting trajectories τ = (s0, a0, · · · , sT −2, aT −2, sT −1) of length T… ” [pg. 3, § Collecting training data, ¶1; training data source would correspond to the robots disclosed by Nagabandi.]), 
training a dynamic model based on the first set of training data for the first set of features (“Training the model: We train the dynamics model fθ(st, at) by minimizing the error… While training on the training dataset D, we also calculate the mean squared error in Eqn. 2 on a validation set Dval, composed of trajectories not stored in the training dataset” [pg. 3, § Training the model, ¶3; Training set D would correspond to a first set of training data.]); 
and for each of the second set of features, retrieving a second set of training data associated with the corresponding feature of the second set of features (“First, random trajectories are collected and added to dataset DRAND, which is used to train fθ by performing gradient descent on Eqn. 2. Then, the model-based MPC controller (Sec. IV-C) gathers T new on-policy datapoints and adds these datapoints to a separate dataset DRL.” [pg. 4, § D. Improving Model-Based Control with Reinforcement Learning, ¶2; Examiner is interpreting DRL to be equivalent to a second set of training data.]), and 
retraining the dynamic model using the second set of training data (“To improve the performance of our model-based learning algorithm, we gather additional on-policy data by alternating between gathering data with our current model and retraining our model using the aggregated data.” [pg. 4, § D. Improving Model-Based Control with Reinforcement Learning, ¶1]).
However, Nagabandi fails to explicitly teach for autonomous driving vehicles (ADVs);
the first set of training data representing driving statistics for a first set of features;
determining a second set of features as a subset of the first set of features based on comparing an actual future state of each feature of the first set of features from the dynamic model and an expected future state of the feature from the dynamic model, each of the second set of features representing a feature whose performance score is below a predetermined threshold.
Viswa teaches for autonomous driving vehicles (ADVs) (“As discussed above, the various embodiments described herein relate broadly to autonomous driving, and specifically to vehicle positioning using sensor data” [¶0031]);
the first set of training data representing driving statistics for a first set of features (“For example, the set of input features is extracted from sensor data subsequently collected from a geographic location for which the predicted sensor error for a target sensor is to be calculated (e.g., as described with respect to FIG. 6 below).” [¶0048; sensor inputs would be equivalent to driving statistics.]);
determining a second set of features as a subset of the first set of features based on comparing an actual future state of each feature of the first set of features from the dynamic model and an expected future state of the feature from the dynamic model, each of the second set of features representing a feature whose performance score is below a predetermined threshold (“In step 403, the mapping platform 117 trains the machine learning model 115 using the ground truth sensor data to calculate a predicted sensor error from a set of input features. For example, the set of input features is extracted from sensor data subsequently collected from a geographic location for which the predicted sensor error for a target sensor is to be calculated (e.g., as described with respect to FIG. 6 below). In one embodiment, the training module 303 can train the machine learning model 115 (e.g., a neural network, support vector machine, or equivalent) by obtaining a feature vector or matrix comprising the selected training features from the feature extraction module 301. During the training process, the training module 303 feeds the feature vectors or matrices of the training data set (e.g., the ground truth data) into the machine learning model 115 to compute a predicted sensor error. The training module 303 then compares the predicted sensor error to the ground truth sensor error values of the ground truth training data set. Based on this comparison, the training module 303 computes an accuracy of the predictions or classifications for the initial set of model parameters. If the accuracy or level of performance does not meet a threshold or configured level, the training module 303 incrementally adjusts the model parameters until the machine learning model 115 generates predictions at the desired level of accuracy with respect to the predicted sensor error. 
In other words, the “trained” machine learning model 115 is a model whose parameters are adjusted to make accurate predictions with respect to the ground truth data. The trained machine learning model 115 can then be used as according to the embodiments described below in FIG. 6.” [¶0048; Examiner is interpreting the predicted sensor error to be equivalent to “an actual future state of each feature” and ground truth training data set to be equivalent to “an expected future state of the feature”.]);
Nagabandi and Viswa are both in the same field of endeavor of training machine learning models and thus are analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s dynamic model to further implement training an autonomous driving vehicle as taught by Viswa. Nagabandi discusses deploying this dynamic model training method on real-world robotic systems as future work, and thus one would have been motivated to make this modification in order to train an autonomous driving vehicle to achieve better performance based on evaluating a dynamic model. [§ VII. Discussion, Nagabandi]
Regarding claim 12, Nagabandi/Viswa teaches The non-transitory machine-readable medium of claim 11, where Nagabandi further teaches further comprising iteratively performing retrieving the second set of training data and retraining the dynamic model, until the corresponding performance score is above the predetermined threshold or a number of iterations reaches a predetermined iteration value (“Note that during retraining, the neural network dynamics function’s weights are warm-started with the weights from the previous iteration. The algorithm continues alternating between training the model and gathering additional data until a predefined maximum iteration is reached. We evaluate design decisions related to data aggregation in our experiments” [pg. 4, § D. Improving Model-Based Control with Reinforcement Learning, ¶2]).

Regarding claim 18, Nagabandi/Viswa teaches The non-transitory machine-readable medium of claim 11, where Nagabandi further teaches wherein the dynamic model is one of a plurality of dynamic models trained using the first set of training data from the training data source (“We described a number of important design decisions for effectively and efficiently training neural network dynamics models, and we presented detailed experiments that evaluated these design parameters. Our method quickly discovered a dynamics model that led to an effective gait.” [pg. 7, § VII. Discussion, ¶2]), and wherein the dynamic model is a model that receives a highest score based on inference performance (“We first evaluate various design decisions for model-based reinforcement learning with neural networks using empirical evaluations with our model-based approach (Sec. IV). We explored these design decisions on the swimmer and half-cheetah agents on the locomotion task of running forward as quickly as possible. After each design decision was evaluated, we used the best outcome of that evaluation for the remainder of the evaluations.” [pg. 5, § A. Evaluating Design Decisions for Model-Based Reinforcement Learning, ¶1; see further pg. 2, § III Preliminaries, ¶2: “In model-based reinforcement learning, a model of the dynamics is used to make predictions,” which implies inference performance.]).

Regarding claim 19, Nagabandi/Viswa teaches The non-transitory machine-readable medium of claim 18, where Nagabandi further teaches wherein the dynamic model is a neural network model represented by one of a linear regression, a multilayer perceptron (MLP), or a recurrent neural network (RNN) (“In this work, we demonstrate that multi-layer neural network models can in fact achieve excellent sample complexity in a model-based reinforcement learning algorithm, when combined with a few important design decisions such as data aggregation.” [pg. 1, § I. Introduction, ¶2; this would correspond to a multilayer perceptron.]).

Regarding claim 20, Nagabandi teaches A data processing system, comprising: a processor; and a memory coupled to the processor to store instructions, which when executed by a processor, cause the processor to perform operations of training dynamic models (“From the experiments shown in this paper, our method has shown applicability for systems with high-dimensional state spaces, systems with contact-rich environment dynamics, under-observed systems, and systems with complex nonlinear dynamics that provide a considerable modelling challenge. In addition to taking communication delays and computational limitations into account” [pg. 7, § VII. Discussion, ¶5; implies use of processors and memory.]), the operations comprising: 
receiving a first set of training data from a training data source (“We collect training data by sampling starting configurations s0 ∼ p(s0), executing random actions at each timestep, and recording the resulting trajectories τ = (s0, a0, · · · , sT −2, aT −2, sT −1) of length T… ” [pg. 3, § Collecting training data, ¶1; training data source would correspond to the robots disclosed by Nagabandi.]), 
training a dynamic model based on the first set of training data for the first set of features (“Training the model: We train the dynamics model fθ(st, at) by minimizing the error… While training on the training dataset D, we also calculate the mean squared error in Eqn. 2 on a validation set Dval, composed of trajectories not stored in the training dataset” [pg. 3, § Training the model, ¶3; Training set D would correspond to a first set of training data.]); 
and for each of the second set of features, retrieving a second set of training data associated with the corresponding feature of the second set of features (“First, random trajectories are collected and added to dataset DRAND, which is used to train fθ by performing gradient descent on Eqn. 2. Then, the model-based MPC controller (Sec. IV-C) gathers T new on-policy datapoints and adds these datapoints to a separate dataset DRL.” [pg. 4, § D. Improving Model-Based Control with Reinforcement Learning, ¶2; Examiner is interpreting DRL to be equivalent to a second set of training data.]), and 
retraining the dynamic model using the second set of training data (“To improve the performance of our model-based learning algorithm, we gather additional on-policy data by alternating between gathering data with our current model and retraining our model using the aggregated data.” [pg. 4, § D. Improving Model-Based Control with Reinforcement Learning, ¶1]).
However, Nagabandi fails to explicitly teach for autonomous driving vehicles (ADVs);
the first set of training data representing driving statistics for a first set of features;
determining a second set of features as a subset of the first set of features based on comparing an actual future state of each feature of the first set of features from the dynamic model and an expected future state of the feature from the dynamic model, each of the second set of features representing a feature whose performance score is below a predetermined threshold.
Viswa teaches for autonomous driving vehicles (ADVs) (“As discussed above, the various embodiments described herein relate broadly to autonomous driving, and specifically to vehicle positioning using sensor data” [¶0031]);
the first set of training data representing driving statistics for a first set of features (“For example, the set of input features is extracted from sensor data subsequently collected from a geographic location for which the predicted sensor error for a target sensor is to be calculated (e.g., as described with respect to FIG. 6 below).” [¶0048; sensor inputs would be equivalent to driving statistics.]);
determining a second set of features as a subset of the first set of features based on comparing an actual future state of each feature of the first set of features from the dynamic model and an expected future state of the feature from the dynamic model, each of the second set of features representing a feature whose performance score is below a predetermined threshold (“In step 403, the mapping platform 117 trains the machine learning model 115 using the ground truth sensor data to calculate a predicted sensor error from a set of input features. For example, the set of input features is extracted from sensor data subsequently collected from a geographic location for which the predicted sensor error for a target sensor is to be calculated (e.g., as described with respect to FIG. 6 below). In one embodiment, the training module 303 can train the machine learning model 115 (e.g., a neural network, support vector machine, or equivalent) by obtaining a feature vector or matrix comprising the selected training features from the feature extraction module 301. During the training process, the training module 303 feeds the feature vectors or matrices of the training data set (e.g., the ground truth data) into the machine learning model 115 to compute a predicted sensor error. The training module 303 then compares the predicted sensor error to the ground truth sensor error values of the ground truth training data set. Based on this comparison, the training module 303 computes an accuracy of the predictions or classifications for the initial set of model parameters. If the accuracy or level of performance does not meet a threshold or configured level, the training module 303 incrementally adjusts the model parameters until the machine learning model 115 generates predictions at the desired level of accuracy with respect to the predicted sensor error. 
In other words, the “trained” machine learning model 115 is a model whose parameters are adjusted to make accurate predictions with respect to the ground truth data. The trained machine learning model 115 can then be used as according to the embodiments described below in FIG. 6.” [¶0048; Examiner is interpreting the predicted sensor error to be equivalent to “an actual future state of each feature” and ground truth training data set to be equivalent to “an expected future state of the feature”.]);
Nagabandi and Viswa are both in the same field of endeavor of training machine learning models and thus are analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s dynamic model to further implement training an autonomous driving vehicle as taught by Viswa. Nagabandi discusses deploying this dynamic model training method on real-world robotic systems as future work, and thus one would have been motivated to make this modification in order to train an autonomous driving vehicle to achieve better performance based on evaluating a dynamic model. [§ VII. Discussion, Nagabandi]

Claims 3-6 and 13-16 are rejected under 35 U.S.C. 103 as being unpatentable over Nagabandi in view of Viswa and further in view of Wang et al. ("Deep Reinforcement Learning for Autonomous Driving", hereinafter "Wang").

Regarding claim 3, Nagabandi/Viswa teaches The method of claim 1, but fails to explicitly teach wherein each of the first set of features represents one of a plurality of driving parameters, including speed, accelerator, angular velocity, throttle, brake, and steering angle, U-turn, left turn, or right turn.
Wang teaches wherein each of the first set of features represents one of a plurality of driving parameters, including speed, accelerator, angular velocity, throttle, brake, and steering angle, U-turn, left turn, or right turn (“Here, we chose to take all sensor input listed in Table 1, make it a 29 dimension vector. The action of the model is a 3 dimension vector for Acceleration (where 0 means no gas, 1 means full gas), Brake (where 0 means no brake, 1 full brake) and Steering (where -1 means max right turn and +1 means max left turn) respectively.” [pg. 4, 3.2 Deep Deterministic Policy Gradient (DDPG), ¶1; note: under the broadest reasonable interpretation (BRI), the claim requires only one of the recited parameters.]).
Nagabandi, Viswa, and Wang are all in the same field of endeavor of training machine learning models and thus are analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Wang discloses deep reinforcement learning for training autonomous driving vehicles. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s teachings by substituting the training data with the driving parameters taught by Wang. Nagabandi discusses deploying this dynamic model training method on real-world robotic systems as future work, and thus one would have been motivated to make this modification in order to train an autonomous driving vehicle to achieve better performance based on evaluating a dynamic model. [§ VII. Discussion, Nagabandi]

Regarding claim 4, Nagabandi/Viswa teaches The method of claim 1, however fails to explicitly teach wherein extracting the first set of training data from the training data source includes: determining a plurality of equally-spaced value ranges for each of the first set of features; and selecting a value from each of the plurality of ranges for the feature.
Wang teaches wherein extracting the first set of training data from the training data source includes:
determining a plurality of equally-spaced value ranges for each of the first set of features (“In autonomous driving, action spaces are continuous. For example, steering can vary from −90° to 90° and acceleration can vary from 0 to 300km.” [pg. 3, § 3 Methods, ¶1; see further: “ob.trackPos is the distance between the car and the track axis. The value is normalized w.r.t. to the track width: it is 0 when the car is on the axis, values greater than 1 or -1 means the car is outside of the track. We want the distance to the track axis to be 0.” [pg. 5, 3.3 The Open Racing Car Simulator (TORCS)]; Examiner is interpreting “equally-spaced” value ranges to be equivalent to the range from -1 to 0 and the range from 0 to 1, where 0 corresponds to the car being on the axis.]); and
selecting a value from each of the plurality of ranges for the feature (“Here, we chose to take all sensor input listed in Table 1, make it a 29 dimension vector. The action of the model is a 3 dimension vector for Acceleration (where 0 means no gas, 1 means full gas), Brake (where 0 means no brake, 1 full brake) and Steering (where -1 means max right turn and +1 means max left turn) respectively.” [pg. 4, 3.2 Deep Deterministic Policy Gradient (DDPG), ¶1]).
Nagabandi, Viswa, and Wang are all in the same field of endeavor of training machine learning models and thus are analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Wang discloses deep reinforcement learning for training autonomous driving vehicles. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s teachings by substituting the training data with the driving parameters taught by Wang. Nagabandi discusses deploying this dynamic model training method on real-world robotic systems as future work, and thus one would have been motivated to make this modification in order to train an autonomous driving vehicle to achieve better performance based on evaluating a dynamic model. [§ VII. Discussion, Nagabandi]
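
For illustration only, the limitation of claim 4 (equally-spaced value ranges for a feature, with one value selected from each range) can be sketched as follows. This sketch is not taken from the application or from Wang: the function names are hypothetical, and choosing the midpoint of each range is an assumption, since the claim recites only "selecting a value".

```python
# Illustrative sketch only. Function names and the midpoint selection rule
# are assumptions, not taken from the claims or the cited references.

def equally_spaced_ranges(low, high, count):
    """Split [low, high] into `count` ranges of equal width."""
    width = (high - low) / count
    return [(low + i * width, low + (i + 1) * width) for i in range(count)]


def midpoint_per_range(ranges):
    """Select one value from each range (here, its midpoint)."""
    return [(start + end) / 2 for start, end in ranges]


# Hypothetical normalized steering feature spanning -1 to +1.
steering_ranges = equally_spaced_ranges(-1.0, 1.0, 4)
steering_values = midpoint_per_range(steering_ranges)
```

Any other rule that picks one value inside each range (for example, a random draw) would fit the same sketch.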

Regarding claim 5, Nagabandi/Viswa/Wang teaches The method of claim 4, where Wang further teaches wherein the first set of training data includes a plurality of feature scenarios, each feature scenario representing a combination of selected values for the first set of features (“Meanwhile, the control problem is also challenging in real world because the action spaces is continuous and different action can be executed at the same time. For example, for smoother turning, We can steer and brake at the same time and adjust the degree of steering as we turn. More importantly, A safe autonomous vehicle must ensure functional safety and be able to deal with urgent events. For example, vehicles need to be very careful about crossroads and unseen corners such that they can act or brake immediately when there are children suddenly running across the road.” [pg. 2, § 1. Introduction, ¶4; Examiner is interpreting actions being executed at the same time to be equivalent to a combination of selected values.]).
Nagabandi, Viswa, and Wang are all in the same field of endeavor of training machine learning models and thus are analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Wang discloses training deep reinforcement learning of autonomous driving vehicles. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s teachings by substituting the training data with the driving parameters taught by Wang. Nagabandi discusses deploying this dynamic model training method on real world robotic systems for future work and thus one would have been motivated to make this modification in order to train an autonomous driving vehicle to achieve better performance based off evaluating a dynamic model. [§ VII. Discussion, Nagabandi]
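For illustration only, a “feature scenario” read as a combination of selected values (for example, a steering value paired with a brake value applied at the same time) might be enumerated as a Cartesian product of the per-feature selections. The feature names, the sample values, and the helper name `feature_scenarios` are assumptions made for this sketch and are not taken from Wang or from the instant application.

```python
from itertools import product
from typing import Dict, List


def feature_scenarios(selected: Dict[str, List[float]]) -> List[Dict[str, float]]:
    """Return every combination of one selected value per feature."""
    names = list(selected)
    combos = product(*(selected[name] for name in names))
    return [dict(zip(names, combo)) for combo in combos]


# Hypothetical selections: two values each for steering and brake.
scenarios = feature_scenarios({
    "steering": [-0.5, 0.5],
    "brake": [0.25, 0.75],
})
```

With two values per feature and two features, this sketch yields four scenarios, each assigning one value to every feature.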

Regarding claim 6, Nagabandi/Viswa/Wang teaches The method of claim 4, where Nagabandi further teaches wherein the dynamic model is evaluated for which the dynamic model has been trained (“We first evaluate various design decisions for model-based reinforcement learning with neural networks using empirical evaluations with our model-based approach (Sec. IV). We explored these design decisions on the swimmer and half-cheetah agents on the locomotion task of running forward as quickly as possible. After each design decision was evaluated, we used the best outcome of that evaluation for the remainder of the evaluations.” [pg. 5, § A. Evaluating Design Decisions for Model-Based Reinforcement Learning, ¶1])
Wang further teaches based on driving statistics generated (“We choose The Open Racing Car Simulator (TORCS) as our environment to train our agent. In order to learn the policy in TORCS, We first select a set of appropriate sensor information as inputs from TORCS. Based on these inputs, we then design our own rewarder inside TORCS to encourage our agent to run fast without hitting other cars and also stick to the center of the road.” [pg. 2, § 1 Introduction, ¶6; sensor inputs would be equivalent to driving statistics.]), under a plurality of controlled feature scenarios, by an ADV, each controlled feature scenario representing a combination of selected values for the first set of features (“Meanwhile, the control problem is also challenging in real world because the action spaces is continuous and different action can be executed at the same time. For example, for smoother turning, We can steer and brake at the same time and adjust the degree of steering as we turn. More importantly, A safe autonomous vehicle must ensure functional safety and be able to deal with urgent events. For example, vehicles need to be very careful about crossroads and unseen corners such that they can act or brake immediately when there are children suddenly running across the road.” [pg. 2, § 1. Introduction, ¶4; Examiner is interpreting actions being executed at the same time to be equivalent to a combination of selected values.]).
Nagabandi, Viswa, and Wang are all in the same field of endeavor of training machine learning models and thus are analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Wang discloses training deep reinforcement learning of autonomous driving vehicles. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s teachings by substituting the training data with the driving parameters taught by Wang. Nagabandi discusses deploying this dynamic model training method on real world robotic systems for future work and thus one would have been motivated to make this modification in order to train an autonomous driving vehicle to achieve better performance based off evaluating a dynamic model. [§ VII. Discussion, Nagabandi]

Regarding claim 13, Nagabandi/Viswa teaches The non-transitory machine-readable medium of claim 11, however fails to explicitly teach wherein each of the first set of features represents one of a plurality of driving parameters, including speed, accelerator, angular velocity, throttle, brake, and steering angle, U-turn, left turn, or right turn.
Wang teaches wherein each of the first set of features represents one of a plurality of driving parameters, including speed, accelerator, angular velocity, throttle, brake, and steering angle, U-turn, left turn, or right turn (“Here, we chose to take all sensor input listed in Table 1, make it a 29 dimension vector. The action of the model is a 3 dimension vector for Acceleration (where 0 means no gas, 1 means full gas), Brake (where 0 means no brake, 1 full brake) and Steering (where -1 means max right turn and +1 means max left turn) respectively.” [pg. 4, 3.2 Deep Deterministic Policy Gradient (DDPG), ¶1; note: BRI of the claim requires only at least one of the recited parameters.]).
Nagabandi, Viswa, and Wang are all in the same field of endeavor of training machine learning models and thus are analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Wang discloses training deep reinforcement learning of autonomous driving vehicles. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s teachings by substituting the training data with the driving parameters taught by Wang. Nagabandi discusses deploying this dynamic model training method on real world robotic systems for future work and thus one would have been motivated to make this modification in order to train an autonomous driving vehicle to achieve better performance based off evaluating a dynamic model. [§ VII. Discussion, Nagabandi]

Regarding claim 14, Nagabandi/Viswa teaches The non-transitory machine-readable medium of claim 11, however fails to explicitly teach wherein extracting the first set of training data from the training data source includes: determining a plurality of equally-spaced value ranges for each of the first set of features; and selecting a value from each of the plurality of ranges for the feature.
Wang teaches wherein extracting the first set of training data from the training data source includes:
 determining a plurality of equally-spaced value ranges for each of the first set of features (“In autonomous driving, action spaces are continuous. For example, steering can vary from −90° to 90° and acceleration can vary from 0 to 300km.” [pg. 3, § 3 Methods, ¶1]; see further: “ob.trackPos is the distance between the car and the track axis. The value is normalized w.r.t. to the track width: it is 0 when the car is on the axis, values greater than 1 or -1 means the car is outside of the track. We want the distance to the track axis to be 0.” [pg. 5, § 3.3 The Open Racing Car Simulator (TORCS); Examiner is interpreting “equally-spaced” value ranges to be equivalent to the ranges from -1 to 0 and from 0 to 1, where 0 is the value when the car is on the axis]); and
 selecting a value from each of the plurality of ranges for the feature (“Here, we chose to take all sensor input listed in Table 1, make it a 29 dimension vector. The action of the model is a 3 dimension vector for Acceleration (where 0 means no gas, 1 means full gas), Brake (where 0 means no brake, 1 full brake) and Steering (where -1 means max right turn and +1 means max left turn) respectively.” [pg. 4, 3.2 Deep Deterministic Policy Gradient (DDPG), ¶1]).
Nagabandi, Viswa, and Wang are all in the same field of endeavor of training machine learning models and thus are analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Wang discloses training deep reinforcement learning of autonomous driving vehicles. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s teachings by substituting the training data with the driving parameters taught by Wang. Nagabandi discusses deploying this dynamic model training method on real world robotic systems for future work and thus one would have been motivated to make this modification in order to train an autonomous driving vehicle to achieve better performance based off evaluating a dynamic model. [§ VII. Discussion, Nagabandi]

Regarding claim 15, Nagabandi/Viswa/Wang teaches The non-transitory machine-readable medium of claim 14, where Wang further teaches wherein the first set of training data includes a plurality of feature scenarios, each feature scenario representing a combination of selected values for the first set of features (“Meanwhile, the control problem is also challenging in real world because the action spaces is continuous and different action can be executed at the same time. For example, for smoother turning, We can steer and brake at the same time and adjust the degree of steering as we turn. More importantly, A safe autonomous vehicle must ensure functional safety and be able to deal with urgent events. For example, vehicles need to be very careful about crossroads and unseen corners such that they can act or brake immediately when there are children suddenly running across the road.” [pg. 2, § 1. Introduction, ¶4; Examiner is interpreting actions being executed at the same time to be equivalent to a combination of selected values.]).
Nagabandi, Viswa, and Wang are all in the same field of endeavor of training machine learning models and thus are analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Wang discloses training deep reinforcement learning of autonomous driving vehicles. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s teachings by substituting the training data with the driving parameters taught by Wang. Nagabandi discusses deploying this dynamic model training method on real world robotic systems for future work and thus one would have been motivated to make this modification in order to train an autonomous driving vehicle to achieve better performance based off evaluating a dynamic model. [§ VII. Discussion, Nagabandi]

Regarding claim 16, Nagabandi/Viswa/Wang teaches The non-transitory machine-readable medium of claim 14, where Nagabandi further teaches wherein the dynamic model is evaluated for which the dynamic model has been trained (“We first evaluate various design decisions for model-based reinforcement learning with neural networks using empirical evaluations with our model-based approach (Sec. IV). We explored these design decisions on the swimmer and half-cheetah agents on the locomotion task of running forward as quickly as possible. After each design decision was evaluated, we used the best outcome of that evaluation for the remainder of the evaluations.” [pg. 5, § A. Evaluating Design Decisions for Model-Based Reinforcement Learning, ¶1])
Wang further teaches based on driving statistics generated (“We choose The Open Racing Car Simulator (TORCS) as our environment to train our agent. In order to learn the policy in TORCS, We first select a set of appropriate sensor information as inputs from TORCS. Based on these inputs, we then design our own rewarder inside TORCS to encourage our agent to run fast without hitting other cars and also stick to the center of the road.” [pg. 2, § 1 Introduction, ¶6; sensor inputs would be equivalent to driving statistics.]), under a plurality of controlled feature scenarios, by an ADV, each controlled feature scenario representing a combination of selected values for the first set of features (“Meanwhile, the control problem is also challenging in real world because the action spaces is continuous and different action can be executed at the same time. For example, for smoother turning, We can steer and brake at the same time and adjust the degree of steering as we turn. More importantly, A safe autonomous vehicle must ensure functional safety and be able to deal with urgent events. For example, vehicles need to be very careful about crossroads and unseen corners such that they can act or brake immediately when there are children suddenly running across the road.” [pg. 2, § 1. Introduction, ¶4; Examiner is interpreting actions being executed at the same time to be equivalent to a combination of selected values.]).
Nagabandi, Viswa, and Wang are all in the same field of endeavor of training machine learning models and thus are analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Wang discloses training deep reinforcement learning of autonomous driving vehicles. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s teachings by substituting the training data with the driving parameters taught by Wang. Nagabandi discusses deploying this dynamic model training method on real world robotic systems for future work and thus one would have been motivated to make this modification in order to train an autonomous driving vehicle to achieve better performance based off evaluating a dynamic model. [§ VII. Discussion, Nagabandi]

Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Nagabandi in view of Viswa and Wang and further in view of Eraqi et al. ("End-to-End Deep Learning for Steering Autonomous Vehicles Considering Temporal Dependencies", hereinafter "Eraqi").

Regarding claim 7, Nagabandi/Viswa/Wang teaches The method of claim 6, where Wang further teaches further comprising: 
determining, from the plurality of controlled feature scenarios, a set of controlled feature scenarios associated with each feature of the first set of features (“TORCS provides 18 different types of sensor inputs. After experiments we carefully select a subset of inputs, which is shown in Table 1… ob.angle is the angle between the car direction and the direction of the track axis. It reveals the car’s direction to the track line. • ob.track is the vector of 19 range finder sensors: each sensor returns the distance between the track edge and the car within a range of 200 meters. It let us know if the car is in danger of running into obstacle. • ob.trackPos is the distance between the car and the track axis. The value is normalized w.r.t. to the track width: it is 0 when the car is on the axis, values greater than 1 or -1 means the car is outside of the track. We want the distance to the track axis to be 0. • ob.speedX, ob.speedY, ob.speedZ is the speed of the car along the longitudinal axis of the car (good velocity), along the transverse axis of the car, and along the Z-axis of the car. We want the car speed along the axis to be high and speed vertical to the axis to be low.” [pg. 5, § 3.3 The Open Racing Car Simulator (TORCS)]); 
applying each of the set of controlled feature scenarios as input to the dynamic model (“In order to learn the policy in TORCS, We first select a set of appropriate sensor information as inputs from TORCS. Based on these inputs, we then design our own rewarder inside TORCS to encourage our agent to run fast without hitting other cars and also stick to the center of the road.” [pg. 2, § 1. Introduction, ¶6]); 
Nagabandi further teaches comparing an output of the dynamic model to the input against a ground truth value in response to the input (“We therefore calculate H-step validation errors by propagating the learned dynamics function forward H times to make multi-step open-loop predictions. For each given sequence of true actions (a_t, . . . a_{t+H−1}) from D_val, we compare the corresponding ground-truth states (s_{t+1}, . . . s_{t+H}) to the dynamics model’s multi-step state predictions (ŝ_{t+1}, . . . ŝ_{t+H})” [pg. 3, right col, ¶2]); and
determining the second set of features based on their performance scores (“First, random trajectories are collected and added to dataset D_RAND, which is used to train f_θ by performing gradient descent on Eqn. 2. Then, the model-based MPC controller (Sec. IV-C) gathers T new on-policy datapoints and adds these datapoints to a separate dataset D_RL.” [pg. 4, § D. Improving Model-Based Control with Reinforcement Learning, ¶2; See further Sec. IV-C discloses performance scores]).
Viswa further teaches computing a performance score for each feature of the first set of features (“Based on this comparison, the training module 303 computes an accuracy of the predictions or classifications for the initial set of model parameters. If the accuracy or level of performance does not meet a threshold or configured level, the training module 303 incrementally adjusts the model parameters until the machine learning model 115 generates predictions at the desired level of accuracy with respect to the predicted sensor error.” [¶0048])
Nagabandi, Viswa, and Wang are all in the same field of endeavor of training machine learning models and thus are analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Wang discloses training deep reinforcement learning of autonomous driving vehicles. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s teachings by substituting the training data with the driving parameters taught by Wang. Nagabandi discusses deploying this dynamic model training method on real world robotic systems for future work and thus one would have been motivated to make this modification in order to train an autonomous driving vehicle to achieve better performance based off evaluating a dynamic model. [§ VII. Discussion, Nagabandi]
However Nagabandi/Viswa/Wang fails to explicitly teach computing a root mean squared error for each feature of the first set of features;
Eraqi teaches computing a root mean squared error for each feature of the first set of features (“[equation image: media_image1.png]” [pg. 5, § 5.1 Dataset and Evaluation Metrics, ¶3]);
Nagabandi, Viswa, Wang, and Eraqi are all in the same field of endeavor of training machine learning models and thus are all analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Wang discloses training deep reinforcement learning of autonomous driving vehicles. Eraqi discloses using a root mean square error method to express average system prediction error. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s/Wang’s teachings to implement a root mean square error evaluation step as taught by Eraqi. One would have been motivated to use RMSE since large errors in prediction would be undesirable in training autonomous driving vehicles. [pg. 5, § 5.1 Dataset and Evaluation Metrics, ¶3, Eraqi]
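For illustration only, computing a root mean squared error for each feature by comparing model outputs against ground truth values might be sketched as follows. The sketch uses the standard textbook definition of RMSE (the square root of the mean of the squared differences); whether that matches the exact expression in Eraqi's equation cannot be confirmed from the text of this action. The feature names, sample values, and helper names `rmse` and `per_feature_rmse` are assumptions made for this sketch.

```python
import math
from typing import Dict, Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Standard RMSE: square root of the mean squared difference."""
    if len(predicted) != len(actual) or len(predicted) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(p - a) ** 2 for p, a in zip(predicted, actual)]
    return math.sqrt(sum(squared) / len(squared))


def per_feature_rmse(
    predicted: Dict[str, Sequence[float]],
    actual: Dict[str, Sequence[float]],
) -> Dict[str, float]:
    """Compute one RMSE value per feature name present in `predicted`."""
    return {name: rmse(predicted[name], actual[name]) for name in predicted}


# Hypothetical model outputs and ground truth values for two features.
scores = per_feature_rmse(
    {"steering": [0.0, 0.0, 0.0, 0.0], "brake": [0.5, 0.5]},
    {"steering": [2.0, 2.0, 2.0, 2.0], "brake": [0.5, 0.5]},
)
```

Because the differences are squared before averaging, a single large prediction error raises the result more than several small ones, which is consistent with the motivation stated above.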

Regarding claim 17, Nagabandi/Viswa/Wang teaches The non-transitory machine-readable medium of claim 16, where Wang further teaches further comprising: 
determining, from the plurality of controlled feature scenarios, a set of controlled feature scenarios associated with each feature of the first set of features (“TORCS provides 18 different types of sensor inputs. After experiments we carefully select a subset of inputs, which is shown in Table 1… ob.angle is the angle between the car direction and the direction of the track axis. It reveals the car’s direction to the track line. • ob.track is the vector of 19 range finder sensors: each sensor returns the distance between the track edge and the car within a range of 200 meters. It let us know if the car is in danger of running into obstacle. • ob.trackPos is the distance between the car and the track axis. The value is normalized w.r.t. to the track width: it is 0 when the car is on the axis, values greater than 1 or -1 means the car is outside of the track. We want the distance to the track axis to be 0. • ob.speedX, ob.speedY, ob.speedZ is the speed of the car along the longitudinal axis of the car (good velocity), along the transverse axis of the car, and along the Z-axis of the car. We want the car speed along the axis to be high and speed vertical to the axis to be low.” [pg. 5, § 3.3 The Open Racing Car Simulator (TORCS)]); 
applying each of the set of controlled feature scenarios as input to the dynamic model (“In order to learn the policy in TORCS, We first select a set of appropriate sensor information as inputs from TORCS. Based on these inputs, we then design our own rewarder inside TORCS to encourage our agent to run fast without hitting other cars and also stick to the center of the road.” [pg. 2, § 1. Introduction, ¶6]); 
Nagabandi further teaches comparing an output of the dynamic model to the input against a ground truth value in response to the input (“We therefore calculate H-step validation errors by propagating the learned dynamics function forward H times to make multi-step open-loop predictions. For each given sequence of true actions (a_t, . . . a_{t+H−1}) from D_val, we compare the corresponding ground-truth states (s_{t+1}, . . . s_{t+H}) to the dynamics model’s multi-step state predictions (ŝ_{t+1}, . . . ŝ_{t+H})” [pg. 3, right col, ¶2]); and
determining the second set of features based on their performance scores (“First, random trajectories are collected and added to dataset D_RAND, which is used to train f_θ by performing gradient descent on Eqn. 2. Then, the model-based MPC controller (Sec. IV-C) gathers T new on-policy datapoints and adds these datapoints to a separate dataset D_RL.” [pg. 4, § D. Improving Model-Based Control with Reinforcement Learning, ¶2; See further Sec. IV-C discloses performance scores]).
Viswa further teaches computing a performance score for each feature of the first set of features (“Based on this comparison, the training module 303 computes an accuracy of the predictions or classifications for the initial set of model parameters. If the accuracy or level of performance does not meet a threshold or configured level, the training module 303 incrementally adjusts the model parameters until the machine learning model 115 generates predictions at the desired level of accuracy with respect to the predicted sensor error.” [¶0048])
Nagabandi, Viswa, and Wang are all in the same field of endeavor of training machine learning models and thus are analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Wang discloses training deep reinforcement learning of autonomous driving vehicles. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s teachings by substituting the training data with the driving parameters taught by Wang. Nagabandi discusses deploying this dynamic model training method on real world robotic systems for future work and thus one would have been motivated to make this modification in order to train an autonomous driving vehicle to achieve better performance based off evaluating a dynamic model. [§ VII. Discussion, Nagabandi]
However Nagabandi/Viswa/Wang fails to explicitly teach computing a root mean squared error for each feature of the first set of features;
Eraqi teaches computing a root mean squared error for each feature of the first set of features (“[equation image: media_image1.png]” [pg. 5, § 5.1 Dataset and Evaluation Metrics, ¶3]);
Nagabandi, Viswa, Wang, and Eraqi are all in the same field of endeavor of training machine learning models and thus are all analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Wang discloses training deep reinforcement learning of autonomous driving vehicles. Eraqi discloses using a root mean square error method to express average system prediction error. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s/Wang’s teachings to implement a root mean square error evaluation step as taught by Eraqi. One would have been motivated to use RMSE since large errors in prediction would be undesirable in training autonomous driving vehicles. [pg. 5, § 5.1 Dataset and Evaluation Metrics, ¶3, Eraqi]

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Nagabandi in view of Viswa and further in view of Bojarski et al. ("End to End Learning for Self-Driving Cars", hereinafter "Bojarski").

Regarding claim 10, Nagabandi/Viswa teaches The method of claim 1, however the combination fails to explicitly teach wherein the training data source stores driving statistics collected from a variety of vehicles driven by human drivers, wherein the driving statistics include information indicating driving commands issued and responses of the vehicles captured by sensors of the vehicles at different points in time.
Bojarski teaches wherein the training data source stores driving statistics collected from a variety of vehicles driven by human drivers (“Data was acquired using either our drive-by-wire test vehicle, which is a 2016 Lincoln MKZ, or using a 2013 Ford Focus with cameras placed in similar positions to those in the Lincoln. The system has no dependencies on any particular vehicle make or model. Drivers were encouraged to maintain full attentiveness, but otherwise drive as they usually do. As of March 28, 2016, about 72 hours of driving data was collected.” [pg. 4, § 3 Data Collection, ¶2; note: Human driver is disclosed on pg. 2, § 1 Introduction, ¶3]), 
wherein the driving statistics include information indicating driving commands issued and responses of the vehicles captured by sensors of the vehicles at different points in time (“Figure 1 shows a simplified block diagram of the collection system for training data for DAVE-2. Three cameras are mounted behind the windshield of the data-acquisition car. Time-stamped video from the cameras is captured simultaneously with the steering angle applied by the human driver… Training data contains single images sampled from the video, paired with the corresponding steering command (1/r). Training with data from only the human driver is not sufficient. The network must learn how to recover from mistakes. Otherwise the car will slowly drift off the road. The training data is therefore augmented with additional images that show the car in different shifts from the center of the lane and rotations from the direction of the road.” [pg. 2, 2 Overview of the DAVE-2 System, ¶1-2]).
Nagabandi, Viswa, and Bojarski are all in the same field of endeavor of training machine learning models and thus are all analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Bojarski discloses training a dynamic model which is trained from human behaviors. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s teachings to implement driving statistics acquired from human drivers as taught by Bojarski. One would have been motivated to make this modification in order to avoid the need to recognize specific human designated features and reduce additional computational steps of the algorithm. [pg. 2, Introduction, ¶6, Bojarski]

Response to Arguments
Applicant's arguments filed 11/03/2021 have been fully considered but they are not persuasive. 


Regarding the 35 U.S.C. §103 rejections:
Applicant’s arguments on pgs. 11-13 that the cited prior art of Nagabandi/Wang fails to teach “determining a second set of features as a subset of the first set of features based on comparing an actual future state of each feature of the first set of features from the dynamic model and an expected future state of the feature from the dynamic model, each of the second set of features representing a feature whose performance score is below a predetermined threshold” have been considered but are moot because that amended limitation is now taught by the newly presented art of Viswa. Please see the updated 103 rejection above. 

Applicant’s arguments with respect to the rejections of the dependent claims have been fully considered but they are not persuasive as they rely upon the allowability of the independent claims.

Conclusion
Applicant's amendment necessitated the new grounds of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491. The examiner can normally be reached Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/M.H.H./Examiner, Art Unit 2122    

/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122