DETAILED ACTION
This action is in response to the claims filed 04/22/2022 for application 16/404,733. Claims 1, 6, 11, 16, and 20 have been amended, claims 4 and 14 have been canceled, and claims 21 and 22 are new. Claims 1-3, 5-13, and 15-22 are currently pending. 
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 04/22/2022 has been entered.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3, 8-13, and 18-22 are rejected under 35 U.S.C. 103 as being unpatentable over Nagabandi et al. ("Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning", hereinafter "Nagabandi") in view of Viswanathan (US 20200111011 A1, hereinafter "Viswa") and further in view of Amini et al. ("Learning Steering Bounds for Parallel Autonomous Systems", hereinafter "Amini").


Regarding claim 1, Nagabandi teaches A computer-implemented method of training dynamic models, comprising: 
a first set of training data from a training data source (“We collect training data by sampling starting configurations s0 ∼ p(s0), executing random actions at each timestep, and recording the resulting trajectories τ = (s0, a0, · · · , sT −2, aT −2, sT −1) of length T… ” [pg. 3, § Collecting training data, ¶1; training data source would correspond to the robots disclosed by Nagabandi.]), 
training a dynamic model based on the first set of training data for the first set of features (“Training the model: We train the dynamics model fθ(st, at) by minimizing the error… While training on the training dataset D, we also calculate the mean squared error in Eqn. 2 on a validation set Dval, composed of trajectories not stored in the training dataset” [pg. 3, § Training the model, ¶3; Training set D would correspond to a first set of training data.]); 
and for each of the second set of features, retrieving a second set of training data associated with the corresponding feature of the second set of features (“First, random trajectories are collected and added to dataset DRAND, which is used to train fθ by performing gradient descent on Eqn. 2. Then, the model-based MPC controller (Sec. IV-C) gathers T new on-policy datapoints and adds these datapoints to a separate dataset DRL.” [pg. 4, § D. Improving Model-Based Control with Reinforcement Learning, ¶2; Examiner is interpreting DRL to be equivalent to a second set of training data.]), and 
retraining the dynamic model using the second set of training data (“To improve the performance of our model-based learning algorithm, we gather additional on-policy data by alternating between gathering data with our current model and retraining our model using the aggregated data.” [pg. 4, § D. Improving Model-Based Control with Reinforcement Learning, ¶1]).
However, Nagabandi fails to explicitly teach for autonomous driving vehicles (ADVs)
the first set of training data representing driving statistics for a first set of features;
determining a second set of features as a subset of the first set of features based on comparing an actual future state of each feature of the first set of features from the dynamic model and an expected future state of the feature from the dynamic model, each of the second set of features representing a feature whose performance score is below a predetermined threshold;
Viswa teaches for autonomous driving vehicles (ADVs) (“As discussed above, the various embodiments described herein relate broadly to autonomous driving, and specifically to vehicle positioning using sensor data” [¶0031])
the first set of training data representing driving statistics for a first set of features (“For example, the set of input features is extracted from sensor data subsequently collected from a geographic location for which the predicted sensor error for a target sensor is to be calculated (e.g., as described with respect to FIG. 6 below).” [¶0048; sensor inputs would be equivalent to driving statistics.]);
determining a second set of features as a subset of the first set of features based on comparing an actual future state of each feature of the first set of features from the dynamic model and an expected future state of the feature from the dynamic model, each of the second set of features representing a feature whose performance score is below a predetermined threshold (“In step 403, the mapping platform 117 trains the machine learning model 115 using the ground truth sensor data to calculate a predicted sensor error from a set of input features. For example, the set of input features is extracted from sensor data subsequently collected from a geographic location for which the predicted sensor error for a target sensor is to be calculated (e.g., as described with respect to FIG. 6 below). In one embodiment, the training module 303 can train the machine learning model 115 (e.g., a neural network, support vector machine, or equivalent) by obtaining a feature vector or matrix comprising the selected training features from the feature extraction module 301. During the training process, the training module 303 feeds the feature vectors or matrices of the training data set (e.g., the ground truth data) into the machine learning model 115 to compute a predicted sensor error. The training module 303 then compares the predicted sensor error to the ground truth sensor error values of the ground truth training data set. Based on this comparison, the training module 303 computes an accuracy of the predictions or classifications for the initial set of model parameters. If the accuracy or level of performance does not meet a threshold or configured level, the training module 303 incrementally adjusts the model parameters until the machine learning model 115 generates predictions at the desired level of accuracy with respect to the predicted sensor error. 
In other words, the “trained” machine learning model 115 is a model whose parameters are adjusted to make accurate predictions with respect to the ground truth data. The trained machine learning model 115 can then be used as according to the embodiments described below in FIG. 6.” [¶0048; Examiner is interpreting the predicted sensor error to be equivalent to “an actual future state of each feature” and the ground truth training data set to be equivalent to “an expected future state of the feature”.]);
Nagabandi and Viswa are both in the same field of endeavor of training machine learning models and thus are analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s dynamic model to further implement training an autonomous driving vehicle as taught by Viswa. Nagabandi identifies deploying this dynamic model training method on real-world robotic systems as future work, and thus one would have been motivated to make this modification in order to train an autonomous driving vehicle to achieve better performance based on evaluating a dynamic model. [§ VII. Discussion, Nagabandi]
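For illustration only, the claimed step of determining a second set of features whose performance score is below a predetermined threshold can be sketched as below. The sketch is not drawn from Viswa, Nagabandi, or the application; the scoring rule (1 / (1 + mean absolute error) between predicted and expected future states), the function names, and the feature values are hypothetical assumptions.

```python
# Hypothetical sketch: the scoring rule and all names are illustrative
# assumptions, not taken from Viswa, Nagabandi, or the application.
from typing import Dict, List, Sequence


def performance_score(predicted: Sequence[float], expected: Sequence[float]) -> float:
    """Assumed score in (0, 1]: 1 / (1 + mean absolute error)."""
    if not predicted or len(predicted) != len(expected):
        raise ValueError("predicted and expected must be non-empty and equal in length")
    mae = sum(abs(p - e) for p, e in zip(predicted, expected)) / len(predicted)
    return 1.0 / (1.0 + mae)


def select_underperforming_features(
    predicted: Dict[str, Sequence[float]],
    expected: Dict[str, Sequence[float]],
    threshold: float,
) -> List[str]:
    """Return the subset of features whose score is below the threshold."""
    return [
        name
        for name in predicted
        if performance_score(predicted[name], expected[name]) < threshold
    ]


# Hypothetical per-feature predicted and expected future states.
predicted = {"speed": [10.0, 11.0], "steering_angle": [0.1, 0.5]}
expected = {"speed": [10.0, 11.0], "steering_angle": [0.1, 0.1]}
print(select_underperforming_features(predicted, expected, threshold=0.9))
# → ['steering_angle']
```

Here "speed" is predicted exactly (score 1.0), while "steering_angle" has a mean absolute error of 0.2 (score of about 0.83), so only the latter falls into the second set at a threshold of 0.9.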
However, Nagabandi/Viswa fails to explicitly teach extracting, wherein the extracting of the first set of training data from the training data source includes determining a plurality of equally-spaced value ranges for each of the first set of features, and selecting a value from each of the plurality of equally-spaced ranges for the feature;
Amini teaches extracting, wherein the extracting of the first set of training data from the training data source includes determining a plurality of equally-spaced value ranges for each of the first set of features (“We define m discrete bins for steering sampled from a tunable logit function projected onto the x-axis on the interval [−1, 1]. Having the ability to tune our discretization affords more flexibility on how angles are binned over the space of all turning angles. Since the vast majority of driving data has steering angles within a small interval around the equilibrium of steering, having a discretization where bin angles are concentrated near the center of steering enables more precise classification when going straight or making small turns, while receiving more spread on larger turning angles.” [pg. 4719, § III. Learning Steering Distributions, ¶3; See further: For example, setting γ = 0 yields βj = (π/2)vj, which is a linear function of bins (evenly spaced bins).]), and selecting a value from each of the plurality of equally-spaced ranges for the feature (“An NVIDIA DGX-1 supercomputer was used for training and validation of the DNNs. While training, a random minibatch of size 20 was randomly aggregated and fed through the network. Training frames were sampled from our dataset such that an approximately equal number of examples from each bin were fed through the network, to reduce bias towards any particular steering direction” [pg. 4721, § IV. Results, ¶2]);
Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Amini teaches steering bounds for an autonomous driving task. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s teachings to implement binning and then sampling from the bins as taught by Amini. One would have been motivated to make this modification in order to compute a number of different possible actions and to incorporate higher-level decision making for autonomous navigation. [Abstract, Amini]
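The extracting limitation (equally-spaced value ranges per feature, with a value selected from each range) may be illustrated by the following minimal sketch. It assumes equal-width bins, which corresponds to the γ = 0 case noted above for Amini, and random selection of one value per non-empty range. The function names and steering values are hypothetical and are not taken from Amini or the application.

```python
# Hypothetical sketch of equal-width binning followed by selecting one value
# per bin; names and data are illustrative assumptions, not Amini's code.
import random
from typing import List, Sequence, Tuple


def equal_width_ranges(low: float, high: float, num_bins: int) -> List[Tuple[float, float]]:
    """Split [low, high] into num_bins equally spaced value ranges."""
    if num_bins < 1 or high <= low:
        raise ValueError("need num_bins >= 1 and high > low")
    width = (high - low) / num_bins
    edges = [low + i * width for i in range(num_bins)] + [high]
    return list(zip(edges[:-1], edges[1:]))


def select_one_per_range(
    values: Sequence[float], ranges: List[Tuple[float, float]], rng: random.Random
) -> List[float]:
    """Pick one value at random from each non-empty range."""
    selected = []
    last = len(ranges) - 1
    for i, (lo, hi) in enumerate(ranges):
        # Ranges are half-open [lo, hi); the last one also includes hi.
        members = [v for v in values if lo <= v < hi or (i == last and v == hi)]
        if members:  # empty ranges are skipped
            selected.append(rng.choice(members))
    return selected


steering = [-0.9, -0.8, -0.2, 0.1, 0.3, 1.0]  # hypothetical normalized steering values
ranges = equal_width_ranges(-1.0, 1.0, 4)
print(ranges)  # [(-1.0, -0.5), (-0.5, 0.0), (0.0, 0.5), (0.5, 1.0)]
print(select_one_per_range(steering, ranges, random.Random(0)))
```

Seeding the random generator makes the per-range selection reproducible; the last edge is set to `high` directly so floating-point accumulation cannot drop the maximum value.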

Regarding claim 2, Nagabandi/Viswa/Amini teaches The method of claim 1, where Nagabandi further teaches further comprising iteratively performing retrieving the second set of training data and retraining the dynamic model, until the corresponding performance score is above the predetermined threshold or a number of iterations reaches a predetermined iteration value (“Note that during retraining, the neural network dynamics function’s weights are warm-started with the weights from the previous iteration. The algorithm continues alternating between training the model and gathering additional data until a predefined maximum iteration is reached. We evaluate design decisions related to data aggregation in our experiments” [pg. 4, § D. Improving Model-Based Control with Reinforcement Learning, ¶2]).
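The iterative limitation of claim 2 (repeating retrieval and retraining until the performance score meets the threshold or an iteration cap is reached) can be sketched as follows. The toy model, in which each feature's score is stored directly and retraining raises it by a fixed step, and all names are hypothetical assumptions made only to show the two stopping conditions; this is not Nagabandi's implementation, which is quoted above as stopping at a predefined maximum iteration.

```python
# Hypothetical sketch of a retrain-until-threshold-or-cap loop.
# The callables and the toy "model" are illustrative assumptions only.
from typing import Callable, Dict, List, Tuple

Model = Dict[str, float]  # toy model: per-feature score table


def iterative_retrain(
    model: Model,
    train: Callable[[Model, List[str]], Model],
    score: Callable[[Model, str], float],
    retrieve: Callable[[str], List[str]],
    features: List[str],
    threshold: float,
    max_iterations: int,
) -> Tuple[Model, int]:
    iterations = 0
    weak = [f for f in features if score(model, f) < threshold]
    # Stop when no feature scores below the threshold or the cap is reached.
    while weak and iterations < max_iterations:
        second_set = [sample for f in weak for sample in retrieve(f)]
        model = train(model, second_set)  # continues from the current model (warm start)
        iterations += 1
        weak = [f for f in weak if score(model, f) < threshold]
    return model, iterations


def toy_train(model: Model, data: List[str]) -> Model:
    # Toy stand-in for retraining: each retrieved sample raises its feature's score.
    updated = dict(model)
    for feature in data:
        updated[feature] = min(1.0, updated[feature] + 0.25)
    return updated


final, n = iterative_retrain(
    model={"speed": 0.95, "steering_angle": 0.5},
    train=toy_train,
    score=lambda m, f: m[f],
    retrieve=lambda f: [f],
    features=["speed", "steering_angle"],
    threshold=0.9,
    max_iterations=10,
)
print(final, n)  # {'speed': 0.95, 'steering_angle': 1.0} 2
```

With a cap of one iteration instead of ten, the same call stops early with "steering_angle" at 0.75, showing the second stopping condition.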

Regarding claim 3, Nagabandi/Viswa/Amini teaches The method of claim 1, where Amini teaches wherein each of the first set of features represents one of a plurality of driving parameters, including speed, accelerator, angular velocity, throttle, brake, and steering angle, U-turn, left turn, or right turn (“Having the ability to tune our discretization affords more flexibility on how angles are binned over the space of all turning angles.” [pg. 4719, § III. Learning Steering Distributions, ¶3; note: under the broadest reasonable interpretation (BRI), the claim requires only one of the recited parameters.]).
Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Amini teaches steering bounds for an autonomous driving task. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s teachings to implement binning and then sampling from the bins as taught by Amini. One would have been motivated to make this modification in order to compute a number of different possible actions and to incorporate higher-level decision making for autonomous navigation. [Abstract, Amini]

Regarding claim 8, Nagabandi/Viswa/Amini teaches The method of claim 1, where Nagabandi further teaches wherein the dynamic model is one of a plurality of dynamic models trained using the first set of training data from the training data source (“We described a number of important design decisions for effectively and efficiently training neural network dynamics models, and we presented detailed experiments that evaluated these design parameters. Our method quickly discovered a dynamics model that led to an effective gait.” [pg. 7, § VII. Discussion, ¶2]), and wherein the dynamic model is a model that receives a highest score based on inference performance (“We first evaluate various design decisions for model-based reinforcement learning with neural networks using empirical evaluations with our model-based approach (Sec. IV). We explored these design decisions on the swimmer and halfcheetah agents on the locomotion task of running forward as quickly as possible. After each design decision was evaluated, we used the best outcome of that evaluation for the remainder of the evaluations.” [pg. 5, § A. Evaluating Design Decisions for Model-Based Reinforcement Learning, ¶1; see further pg. 2, § III. Preliminaries, ¶2 (“In model-based reinforcement learning, a model of the dynamics is used to make predictions”), which implies inference performance.]).
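Selecting, from a plurality of candidate dynamic models trained on the same data, the model with the highest inference score can be illustrated as below. Negative mean squared error on held-out pairs is assumed as the score, and the candidate models and data are hypothetical; neither is taken from Nagabandi or the application.

```python
# Hypothetical sketch: choose, from several candidate dynamic models trained
# on the same data, the one with the best inference score on held-out data.
from typing import Callable, Dict, List, Tuple

Predictor = Callable[[float], float]


def inference_score(predict: Predictor, validation: List[Tuple[float, float]]) -> float:
    """Assumed score: negative mean squared error (higher is better)."""
    mse = sum((predict(x) - y) ** 2 for x, y in validation) / len(validation)
    return -mse


def select_best_model(models: Dict[str, Predictor], validation: List[Tuple[float, float]]) -> str:
    """Return the name of the highest-scoring candidate model."""
    return max(models, key=lambda name: inference_score(models[name], validation))


candidates = {"linear": lambda x: 2.0 * x, "constant": lambda x: 1.0}
held_out = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]
print(select_best_model(candidates, held_out))  # → linear
```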

Regarding claim 9, Nagabandi/Viswa/Amini teaches The method of claim 8, where Nagabandi further teaches wherein the dynamic model is a neural network model represented by one of a linear regression, a multilayer perceptron (MLP), or a recurrent neural network (RNN) (“In this work, we demonstrate that multi-layer neural network models can in fact achieve excellent sample complexity in a model-based reinforcement learning algorithm, when combined with a few important design decisions such as data aggregation.” [pg. 1, § I. Introduction, ¶2; this would correspond to a multilayer perceptron.]).

Regarding claim 10, Nagabandi/Viswa/Amini teaches The method of claim 1, where Amini teaches wherein the training data source stores driving statistics collected from a variety of vehicles driven by human drivers (“Figure 5 (left) illustrates actual human steering control inputs (red plot) over a 40 second period, overlaid on the predicted steering bounds, and shows the human inputs operating within the predicted bounds.” [pg. 4721, § IV. Results, B, ¶1]), wherein the driving statistics include information indicating driving commands issued and responses of the vehicles captured by sensors of the vehicles at different points in time (“The vehicle base platform used for this study is a Toyota Prius 2015 V, which was retrofitted with sensors, power, and computing systems for parallel and autonomous driving. A forward facing PointGrey Grasshoper 3 camera (60◦ field of view lens), which captures RGB images at approximately 20Hz, is the vision data source for this study. Sensors also collected steering wheel angle and encoder clicks, which are used to infer speed. The dataset was collected largely in the Boston metropolitan area across a variety of driving scenarios, including urban, suburban, and highway environments, and at varying times of the day and week. The dataset, approximately 7 hours (500 GB) of driving data in total, was split into training and testing portions with some of the training dataset used for validation. The video is sampled at 10Hz to expunge consecutive frames which are too similar to each other. Additionally, to filter out points where the car is not moving, we only consider frames when the vehicle is moving faster than 2 miles per hour” [pg. 4721, § IV. Results, A, ¶1]).
Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Amini teaches steering bounds for an autonomous driving task. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s teachings to implement binning and then sampling from the bins as taught by Amini. One would have been motivated to make this modification in order to compute a number of different possible actions and to incorporate higher-level decision making for autonomous navigation. [Abstract, Amini]

Regarding claim 11, Nagabandi teaches A non-transitory machine-readable medium having instructions stored therein, which when executed by a processor, causing the processor to perform operations of training dynamic models (“From the experiments shown in this paper, our method has shown applicability for systems with high-dimensional state spaces, systems with contact-rich environment dynamics, under-observed systems, and systems with complex nonlinear dynamics that provide a considerable modelling challenge. In addition to taking communication delays and computational limitations into account” [pg. 7, § VII. Discussion, ¶5; implies use of processors and memory.]), the operations comprising: 
a first set of training data from a training data source (“We collect training data by sampling starting configurations s0 ∼ p(s0), executing random actions at each timestep, and recording the resulting trajectories τ = (s0, a0, · · · , sT −2, aT −2, sT −1) of length T… ” [pg. 3, § Collecting training data, ¶1; training data source would correspond to the robots disclosed by Nagabandi.]), 
training a dynamic model based on the first set of training data for the first set of features (“Training the model: We train the dynamics model fθ(st, at) by minimizing the error… While training on the training dataset D, we also calculate the mean squared error in Eqn. 2 on a validation set Dval, composed of trajectories not stored in the training dataset” [pg. 3, § Training the model, ¶3; Training set D would correspond to a first set of training data.]); 
and for each of the second set of features, retrieving a second set of training data associated with the corresponding feature of the second set of features (“First, random trajectories are collected and added to dataset DRAND, which is used to train fθ by performing gradient descent on Eqn. 2. Then, the model-based MPC controller (Sec. IV-C) gathers T new on-policy datapoints and adds these datapoints to a separate dataset DRL.” [pg. 4, § D. Improving Model-Based Control with Reinforcement Learning, ¶2; Examiner is interpreting DRL to be equivalent to a second set of training data.]), and 
retraining the dynamic model using the second set of training data (“To improve the performance of our model-based learning algorithm, we gather additional on-policy data by alternating between gathering data with our current model and retraining our model using the aggregated data.” [pg. 4, § D. Improving Model-Based Control with Reinforcement Learning, ¶1]).
However, Nagabandi fails to explicitly teach for autonomous driving vehicles (ADVs)
the first set of training data representing driving statistics for a first set of features;
determining a second set of features as a subset of the first set of features based on comparing an actual future state of each feature of the first set of features from the dynamic model and an expected future state of the feature from the dynamic model, each of the second set of features representing a feature whose performance score is below a predetermined threshold;
Viswa teaches for autonomous driving vehicles (ADVs) (“As discussed above, the various embodiments described herein relate broadly to autonomous driving, and specifically to vehicle positioning using sensor data” [¶0031])
the first set of training data representing driving statistics for a first set of features (“For example, the set of input features is extracted from sensor data subsequently collected from a geographic location for which the predicted sensor error for a target sensor is to be calculated (e.g., as described with respect to FIG. 6 below).” [¶0048; sensor inputs would be equivalent to driving statistics.]);
determining a second set of features as a subset of the first set of features based on comparing an actual future state of each feature of the first set of features from the dynamic model and an expected future state of the feature from the dynamic model, each of the second set of features representing a feature whose performance score is below a predetermined threshold (“In step 403, the mapping platform 117 trains the machine learning model 115 using the ground truth sensor data to calculate a predicted sensor error from a set of input features. For example, the set of input features is extracted from sensor data subsequently collected from a geographic location for which the predicted sensor error for a target sensor is to be calculated (e.g., as described with respect to FIG. 6 below). In one embodiment, the training module 303 can train the machine learning model 115 (e.g., a neural network, support vector machine, or equivalent) by obtaining a feature vector or matrix comprising the selected training features from the feature extraction module 301. During the training process, the training module 303 feeds the feature vectors or matrices of the training data set (e.g., the ground truth data) into the machine learning model 115 to compute a predicted sensor error. The training module 303 then compares the predicted sensor error to the ground truth sensor error values of the ground truth training data set. Based on this comparison, the training module 303 computes an accuracy of the predictions or classifications for the initial set of model parameters. If the accuracy or level of performance does not meet a threshold or configured level, the training module 303 incrementally adjusts the model parameters until the machine learning model 115 generates predictions at the desired level of accuracy with respect to the predicted sensor error. 
In other words, the “trained” machine learning model 115 is a model whose parameters are adjusted to make accurate predictions with respect to the ground truth data. The trained machine learning model 115 can then be used as according to the embodiments described below in FIG. 6.” [¶0048; Examiner is interpreting the predicted sensor error to be equivalent to “an actual future state of each feature” and the ground truth training data set to be equivalent to “an expected future state of the feature”.]);
Nagabandi and Viswa are both in the same field of endeavor of training machine learning models and thus are analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s dynamic model to further implement training an autonomous driving vehicle as taught by Viswa. Nagabandi identifies deploying this dynamic model training method on real-world robotic systems as future work, and thus one would have been motivated to make this modification in order to train an autonomous driving vehicle to achieve better performance based on evaluating a dynamic model. [§ VII. Discussion, Nagabandi]
However, Nagabandi/Viswa fails to explicitly teach extracting, wherein the extracting of the first set of training data from the training data source includes determining a plurality of equally-spaced value ranges for each of the first set of features, and selecting a value from each of the plurality of equally-spaced ranges for the feature;
Amini teaches extracting, wherein the extracting of the first set of training data from the training data source includes determining a plurality of equally-spaced value ranges for each of the first set of features (“We define m discrete bins for steering sampled from a tunable logit function projected onto the x-axis on the interval [−1, 1]. Having the ability to tune our discretization affords more flexibility on how angles are binned over the space of all turning angles. Since the vast majority of driving data has steering angles within a small interval around the equilibrium of steering, having a discretization where bin angles are concentrated near the center of steering enables more precise classification when going straight or making small turns, while receiving more spread on larger turning angles.” [pg. 4719, § III. Learning Steering Distributions, ¶3; See further: For example, setting γ = 0 yields βj = (π/2)vj, which is a linear function of bins (evenly spaced bins).]), and selecting a value from each of the plurality of equally-spaced ranges for the feature (“An NVIDIA DGX-1 supercomputer was used for training and validation of the DNNs. While training, a random minibatch of size 20 was randomly aggregated and fed through the network. Training frames were sampled from our dataset such that an approximately equal number of examples from each bin were fed through the network, to reduce bias towards any particular steering direction” [pg. 4721, § IV. Results, ¶2]);
Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Amini teaches steering bounds for an autonomous driving task. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s teachings to implement binning and then sampling from the bins as taught by Amini. One would have been motivated to make this modification in order to compute a number of different possible actions and to incorporate higher-level decision making for autonomous navigation. [Abstract, Amini]

Regarding claim 12, Nagabandi/Viswa/Amini teaches The non-transitory machine-readable medium of claim 11, where Nagabandi further teaches further comprising iteratively performing retrieving the second set of training data and retraining the dynamic model, until the corresponding performance score is above the predetermined threshold or a number of iterations reaches a predetermined iteration value (“Note that during retraining, the neural network dynamics function’s weights are warm-started with the weights from the previous iteration. The algorithm continues alternating between training the model and gathering additional data until a predefined maximum iteration is reached. We evaluate design decisions related to data aggregation in our experiments” [pg. 4, § D. Improving Model-Based Control with Reinforcement Learning, ¶2]).

Regarding claim 13, Nagabandi/Viswa/Amini teaches The non-transitory machine-readable medium of claim 11, where Amini teaches wherein each of the first set of features represents one of a plurality of driving parameters, including speed, accelerator, angular velocity, throttle, brake, and steering angle, U-turn, left turn, or right turn (“Having the ability to tune our discretization affords more flexibility on how angles are binned over the space of all turning angles.” [pg. 4719, § III. Learning Steering Distributions, ¶3; note: under the broadest reasonable interpretation (BRI), the claim requires only one of the recited parameters.]).
Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Amini teaches steering bounds for an autonomous driving task. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s teachings to implement binning and then sampling from the bins as taught by Amini. One would have been motivated to make this modification in order to compute a number of different possible actions and to incorporate higher-level decision making for autonomous navigation. [Abstract, Amini]

Regarding claim 18, Nagabandi/Viswa/Amini teaches The non-transitory machine-readable medium of claim 11, where Nagabandi further teaches wherein the dynamic model is one of a plurality of dynamic models trained using the first set of training data from the training data source (“We described a number of important design decisions for effectively and efficiently training neural network dynamics models, and we presented detailed experiments that evaluated these design parameters. Our method quickly discovered a dynamics model that led to an effective gait.” [pg. 7, § VII. Discussion, ¶2]), and wherein the dynamic model is a model that receives a highest score based on inference performance (“We first evaluate various design decisions for model-based reinforcement learning with neural networks using empirical evaluations with our model-based approach (Sec. IV). We explored these design decisions on the swimmer and halfcheetah agents on the locomotion task of running forward as quickly as possible. After each design decision was evaluated, we used the best outcome of that evaluation for the remainder of the evaluations.” [pg. 5, § A. Evaluating Design Decisions for Model-Based Reinforcement Learning, ¶1; see further pg. 2, § III. Preliminaries, ¶2 (“In model-based reinforcement learning, a model of the dynamics is used to make predictions”), which implies inference performance.]).

Regarding claim 19, Nagabandi/Viswa/Amini teaches The non-transitory machine-readable medium of claim 18, where Nagabandi further teaches wherein the dynamic model is a neural network model represented by one of a linear regression, a multilayer perceptron (MLP), or a recurrent neural network (RNN) (“In this work, we demonstrate that multi-layer neural network models can in fact achieve excellent sample complexity in a model-based reinforcement learning algorithm, when combined with a few important design decisions such as data aggregation.” [pg. 1, § I. Introduction, ¶2; this would correspond to a multilayer perceptron.]).

Regarding claim 20, Nagabandi teaches A data processing system, comprising: a processor; and a memory coupled to the processor to store instructions, which when executed by a processor, cause the processor to perform operations of training dynamic models (“From the experiments shown in this paper, our method has shown applicability for systems with high-dimensional state spaces, systems with contact-rich environment dynamics, under-observed systems, and systems with complex nonlinear dynamics that provide a considerable modelling challenge. In addition to taking communication delays and computational limitations into account” [pg. 7, § VII. Discussion, ¶5; implies use of processors and memory.]), the operations comprising: 
a first set of training data from a training data source (“We collect training data by sampling starting configurations s0 ∼ p(s0), executing random actions at each timestep, and recording the resulting trajectories τ = (s0, a0, · · · , sT −2, aT −2, sT −1) of length T… ” [pg. 3, § Collecting training data, ¶1; training data source would correspond to the robots disclosed by Nagabandi.]), 
training a dynamic model based on the first set of training data for the first set of features (“Training the model: We train the dynamics model fθ(st, at) by minimizing the error… While training on the training dataset D, we also calculate the mean squared error in Eqn. 2 on a validation set Dval, composed of trajectories not stored in the training dataset” [pg. 3, § Training the model, ¶3; Training set D would correspond to a first set of training data.]); 
and for each of the second set of features, retrieving a second set of training data associated with the corresponding feature of the second set of features (“First, random trajectories are collected and added to dataset DRAND, which is used to train fθ by performing gradient descent on Eqn. 2. Then, the model-based MPC controller (Sec. IV-C) gathers T new on-policy datapoints and adds these datapoints to a separate dataset DRL.” [pg. 4, § D. Improving Model-Based Control with Reinforcement Learning, ¶2; Examiner is interpreting DRL to be equivalent to a second set of training data.]), and 
retraining the dynamic model using the second set of training data (“To improve the performance of our model-based learning algorithm, we gather additional on-policy data by alternating between gathering data with our current model and retraining our model using the aggregated data.” [pg. 4, § D. Improving Model-Based Control with Reinforcement Learning, ¶1]).
However, Nagabandi fails to explicitly teach for autonomous driving vehicles (ADVs);
the first set of training data representing driving statistics for a first set of features;
determining a second set of features as a subset of the first set of features based on comparing an actual future state of each feature of the first set of features from the dynamic model and an expected future state of the feature from the dynamic model, each of the second set of features representing a feature whose performance score is below a predetermined threshold;
Viswa teaches for autonomous driving vehicles (ADVs) (“As discussed above, the various embodiments described herein relate broadly to autonomous driving, and specifically to vehicle positioning using sensor data” [¶0031])
the first set of training data representing driving statistics for a first set of features (“For example, the set of input features is extracted from sensor data subsequently collected from a geographic location for which the predicted sensor error for a target sensor is to be calculated (e.g., as described with respect to FIG. 6 below).” [¶0048; sensor inputs would be equivalent to driving statistics.]);
determining a second set of features as a subset of the first set of features based on comparing an actual future state of each feature of the first set of features from the dynamic model and an expected future state of the feature from the dynamic model, each of the second set of features representing a feature whose performance score is below a predetermined threshold (“In step 403, the mapping platform 117 trains the machine learning model 115 using the ground truth sensor data to calculate a predicted sensor error from a set of input features. For example, the set of input features is extracted from sensor data subsequently collected from a geographic location for which the predicted sensor error for a target sensor is to be calculated (e.g., as described with respect to FIG. 6 below). In one embodiment, the training module 303 can train the machine learning model 115 (e.g., a neural network, support vector machine, or equivalent) by obtaining a feature vector or matrix comprising the selected training features from the feature extraction module 301. During the training process, the training module 303 feeds the feature vectors or matrices of the training data set (e.g., the ground truth data) into the machine learning model 115 to compute a predicted sensor error. The training module 303 then compares the predicted sensor error to the ground truth sensor error values of the ground truth training data set. Based on this comparison, the training module 303 computes an accuracy of the predictions or classifications for the initial set of model parameters. If the accuracy or level of performance does not meet a threshold or configured level, the training module 303 incrementally adjusts the model parameters until the machine learning model 115 generates predictions at the desired level of accuracy with respect to the predicted sensor error. 
In other words, the “trained” machine learning model 115 is a model whose parameters are adjusted to make accurate predictions with respect to the ground truth data. The trained machine learning model 115 can then be used as according to the embodiments described below in FIG. 6.” [¶0048; Examiner is interpreting the predicted sensor error to be equivalent to “an actual future state of each feature” and ground truth training data set to be equivalent to “an expected future state of the feature”.]);
Nagabandi and Viswa are both in the same field of endeavor of training machine learning models and thus are analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s dynamic model to further implement training an autonomous driving vehicle as taught by Viswa. Nagabandi discusses deploying this dynamic model training method on real-world robotic systems as future work, and thus one would have been motivated to make this modification in order to train an autonomous driving vehicle to achieve better performance based on evaluating a dynamic model. [§ VII. Discussion, Nagabandi]
However, Nagabandi/Viswa fails to explicitly teach extracting, wherein the extracting of the first set of training data from the training data source includes determining a plurality of equally-spaced value ranges for each of the first set of features, and selecting a value from each of the plurality of equally-spaced ranges for the feature;
Amini teaches extracting, wherein the extracting of the first set of training data from the training data source includes determining a plurality of equally-spaced value ranges for each of the first set of features (“We define m discrete bins for steering sampled from a tunable logit function projected onto the x-axis on the interval [−1, 1]. Having the ability to tune our discretization affords more flexibility on how angles are binned over the space of all turning angles. Since the vast majority of driving data has steering angles within a small interval around the equilibrium of steering, having a discretization where bin angles are concentrated near the center of steering enables more precise classification when going straight or making small turns, while receiving more spread on larger turning angles.” [pg. 4719, § III. Learning Steering Distributions, ¶3; see further: for example, setting γ = 0 yields β_j = (π/2)v_j, which is a linear function of the bins (i.e., evenly spaced bins).]), and selecting a value from each of the plurality of equally-spaced ranges for the feature (“An NVIDIA DGX-1 supercomputer was used for training and validation of the DNNs. While training, a random minibatch of size 20 was randomly aggregated and fed through the network. Training frames were sampled from our dataset such that an approximately equal number of examples from each bin were fed through the network, to reduce bias towards any particular steering direction” [pg. 4721, § IV. Results, ¶2]);
Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Amini teaches steering bounds for an autonomous driving task. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s teachings to implement binning and then sampling from the bins as taught by Amini. One would have been motivated to make this modification in order to compute a number of different possible actions and to incorporate higher-level decision making for autonomous navigation. [Abstract, Amini]
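As an illustrative sketch only, the equally-spaced binning limitation recited in claim 20 (determining equally-spaced value ranges for each feature and selecting a value from each range) can be expressed in Python as follows. The helper names, the feature names and bounds, the bin count, and the use of a uniform random draw as the way a value is “selected” from each range are all assumptions made for illustration. This is not code from the application or from Amini, whose bins are generated from a tunable logit function and reduce to evenly spaced bins only in the γ = 0 case quoted above.

```python
import random
from typing import Dict, List, Tuple

# Hypothetical helpers illustrating the claim language only.


def equally_spaced_ranges(low: float, high: float, num_bins: int) -> List[Tuple[float, float]]:
    """Split [low, high] into num_bins contiguous value ranges of equal width."""
    if num_bins < 1 or high <= low:
        raise ValueError("requires num_bins >= 1 and high > low")
    width = (high - low) / num_bins
    return [(low + i * width, low + (i + 1) * width) for i in range(num_bins)]


def select_value_per_range(ranges: List[Tuple[float, float]], rng: random.Random) -> List[float]:
    """Select one value from each range (assumed here: a uniform random draw)."""
    return [rng.uniform(lo, hi) for lo, hi in ranges]


def extract_training_values(
    feature_bounds: Dict[str, Tuple[float, float]], num_bins: int, seed: int = 0
) -> Dict[str, List[float]]:
    """For each feature, bin its value span equally and pick one value per bin."""
    rng = random.Random(seed)
    return {
        name: select_value_per_range(equally_spaced_ranges(lo, hi, num_bins), rng)
        for name, (lo, hi) in feature_bounds.items()
    }


# Hypothetical feature bounds (assumed units: m/s for speed, normalized steering).
bounds = {"speed": (0.0, 30.0), "steering_angle": (-1.0, 1.0)}
values = extract_training_values(bounds, num_bins=5)
print(values["speed"])  # five values, one drawn from each 6 m/s-wide range
```

Amini samples training frames so that roughly equal numbers of examples come from each bin; the sketch instead draws exactly one value per range, which is the narrower reading of the claim language.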

Regarding claim 21, Nagabandi/Viswa/Amini teaches The data processing system of claim 20, where Nagabandi further teaches the operations further comprising iteratively performing retrieving the second set of training data and retraining the dynamic model, until the corresponding performance score is above the predetermined threshold or a number of iterations reaches a predetermined iteration value (“Note that during retraining, the neural network dynamics function’s weights are warm-started with the weights from the previous iteration. The algorithm continues alternating between training the model and gathering additional data until a predefined maximum iteration is reached. We evaluate design decisions related to data aggregation in our experiments” [pg. 4, § D. Improving Model-Based Control with Reinforcement Learning, ¶2]).
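The stopping condition recited in claim 21 (repeat retrieval and retraining until the performance score is above the threshold or the number of iterations reaches a predetermined value) can be sketched as a simple loop. This is a hypothetical Python illustration; the toy single-parameter “model,” the data-retrieval and retraining callables, and the 1/(1 + error) score are assumptions and do not represent the applicant’s or Nagabandi’s implementation.

```python
from typing import Callable, List, Tuple, TypeVar

Model = TypeVar("Model")


def retrain_until_threshold(
    model: Model,
    fetch_data: Callable[[], List[float]],
    retrain: Callable[[Model, List[float]], Model],
    score: Callable[[Model], float],
    threshold: float,
    max_iterations: int,
) -> Tuple[Model, float, int]:
    """Alternate data retrieval and retraining until the score exceeds the
    threshold or the iteration cap is reached. Returns (model, score, iterations)."""
    current = score(model)
    iterations = 0
    while current <= threshold and iterations < max_iterations:
        # Warm start: each round begins from the model produced by the previous round.
        model = retrain(model, fetch_data())
        current = score(model)
        iterations += 1
    return model, current, iterations


# Hypothetical toy "model": a single parameter that should approach 3.0.
def fetch() -> List[float]:
    return [3.0, 3.0, 3.0, 3.0]


def step(w: float, data: List[float]) -> float:
    return w + 0.5 * (sum(data) / len(data) - w)  # move halfway toward the data mean


def toy_score(w: float) -> float:
    return 1.0 / (1.0 + abs(w - 3.0))  # 1.0 means a perfect fit


w, s, n = retrain_until_threshold(0.0, fetch, step, toy_score, threshold=0.9, max_iterations=10)
```

In this toy run the score first exceeds 0.9 after five rounds; with `max_iterations=3` the loop instead stops on the iteration-cap branch, which corresponds to the second alternative of the claim.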

Regarding claim 22, Nagabandi/Viswa/Amini teaches The data processing system of claim 20, where Amini further teaches wherein each of the first set of features represents one of a plurality of driving parameters, including speed, accelerator, angular velocity, throttle, brake, and steering angle, U-turn, left turn, or right turn (“Having the ability to tune our discretization affords more flexibility on how angles are binned over the space of all turning angles.” [pg. 4719, § III. Learning Steering Distributions, ¶3; note: under the BRI, the claim requires only at least one of the recited parameters, here the steering angle.]).
Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Amini teaches steering bounds for an autonomous driving task. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s teachings to implement binning and then sampling from the bins as taught by Amini. One would have been motivated to make this modification in order to compute a number of different possible actions and to incorporate higher-level decision making for autonomous navigation. [Abstract, Amini]

Claims 5-6 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Nagabandi in view of Viswa and Amini and further in view of Wang et al. ("Deep Reinforcement Learning for Autonomous Driving", hereinafter "Wang").

Regarding claim 5, Nagabandi/Viswa/Amini teaches The method of claim 1 but fails to explicitly teach wherein the first set of training data includes a plurality of feature scenarios, each feature scenario representing a combination of selected values for the first set of features.
Wang teaches wherein the first set of training data includes a plurality of feature scenarios, each feature scenario representing a combination of selected values for the first set of features (“Meanwhile, the control problem is also challenging in real world because the action spaces is continuous and different action can be executed at the same time. For example, for smoother turning, We can steer and brake at the same time and adjust the degree of steering as we turn. More importantly, A safe autonomous vehicle must ensure functional safety and be able to deal with urgent events. For example, vehicles need to be very careful about crossroads and unseen corners such that they can act or brake immediately when there are children suddenly running across the road.” [pg. 2, § 1. Introduction, ¶4; Examiner is interpreting actions being executed at the same time to be equivalent to a combination of selected values.]).
Nagabandi, Viswa, Amini, and Wang are all in the same field of endeavor of training machine learning models and thus are analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Amini teaches steering bounds for an autonomous driving task. Wang discloses training autonomous driving vehicles using deep reinforcement learning. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s/Amini’s teachings by substituting the training data with the combination of driving parameters taught by Wang. Nagabandi discusses deploying this dynamic model training method on real-world robotic systems as future work, and thus one would have been motivated to make this modification in order to train an autonomous driving vehicle to achieve better performance based on evaluating a dynamic model. [§ VII. Discussion, Nagabandi]

Regarding claim 6, Nagabandi/Viswa/Amini teaches The method of claim 1, where Nagabandi further teaches wherein the dynamic model is evaluated for which the dynamic model has been trained (“We first evaluate various design decisions for model-based reinforcement learning with neural networks using empirical evaluations with our model-based approach (Sec. IV). We explored these design decisions on the swimmer and half-cheetah agents on the locomotion task of running forward as quickly as possible. After each design decision was evaluated, we used the best outcome of that evaluation for the remainder of the evaluations.” [pg. 5, § A. Evaluating Design Decisions for Model-Based Reinforcement Learning, ¶1])
Wang teaches based on driving statistics generated (“We choose The Open Racing Car Simulator (TORCS) as our environment to train our agent. In order to learn the policy in TORCS, We first select a set of appropriate sensor information as inputs from TORCS. Based on these inputs, we then design our own rewarder inside TORCS to encourage our agent to run fast without hitting other cars and also stick to the center of the road.” [pg. 2, § 1. Introduction, ¶6; sensor inputs would be equivalent to driving statistics.]), under a plurality of controlled feature scenarios, by an ADV, each controlled feature scenarios representing a combination of selected values for the first set of features (“Meanwhile, the control problem is also challenging in real world because the action spaces is continuous and different action can be executed at the same time. For example, for smoother turning, We can steer and brake at the same time and adjust the degree of steering as we turn. More importantly, A safe autonomous vehicle must ensure functional safety and be able to deal with urgent events. For example, vehicles need to be very careful about crossroads and unseen corners such that they can act or brake immediately when there are children suddenly running across the road.” [pg. 2, § 1. Introduction, ¶4; Examiner is interpreting actions being executed at the same time to be equivalent to a combination of selected values.]).
Nagabandi, Viswa, Amini, and Wang are all in the same field of endeavor of training machine learning models and thus are analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Amini teaches steering bounds for an autonomous driving task. Wang discloses training autonomous driving vehicles using deep reinforcement learning. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s/Amini’s teachings by substituting the training data with the combination of driving parameters taught by Wang. Nagabandi discusses deploying this dynamic model training method on real-world robotic systems as future work, and thus one would have been motivated to make this modification in order to train an autonomous driving vehicle to achieve better performance based on evaluating a dynamic model. [§ VII. Discussion, Nagabandi]

Regarding claim 15, Nagabandi/Viswa/Amini teaches The non-transitory machine-readable medium of claim 11 but fails to explicitly teach wherein the first set of training data includes a plurality of feature scenarios, each feature scenario representing a combination of selected values for the first set of features.
Wang teaches wherein the first set of training data includes a plurality of feature scenarios, each feature scenario representing a combination of selected values for the first set of features (“Meanwhile, the control problem is also challenging in real world because the action spaces is continuous and different action can be executed at the same time. For example, for smoother turning, We can steer and brake at the same time and adjust the degree of steering as we turn. More importantly, A safe autonomous vehicle must ensure functional safety and be able to deal with urgent events. For example, vehicles need to be very careful about crossroads and unseen corners such that they can act or brake immediately when there are children suddenly running across the road.” [pg. 2, § 1. Introduction, ¶4; Examiner is interpreting actions being executed at the same time to be equivalent to a combination of selected values.]).
Nagabandi, Viswa, Amini, and Wang are all in the same field of endeavor of training machine learning models and thus are analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Amini teaches steering bounds for an autonomous driving task. Wang discloses training autonomous driving vehicles using deep reinforcement learning. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s/Amini’s teachings by substituting the training data with the combination of driving parameters taught by Wang. Nagabandi discusses deploying this dynamic model training method on real-world robotic systems as future work, and thus one would have been motivated to make this modification in order to train an autonomous driving vehicle to achieve better performance based on evaluating a dynamic model. [§ VII. Discussion, Nagabandi]

Regarding claim 16, Nagabandi/Viswa/Amini teaches The non-transitory machine-readable medium of claim 11, where Nagabandi further teaches wherein the dynamic model is evaluated for which the dynamic model has been trained (“We first evaluate various design decisions for model-based reinforcement learning with neural networks using empirical evaluations with our model-based approach (Sec. IV). We explored these design decisions on the swimmer and half-cheetah agents on the locomotion task of running forward as quickly as possible. After each design decision was evaluated, we used the best outcome of that evaluation for the remainder of the evaluations.” [pg. 5, § A. Evaluating Design Decisions for Model-Based Reinforcement Learning, ¶1])
Wang teaches based on driving statistics generated (“We choose The Open Racing Car Simulator (TORCS) as our environment to train our agent. In order to learn the policy in TORCS, We first select a set of appropriate sensor information as inputs from TORCS. Based on these inputs, we then design our own rewarder inside TORCS to encourage our agent to run fast without hitting other cars and also stick to the center of the road.” [pg. 2, § 1. Introduction, ¶6; sensor inputs would be equivalent to driving statistics.]), under a plurality of controlled feature scenarios, by an ADV, each controlled feature scenarios representing a combination of selected values for the first set of features (“Meanwhile, the control problem is also challenging in real world because the action spaces is continuous and different action can be executed at the same time. For example, for smoother turning, We can steer and brake at the same time and adjust the degree of steering as we turn. More importantly, A safe autonomous vehicle must ensure functional safety and be able to deal with urgent events. For example, vehicles need to be very careful about crossroads and unseen corners such that they can act or brake immediately when there are children suddenly running across the road.” [pg. 2, § 1. Introduction, ¶4; Examiner is interpreting actions being executed at the same time to be equivalent to a combination of selected values.]).
Nagabandi, Viswa, Amini, and Wang are all in the same field of endeavor of training machine learning models and thus are analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Amini teaches steering bounds for an autonomous driving task. Wang discloses training autonomous driving vehicles using deep reinforcement learning. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s/Amini’s teachings by substituting the training data with the combination of driving parameters taught by Wang. Nagabandi discusses deploying this dynamic model training method on real-world robotic systems as future work, and thus one would have been motivated to make this modification in order to train an autonomous driving vehicle to achieve better performance based on evaluating a dynamic model. [§ VII. Discussion, Nagabandi]

Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Nagabandi in view of Viswa, Amini and Wang and further in view of Eraqi et al. ("End-to-End Deep Learning for Steering Autonomous Vehicles Considering Temporal Dependencies", hereinafter "Eraqi").

Regarding claim 7, Nagabandi/Viswa/Amini/Wang teaches The method of claim 6, where Wang further teaches the method further comprising: 
determining, from the plurality of controlled feature scenarios, a set of controlled feature scenarios associated with each feature of the first set of features (“TORCH provides 18 different types of sensor inputs. After experiments we carefully select a subset of inputs, which is shown in Table 1… ob.angle is the angle between the car direction and the direction of the track axis. It reveals the car’s direction to the track line. • ob.track is the vector of 19 range finder sensors: each sensor returns the distance between the track edge and the car within a range of 200 meters. It let us know if the car is in danger of running into obstacle. • ob.trackPos is the distance between the car and the track axis. The value is normalized w.r.t. to the track width: it is 0 when the car is on the axis, values greater than 1 or -1 means the car is outside of the track. We want the distance to the track axis to be 0. • ob.speedX, ob.speedY, ob.speedZ is the speed of the car along the longitudinal axis of the car (good velocity), along the transverse axis of the car, and along the Z-axis of the car. We want the car speed along the axis to be high and speed vertical to the axis to be low.” [pg. 5, § 3.3 The Open Racing Car Simulator (TORCS)]); 
applying each of the set of controlled feature scenarios as input to the dynamic model (“In order to learn the policy in TORCS, We first select a set of appropriate sensor information as inputs from TORCS. Based on these inputs, we then design our own rewarder inside TORCS to encourage our agent to run fast without hitting other cars and also stick to the center of the road.” [pg. 2, § 1. Introduction, ¶6]); 
Nagabandi further teaches comparing an output of the dynamic model to the input against a ground truth value in response to the input (“We therefore calculate H-step validation errors by propagating the learned dynamics function forward H times to make multi-step open-loop predictions. For each given sequence of true actions (a_t, . . . , a_{t+H−1}) from Dval, we compare the corresponding ground-truth states (s_{t+1}, . . . , s_{t+H}) to the dynamics model’s multi-step state predictions (ŝ_{t+1}, . . . , ŝ_{t+H})” [pg. 3, right col., ¶2]); and
determining the second set of features based on their performance scores (“First, random trajectories are collected and added to dataset DRAND, which is used to train fθ by performing gradient descent on Eqn. 2. Then, the model-based MPC controller (Sec. IV-C) gathers T new on-policy datapoints and adds these datapoints to a separate dataset DRL.” [pg. 4, § D. Improving Model-Based Control with Reinforcement Learning, ¶2; see further Sec. IV-C, which discloses performance scores]).
Viswa further teaches computing a performance score for each feature of the first set of features (“Based on this comparison, the training module 303 computes an accuracy of the predictions or classifications for the initial set of model parameters. If the accuracy or level of performance does not meet a threshold or configured level, the training module 303 incrementally adjusts the model parameters until the machine learning model 115 generates predictions at the desired level of accuracy with respect to the predicted sensor error.” [¶0048]);
Nagabandi, Viswa, Amini, and Wang are all in the same field of endeavor of training machine learning models and thus are analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Amini teaches steering bounds for an autonomous driving task. Wang discloses training autonomous driving vehicles using deep reinforcement learning. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s/Amini’s teachings by substituting the training data with the combination of driving parameters taught by Wang. Nagabandi discusses deploying this dynamic model training method on real-world robotic systems as future work, and thus one would have been motivated to make this modification in order to train an autonomous driving vehicle to achieve better performance based on evaluating a dynamic model. [§ VII. Discussion, Nagabandi]
However, Nagabandi/Viswa/Amini/Wang fails to explicitly teach computing a root mean squared error for each feature of the first set of features;
Eraqi teaches computing a root mean squared error for each feature of the first set of features ([equation image: Eraqi’s root mean square error (RMSE) expression] [pg. 5, § 5.1 Dataset and Evaluation Metrics, ¶3]);
Nagabandi, Viswa, Amini, Wang, and Eraqi are all in the same field of endeavor of training machine learning models and thus are all analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Amini teaches steering bounds for an autonomous driving task. Wang discloses training autonomous driving vehicles using deep reinforcement learning. Eraqi discloses using a root mean square error method to express average system prediction error. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s/Amini’s/Wang’s teachings to implement a root mean square error evaluation step as taught by Eraqi. One would have been motivated to use RMSE because it weights large prediction errors more heavily, and large prediction errors would be undesirable in training autonomous driving vehicles. [pg. 5, § 5.1 Dataset and Evaluation Metrics, ¶3, Eraqi]
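The per-feature evaluation of claim 7 (comparing model outputs against ground truth, computing an RMSE for each feature, and collecting the features whose performance score is below a threshold into the second set) can be sketched in Python as below. The conversion from RMSE to a “performance score” is not specified in this action, so the 1/(1 + RMSE) mapping, the feature names, and the sample values are hypothetical; the RMSE itself is the standard square root of the mean squared difference between predictions and ground truth.

```python
import math
from typing import Dict, List, Sequence


def rmse(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Root mean squared error between predictions and ground-truth values."""
    if not predicted or len(predicted) != len(actual):
        raise ValueError("inputs must be non-empty and of equal length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(predicted))


def performance_score(error: float) -> float:
    """Hypothetical mapping: 1.0 for a perfect fit, falling toward 0 as RMSE grows."""
    return 1.0 / (1.0 + error)


def select_second_feature_set(
    predictions: Dict[str, List[float]],
    ground_truth: Dict[str, List[float]],
    threshold: float,
) -> List[str]:
    """Return the features whose performance score falls below the threshold."""
    scores = {f: performance_score(rmse(predictions[f], ground_truth[f])) for f in predictions}
    return sorted(f for f, s in scores.items() if s < threshold)


# Hypothetical per-feature model outputs and ground truth.
pred = {"speed": [10.0, 12.0, 14.0], "steering_angle": [0.1, 0.2]}
truth = {"speed": [10.0, 12.0, 14.0], "steering_angle": [0.5, 0.6]}
weak = select_second_feature_set(pred, truth, threshold=0.9)  # features needing more data
```

Here the speed predictions match exactly (score 1.0), while the steering-angle predictions are off by about 0.4 (score near 0.71), so only the steering angle would be placed in the second set and targeted for additional training data.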

Regarding claim 17, Nagabandi/Viswa/Amini/Wang teaches The non-transitory machine-readable medium of claim 16, where Wang further teaches the operations further comprising: 
determining, from the plurality of controlled feature scenarios, a set of controlled feature scenarios associated with each feature of the first set of features (“TORCH provides 18 different types of sensor inputs. After experiments we carefully select a subset of inputs, which is shown in Table 1… ob.angle is the angle between the car direction and the direction of the track axis. It reveals the car’s direction to the track line. • ob.track is the vector of 19 range finder sensors: each sensor returns the distance between the track edge and the car within a range of 200 meters. It let us know if the car is in danger of running into obstacle. • ob.trackPos is the distance between the car and the track axis. The value is normalized w.r.t. to the track width: it is 0 when the car is on the axis, values greater than 1 or -1 means the car is outside of the track. We want the distance to the track axis to be 0. • ob.speedX, ob.speedY, ob.speedZ is the speed of the car along the longitudinal axis of the car (good velocity), along the transverse axis of the car, and along the Z-axis of the car. We want the car speed along the axis to be high and speed vertical to the axis to be low.” [pg. 5, § 3.3 The Open Racing Car Simulator (TORCS)]); 
applying each of the set of controlled feature scenarios as input to the dynamic model (“In order to learn the policy in TORCS, We first select a set of appropriate sensor information as inputs from TORCS. Based on these inputs, we then design our own rewarder inside TORCS to encourage our agent to run fast without hitting other cars and also stick to the center of the road.” [pg. 2, § 1. Introduction, ¶6]); 
Nagabandi further teaches comparing an output of the dynamic model to the input against a ground truth value in response to the input (“We therefore calculate H-step validation errors by propagating the learned dynamics function forward H times to make multi-step open-loop predictions. For each given sequence of true actions (a_t, . . . , a_{t+H−1}) from Dval, we compare the corresponding ground-truth states (s_{t+1}, . . . , s_{t+H}) to the dynamics model’s multi-step state predictions (ŝ_{t+1}, . . . , ŝ_{t+H})” [pg. 3, right col., ¶2]); and
determining the second set of features based on their performance scores (“First, random trajectories are collected and added to dataset DRAND, which is used to train fθ by performing gradient descent on Eqn. 2. Then, the model-based MPC controller (Sec. IV-C) gathers T new on-policy datapoints and adds these datapoints to a separate dataset DRL.” [pg. 4, § D. Improving Model-Based Control with Reinforcement Learning, ¶2; see further Sec. IV-C, which discloses performance scores]).
Viswa further teaches computing a performance score for each feature of the first set of features (“Based on this comparison, the training module 303 computes an accuracy of the predictions or classifications for the initial set of model parameters. If the accuracy or level of performance does not meet a threshold or configured level, the training module 303 incrementally adjusts the model parameters until the machine learning model 115 generates predictions at the desired level of accuracy with respect to the predicted sensor error.” [¶0048]);
Nagabandi, Viswa, Amini, and Wang are all in the same field of endeavor of training machine learning models and thus are analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Amini teaches steering bounds for an autonomous driving task. Wang discloses training autonomous driving vehicles using deep reinforcement learning. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s/Amini’s teachings by substituting the training data with the combination of driving parameters taught by Wang. Nagabandi discusses deploying this dynamic model training method on real-world robotic systems as future work, and thus one would have been motivated to make this modification in order to train an autonomous driving vehicle to achieve better performance based on evaluating a dynamic model. [§ VII. Discussion, Nagabandi]
However, Nagabandi/Viswa/Amini/Wang fails to explicitly teach computing a root mean squared error for each feature of the first set of features;
Eraqi teaches computing a root mean squared error for each feature of the first set of features ([quoted passage reproduced as an image in the original] [pg. 5, § 5.1 Dataset and Evaluation Metrics, ¶3]).
Nagabandi, Viswa, Amini, Wang, and Eraqi are all in the same field of endeavor of training machine learning models and thus are all analogous. Nagabandi discloses training dynamic models to achieve better performance. Viswa teaches a method of predicting sensor error for autonomous vehicles. Amini teaches steering bounds for an autonomous driving task. Wang discloses deep reinforcement learning for training autonomous driving vehicles. Eraqi discloses using a root mean squared error (RMSE) metric to express the average system prediction error. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Nagabandi’s/Viswa’s/Amini’s/Wang’s teachings to implement an RMSE evaluation step as taught by Eraqi. One would have been motivated to use RMSE because it weights large prediction errors more heavily, and large prediction errors are undesirable in training autonomous driving vehicles. [pg. 5, § 5.1 Dataset and Evaluation Metrics, ¶3, Eraqi]


Response to Arguments
Applicant's arguments filed 04/22/2022 have been fully considered but they are not persuasive. 

Regarding the 35 U.S.C. §103 Rejection:
Applicant’s arguments on pgs. 11-13, in particular that the relied-upon references fail to teach “wherein the extracting of the first set of training data from the training data source includes determining a plurality of equally-spaced value ranges for each of the first set of features, and selecting a value from each of the plurality of equally-spaced ranges for the feature”, have been considered but are moot because the amended limitation is now taught by the newly cited Amini reference. Please see the updated 103 rejection above. 

Applicant’s arguments with respect to the rejections of the dependent claims have been fully considered but they are not persuasive as they rely upon the allowability of the independent claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491. The examiner can normally be reached Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/M.H.H./Examiner, Art Unit 2122

/ERIC NILSSON/Primary Examiner, Art Unit 2122