DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement(s) (IDS) were/was submitted on 3/26/2021. The information disclosure statement(s) have/has been considered by the examiner.
Response to Arguments
Applicant’s arguments, see Remarks/Arguments and amended claims, filed 6/29/2021, with respect to claims 1-20, have been fully considered.  Regarding the drawings objected to under 37 CFR 1.83(a), there are no replacement drawings to evaluate.  Therefore, the drawing objections are not withdrawn. Regarding the Claim Interpretations under 35 U.S.C. 112(f), the arguments are persuasive, and therefore the interpretations under 35 U.S.C. 112(f) are withdrawn. Regarding the rejection of claim 6 under 35 U.S.C. 112(b), for the term, "overfitting of the solution", the arguments are persuasive, and therefore the rejection of claim 6 under 35 U.S.C. 112(b) is withdrawn. Regarding the rejection of claims 1-20 under 35 U.S.C. § 102, and 35 U.S.C. § 103, the arguments have been fully considered and are persuasive. Therefore, the rejection of claims 1-20 under 35 U.S.C. § 102 and 103 has been withdrawn. However, upon further consideration, a new ground(s) of rejection is made in view of newly found prior art reference(s) ROSMAN et al., US 20200086863, and previously disclosed prior art reference(s) SHALEV-SHWARTZ and DEY. The grounds for rejection in view of amended claims are provided below.
Claim Objections
Claim 16 is objected to because of the following informalities: the claim is dependent upon itself. Examiner is interpreting the claim as dependent upon claim 15.  Appropriate correction is required.
Claims 1, 8, and 15 are objected to because of the following informalities: the claims recite the limitation “the plurality of states.” It appears there is a drafting error, and the limitation should read, “the plurality of solutions.”
Drawings
The drawings are objected to under 37 CFR 1.83(a).  The drawings must show every feature of the invention specified in the claims.  Therefore, the missing text to clearly label the plurality of numbered boxes, not representing well-known icons, shown in FIG. 2, FIG. 3, FIG. 5 and FIG. 6 must be shown or the feature(s) canceled from the claim(s).  No new matter should be entered.
Corrected drawing sheets in compliance with 37 CFR 1.121(d) are required in reply to the Office action to avoid abandonment of the application. Any amended replacement drawing sheet should include all of the figures appearing on the immediate prior version of the sheet, even if only one figure is being amended. The figure or figure number of an amended drawing should not be labeled as “amended.” If a drawing figure is to be canceled, the appropriate figure must be removed from the replacement sheet, and where necessary, the remaining figures must be renumbered and appropriate changes made to the brief description of the several views of the drawings for consistency. Additional replacement sheets may be necessary to show the renumbering of the remaining figures. Each drawing sheet submitted after the filing date of an application must be labeled in the top margin as either “Replacement Sheet” or “New Sheet” 
Claim Rejections - 35 USC § 112
Claims 1-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 1, 8, and 15 recite the limitation “the plurality of states.”  There is insufficient antecedent basis for this limitation in the claim. The Examiner interprets this limitation 
The term “actual future state” in claims 1, 8, and 15 is indefinite for failing to point out and distinctly claim the subject matter which the inventor or a joint inventor, or for pre-AIA the applicant, regards as the invention. It is not clear whether the “actual future state” is relative to any chosen future state or to a future state at any given time.
The claims have been interpreted as best understood by the Examiner, using applicant specification, paragraph 4, “measuring the actual state of the agent at the selected future time.”
The dependent claims 2-7, 9-14, and 16-20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as failing to resolve the deficiencies of the independent claims 1, 8, and 15.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the 
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 8-13, and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over ROSMAN et al., US 20200086863, herein further known as Rosman, in view of SHALEV-SHWARTZ et al., US 20180032082, herein further known as Shalev-Shwartz.
Regarding claims 1, 8, and 15, Rosman discloses a method, system, and vehicle operating an autonomous vehicle (paragraph 16, autonomous driving systems, advanced driver assistance systems, or other systems within the vehicle process sensor data from the sensors to perceive aspects of the surrounding environment), comprising: receiving, at a processor operating hypothesis resolver, a plurality of solutions predicting a future state of an agent (paragraph 5, a prediction system (i.e. hypothesis resolver) for modeling dynamic agents in a surrounding environment of an ego vehicle is disclosed, including one or more processors receiving sensor data which include present observations of a road agent of the dynamic agents (i.e. plurality of solutions) that are being tracked in the surrounding environment, wherein the one or more processors estimate a future state of the road agent using at least the present observations and the previous observations of the road agent to compute the future state according to a probabilistic model); receiving an environmental state at the hypothesis resolver (paragraphs 21-23, sensor data (therefore the prediction system 170 receives environmental state sensor data 250), see also at least FIG. 2); determining a reward for each of the plurality of states, the reward for a solution being based on the future state predicted by the solution and an actual future state, the reward indicating a confidence level of the solution for the environmental state (paragraphs 31-32, estimations of state for dynamic agents in the surrounding environment focus on the tracking module 230 and the model 260, tracking module 230 generally includes instructions that function to control the processor 110 to estimate a future state of a road agent, and tracking module 230 employs the model 260 to produce the estimation of the future state, and model 260 describes a predicted value of each action (a ∈ A) given state (s ∈ S) in terms of possible future rewards, a reward function R: S × A → R describes reward associated with taking action "a" when a current state is "s." Model 260 is implemented according to principles of reinforcement learning to take reward structure into account, and explain behaviors of dynamic agents, see also at least FIG. 2 and FIG. 3).
However, Rosman does not explicitly disclose selecting a solution from the plurality of solutions having a highest reward for the environmental state and navigating the autonomous vehicle based on the selected solution.
Shalev-Shwartz teaches a method, system, and vehicle selecting a solution from the plurality of solutions having a highest reward for the environmental state (paragraphs 180-181, system may be trained through exposure to various navigational states, having the system (wherein the desirable behavior equates to a match of the predicted future state of the vehicle to an actual state of the vehicle at the predicted future point in time) and based on the reward feedback, the system may "learn" the policy and becomes trained in producing desirable navigational actions, when deciding on what action to take, not only should the current reward be taken into account, but future rewards should also be considered, and the system should take a certain action, even though it is associated with a reward lower than another available option, when the system determines that in the future a greater reward may be realized (i.e. highest reward) if the lower reward option is taken now) and navigating the autonomous vehicle based on the selected solution (paragraphs 311-317, Robustness against adversarial environments may be useful in autonomous driving applications, and paragraph 317, the navigation system of the host vehicle (i.e. autonomous vehicle) (e.g., through operation of driving policy module 803 within processing unit 110 of the navigation system) may select an action in response to an observed state (i.e. environmental state)).
Therefore, from the teaching of Shalev-Shwartz it would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify Rosman, to include the highest reward for the environmental state and navigating the autonomous vehicle based on the selected solution (combining with the reward system of Rosman) in order to navigate alongside other vehicles, avoid obstacles and pedestrians, observe traffic signals and signs, travel from one road to another road at appropriate intersections or interchanges, and respond to any other situation that occurs or develops during the vehicle's operation.
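For illustration only, the selecting step recited in claims 1, 8, and 15 (choosing the solution having the highest reward for the environmental state) may be sketched as follows. This sketch is not taken from Rosman, Shalev-Shwartz, or the application; the function name select_solution, the table rewards_by_state, the solution labels, and the numeric rewards are all hypothetical.

```python
from typing import Dict


def select_solution(
    rewards_by_state: Dict[str, Dict[str, float]],
    environmental_state: str,
) -> str:
    """Return the name of the solution with the highest reward
    stored for the given environmental state (hypothetical table)."""
    rewards = rewards_by_state[environmental_state]
    return max(rewards, key=rewards.get)


# Hypothetical rewards: one row per environmental state,
# one entry per candidate solution (prediction hypothesis).
rewards_by_state = {
    "rain": {"kinematic": 0.4, "map_based": 0.7},
    "clear": {"kinematic": 0.9, "map_based": 0.6},
}

selected = select_solution(rewards_by_state, "rain")
```

Under these assumed values, a different solution is selected for "rain" than for "clear", which is the sense in which the reward is tied to the environmental state.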
Regarding claims 2, 9, and 16, the combination of Rosman and Shalev-Shwartz discloses all elements of claims 1, 8, and 15 above.
Rosman discloses further wherein the environmental state includes at least one of a weather condition, a traffic pattern (paragraph 2, to autonomously or at least semi-autonomously navigate through the surrounding environment, a machine estimates the state of the surrounding environment including dynamic objects (e.g., other vehicles, pedestrians, etc.) that are moving in relation to the machine and the surrounding environment represents a complex system of static and dynamic objects, and thus generating accurate estimations of the various situations (e.g., traffic situations) (i.e. traffic pattern) can be difficult), a traffic rule (paragraphs 56-57, tracking module 230 can use an observed agent to infer unseen properties of the surrounding environment such as the presence of other dynamic agents (e.g., pedestrians, other vehicles) or static aspects (e.g., traffic signs, etc.) (i.e. traffic rule) that influence behaviors of the dynamic agents) and a road condition.
Regarding claim 3, the combination of Rosman and Shalev-Shwartz discloses all elements of claim 1 above.
Rosman discloses further a method comprising training the hypothesis resolver during a training mode (paragraph 21, prediction system 170 that is implemented to perform methods and other functions as disclosed herein relating to training (i.e. training mode) and implementing predictive models associated with estimating dynamic behaviors (e.g., likely movements and actions) of road agents) to associate rewards with each of the solutions for a selected environmental state (paragraphs 31-32, the estimations of state for dynamic agents in the surrounding environment focus on the tracking module 230 and the model 260, tracking module 230 generally includes instructions that function to control the processor 110 to estimate a future state of a road agent, and tracking module 230 employs the model 260 to produce the estimation of the future state, and model 260 describes a predicted value of each action (a ∈ A) given state .
Regarding claim 4, the combination of Rosman and Shalev-Shwartz discloses all elements of claim 3 above.
However, Rosman does not explicitly disclose a method comprising training the hypothesis resolver during a training mode by predicting a state of the agent at a selected future time for a solution and the received environmental state, measuring the actual state of the agent at the selected future time, determining an error for the solution based on predicted state and the actual state, and assigning the reward to the solution based on the error.
Shalev-Shwartz teaches a method comprising training the hypothesis resolver during a training mode by predicting a state of the agent at a selected future time for a solution and the received environmental state, measuring the actual state of the agent at the selected future time, determining an error for the solution based on predicted state and the actual state (paragraph 212, in the forward loop of the network, s_{t+1} may be replaced by the actual value of s_{t+1} (i.e. measuring the actual state of the agent at the selected future time), therefore eliminating the problem of error accumulation. The role of prediction of s_{t+1} is to propagate messages from the future back to past actions (i.e. determining an error for the solution). In this sense, the algorithm may be a combination of "model-based" reinforcement learning with "policy-based learning"), and assigning the reward to the solution based on the error (paragraph 213, an important element that may be provided in some scenarios is a .
Therefore, from the teaching of Shalev-Shwartz it would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify Rosman, to include predicting a state of the agent at a selected future time for a solution and the received environmental state, measuring the actual state of the agent at the selected future time, determining an error for the solution based on predicted state and the actual state, and assigning the reward to the solution based on the error in order to navigate alongside other vehicles, avoid obstacles and pedestrians, observe traffic signals and signs, travel from one road to another road at appropriate intersections or interchanges, and respond to any other situation that occurs or develops during the vehicle's operation.
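For illustration only, the training steps recited in claim 4 (predict a state, measure the actual state, determine an error, assign a reward based on the error) may be sketched as follows. The sketch is not drawn from either reference or from the application. The names prediction_error and assign_rewards, the use of a summed absolute difference as the error, and the reward form 1 / (error + eps) are assumptions chosen only so that a smaller error yields a larger reward.

```python
from typing import Dict, Sequence


def prediction_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Assumed error measure: summed absolute difference per state component."""
    return sum(abs(p - a) for p, a in zip(predicted, actual))


def assign_rewards(
    predictions: Dict[str, Sequence[float]],
    actual_state: Sequence[float],
    eps: float = 1e-6,
) -> Dict[str, float]:
    """Assign each solution a reward that decreases as its error grows.

    predictions maps a (hypothetical) solution name to the state it
    predicted for the selected future time; actual_state is the state
    measured at that time. eps only guards against division by zero.
    """
    rewards = {}
    for name, predicted in predictions.items():
        error = prediction_error(predicted, actual_state)
        rewards[name] = 1.0 / (error + eps)
    return rewards


# Hypothetical 2-D positions predicted by two solutions, and the measured one.
predictions = {"solution_a": (1.0, 2.0), "solution_b": (1.0, 4.0)}
actual_state = (1.0, 2.5)
rewards = assign_rewards(predictions, actual_state)
```

In this example solution_a has the smaller error (0.5 versus 1.5) and therefore receives the larger reward.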
Regarding claim 5, the combination of Rosman and Shalev-Shwartz discloses all elements of claim 4 above.
However, Rosman does not explicitly disclose a method wherein the reward is inversely proportional to the error.
Shalev-Shwartz teaches a method wherein the reward is inversely proportional to the error (paragraph 157, look-ahead time may be inversely proportional to the gain (i.e. reward) of one or more control loops associated with causing a navigational response in vehicle 200, such as the heading error tracking control loop). 
Therefore, from the teaching of Shalev-Shwartz it would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify Rosman, to include the reward is inversely proportional to the error in order to navigate alongside other vehicles, avoid obstacles and pedestrians, observe traffic signals and signs, travel from one road to another road 
Regarding claim 6, the combination of Rosman and Shalev-Shwartz discloses all elements of claim 4 above.
However, Rosman does not explicitly disclose a method comprising reducing the reward of the solution having the highest reward for an environmental state to counteract a tendency to only select the solution having the highest reward.
Shalev-Shwartz teaches a method comprising reducing the reward of the solution having the highest reward for an environmental state to counteract a tendency to only select the solution having the highest reward (paragraph 180-181, the system should take a certain action, even though it is associated with a reward lower (i.e. reducing the reward) than another available option, when the system determines that in the future a greater reward may be realized if the lower reward option is taken now, (wherein the selection of the lower reward counteracts the tendency to only select the highest reward)).
Therefore, from the teaching of Shalev-Shwartz it would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify Rosman, to include reducing the reward of the solution having the highest reward for an environmental state to counteract a tendency to only select the solution having the highest reward in order to navigate alongside other vehicles, avoid obstacles and pedestrians, observe traffic signals and signs, travel from one road to another road at appropriate intersections or interchanges, and respond to any other situation that occurs or develops during the vehicle's operation.
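For illustration only, the limitation of claim 6 (reducing the reward of the solution having the highest reward so that it is not always the one selected) may be sketched as follows. The sketch is not taken from Shalev-Shwartz or the application; the name dampen_top_reward, the multiplicative factor, and the example values are hypothetical, and the sketch assumes positive rewards.

```python
from typing import Dict


def dampen_top_reward(rewards: Dict[str, float], factor: float = 0.9) -> Dict[str, float]:
    """Return a copy of rewards in which the current highest reward
    is scaled down by factor (assumed 0 < factor < 1, rewards > 0)."""
    top = max(rewards, key=rewards.get)
    adjusted = dict(rewards)
    adjusted[top] = rewards[top] * factor
    return adjusted


# Hypothetical rewards for one environmental state.
rewards = {"solution_a": 1.0, "solution_b": 0.95}
adjusted = dampen_top_reward(rewards)
new_top = max(adjusted, key=adjusted.get)
```

With these assumed values the previously second-ranked solution becomes the top-ranked one after the reduction, so it gets a chance to be selected.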
Regarding claims 10 and 17, the combination of Rosman and Shalev-Shwartz discloses all elements of claims 8 and 16 above, including training the hypothesis resolver during a training mode to associate rewards with each of the plurality of solutions for a selected environmental state (paragraphs 21-23, sensor data 250 includes scan data that embodies observations of a surrounding environment of the vehicle 100, vehicle 100 includes prediction system 170 (i.e. hypothesis resolver) which includes a memory 210 that stores an input module 220, a tracking module 230, and a training module 270, input module 220 controls the respective sensors of the sensor system 120 to provide the data inputs in the form of sensor data 250 (therefore the prediction system 170 receives environmental state sensor data 250), see also at least FIG. 2, and paragraphs 31-32, the estimations of state for dynamic agents in the surrounding environment focus on the tracking module 230 and the model 260, tracking module 230 generally includes instructions that function to control the processor 110 to estimate a future state of a road agent, and tracking module 230 employs the model 260 to produce the estimation of the future state, and model 260 describes a predicted value of each action (a ∈ A) given state (s ∈ S) in terms of possible future rewards, a reward function R: S × A → R describes a reward associated with taking action "a" when a current state is "s." Model 260 is implemented according to principles of reinforcement learning to take reward structure into account, and explain behaviors of dynamic agents, see also at least FIG. 2 and FIG. 3).
Rosman discloses further a system comprising a neural network 
Regarding claims 11 and 18, the combination of Rosman and Shalev-Shwartz discloses all elements of claims 10 and 17 above, including the neural network (paragraph 76, the simulated data is generated using the model 260 and/or another simulation system (e.g., generative neural network), and paragraph 103, one or more of the modules described herein can include artificial or computational intelligence elements, e.g., neural network, fuzzy logic or other machine learning algorithms).
However, Rosman does not explicitly disclose wherein the neural network trains the hypothesis resolver during the training mode by predicting a state of the agent at a selected future time for a solution and the received environmental state, measuring the actual state of the agent at the selected future time, determining an error for the solution based on predicted state and the actual state, and assigning the reward to the solution based on the error.
Shalev-Shwartz teaches a system and vehicle which trains the hypothesis resolver during the training mode by predicting a state of the agent at a selected future time for a solution and the received environmental state, measuring the actual state of the agent at the selected future time, determining an error for the solution based on predicted state and the actual state (paragraph 212, in the forward loop of the network, s_{t+1} may be replaced by the actual value of s_{t+1} (i.e. measuring the actual state of the agent at the selected future time), therefore eliminating the problem of error accumulation. The role of prediction of s_{t+1} is to propagate messages from the future back to past actions (i.e. determining an error for the solution). In this sense, the algorithm may be a combination of "model-based" reinforcement learning with "policy-based learning"), and assigning the reward to the solution based on the error (paragraph 213, an important element that may be provided in some .
Therefore, from the teaching of Shalev-Shwartz it would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify Rosman, to include training the hypothesis resolver during the training mode by predicting a state of the agent at a selected future time for a solution and the received environmental state, measuring the actual state of the agent at the selected future time, determining an error for the solution based on predicted state and the actual state, and assigning the reward to the solution based on the error in order to navigate alongside other vehicles, avoid obstacles and pedestrians, observe traffic signals and signs, travel from one road to another road at appropriate intersections or interchanges, and respond to any other situation that occurs or develops during the vehicle's operation.
Regarding claims 12 and 19, the combination of Rosman and Shalev-Shwartz discloses all elements of claims 11 and 18 above.
However, Rosman does not explicitly disclose a system and vehicle wherein the reward is inversely proportional to the error.
Shalev-Shwartz teaches a system and vehicle wherein the reward is inversely proportional to the error  (paragraph 157, look-ahead time may be inversely proportional to the gain (i.e. reward) of one or more control loops associated with causing a navigational response in vehicle 200, such as the heading error tracking control loop).
Therefore, from the teaching of Shalev-Shwartz it would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify Rosman, to include the reward is inversely proportional to the error in order to navigate alongside other vehicles, avoid obstacles and pedestrians, observe traffic signals and signs, travel from one road to another road 
Regarding claims 13 and 20, the combination of Rosman and Shalev-Shwartz discloses all elements of claims 11 and 18 above.
However, Rosman does not explicitly disclose a system and vehicle comprising reducing the reward of the solution having the highest reward for an environmental state to counteract a tendency to only select the solution having the highest reward.
Shalev-Shwartz teaches a system and vehicle comprising reducing the reward of the solution having the highest reward for an environmental state to counteract a tendency to only select the solution having the highest reward (paragraphs 180-181, the system should take a certain action, even though it is associated with a reward lower (i.e. reducing the reward) than another available option, when the system determines that in the future a greater reward may be realized if the lower reward option is taken now, (wherein the selection of the lower reward counteracts the tendency to only select the highest reward)).
Therefore, from the teaching of Shalev-Shwartz it would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify Rosman, to include reducing the reward of the solution having the highest reward for an environmental state to counteract a tendency to only select the solution having the highest reward in order to navigate alongside other vehicles, avoid obstacles and pedestrians, observe traffic signals and signs, travel from one road to another road at appropriate intersections or interchanges, and respond to any other situation that occurs or develops during the vehicle's operation.
Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Rosman and Shalev-Shwartz, further in view of DEY et al., US 20200151599, herein further known as Dey.
Regarding claims 7 and 14, the combination of Rosman and Shalev-Shwartz discloses all elements of claims 4 and 11 above.
However, Rosman does not explicitly disclose a method wherein the error is determined from a Euclidean distance between the predicted state and the actual state.
Dey teaches a method wherein the error is determined from a Euclidean distance between the predicted state and the actual state (paragraph 6, method comprises: capturing, by one or more hardware processors, a plurality of sequential actions depicting a pattern of time series via a two-stage modelling technique, wherein each of the plurality of sequential actions corresponds to the autonomous learning agent; deriving, based upon the plurality of sequential actions captured, one or more datasets comprising a plurality of predicted and actual actions of the autonomous learning agent by a Hierarchical Temporal Memory (HTM) modelling technique; extracting, using each of the plurality of predicted and actual actions, a set of prediction error values by a Euclidean Distance technique).
Therefore, from the teaching of Dey it would have been obvious to one of ordinary skill in the art at the time the invention was filed to modify Rosman to include the error is determined from a Euclidean distance between the predicted state and the actual state in order to learn sequences of actions and use any inherent patterns present therein to correctly predict future courses of actions.
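For illustration only, an error determined from a Euclidean distance between a predicted state and an actual state, as recited in claims 7 and 14, may be computed as sketched below. The function name euclidean_error and the example vectors are hypothetical and are not taken from Dey or from the application; states are assumed to be equal-length numeric vectors.

```python
import math
from typing import Sequence


def euclidean_error(predicted: Sequence[float], actual: Sequence[float]) -> float:
    """Euclidean (L2) distance between a predicted and an actual state vector."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual states must have the same length")
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)))


# Hypothetical 2-D positions: predicted at the origin, measured at (3, 4).
error = euclidean_error((0.0, 0.0), (3.0, 4.0))
```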
Conclusion
THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a). Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Terry Buse whose telephone number is (313)446-6647.  The examiner can normally be reached on Monday - Friday 7-5PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, John Olszewski can be reached on (571) 272-2706.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

/T.C.B./ Examiner, Art Unit 3669
/NICHOLAS K WILTEY/ Primary Examiner, Art Unit 3669