DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
The Examiner attempted to contact Applicant’s representative but the call was not returned within a reasonable time to schedule an interview under AFCP 2.0.
The applicant argues “Yester does not teach or suggest "calculating a lane change route in response to a map data, an episode, and the lane change control signal," "a vehicle controller executing a lane change in response to the lane change route" and "generating a score in response to a lane change execution and for updating the episode in response to the score and the lane change route" as recited by the currently amended claim 1.” The examiner respectfully disagrees; these limitations were rejected using Yester in combination with Graepel as detailed in the Final Action dated 1/12/2021, see pages 5-7, excerpted below:
a third processor for calculating a lane change route in response to a map data…and the lane change control signal (Yester: paragraphs 30 & 33)
an episode (Graepel Paragraph 68: “The system then trains the RL policy neural network on the training observations in the episode to adjust the values of the parameters using the long-term reward, e.g., by computing policy gradient updates and adjusting the values of the parameters using those policy gradient updates using a reinforcement learning technique”)
a vehicle controller executing a lane change in response to the episode (Graepel Paragraph 16: “agent is a control system for a mechanical agent interacting with the real-world environment. For example, the agent may be a control system integrated in an autonomous or semi-autonomous vehicle navigating through the environment.” Further Paragraph 15: “Generally, the agent interacts with the environment in order to complete one or more objectives and the reinforcement learning system selects actions in order to maximize the objectives, as represented by numeric rewards received by the reinforcement learning system in response to actions performed by the agent.” Further Paragraph 68: “The system then trains the RL policy neural network on the training observations in the episode to adjust the values of the parameters using the long-term reward, e.g., by computing policy gradient updates and adjusting the values of the parameters using those policy gradient updates using a reinforcement learning technique”)
and the third processor is configured for generating a score in response to a lane change execution and for updating the episode in response to the score and the lane change route. (Graepel Paragraph 71-74: “In particular, the system trains the value neural network to generate a value score for a given state of the environment that represents the predicted long-term reward resulting from the environment being in the state by adjusting the values of the parameters of the value neural network. The system generates training data for the value neural network from the interaction of the agent with the simulated version of the environment… The training data includes training observations and, for each training observation, the long-term reward that resulted from the training observation. For example, the system can select one or more observations randomly from each episode of interaction and then associate the observation with the reward for the episode to generate the training data.” Further Paragraph 68 discloses policy updates and adjustment of parameters in response to the episode interpreted to correspond to updating the episode as it is recited at a high level of generality in the instant specification. Paragraph 16 discloses the operation of this method in controlling an autonomous vehicle.)
These limitations are therefore obvious in view of the combination of Yester and Graepel. Further, the amendment changing “in response to the episode” to “in response to the lane change route” is not significant enough to overcome the prior art of record.
The applicant further argues that Graepel teaches a system that interacts with a real-world environment while the claimed invention does not. It is unclear to the examiner how a system can cause a vehicle to make a lane change and update an episode, among the remaining features and limitations, without interacting with a real-world environment.
The applicant further argues that “Graepel does not teach or suggest "calculating a lane change route in response to a map data, an episode, and the lane change control signal," "a vehicle controller executing a lane change in response to the lane change route" and "generating a score in response to a lane change execution and for updating the episode in response to the score and the lane change route" as recited by the currently amended claim 1.” As above, the examiner respectfully disagrees; see pages 5-7 of the previous Final Rejection.
The applicant further argues that “Takae does not teach or suggest "calculating a lane change route in response to a map data, an episode, and the lane change control signal," "a vehicle controller executing a lane change in response to the lane change route" and "generating a score in response to a lane change execution and for updating the episode in response to the score and the lane change route" as recited by the currently amended claim 1.” These limitations were rejected over Yester in view of Graepel, as detailed in the Final Action dated 1/12/2021. The examiner respectfully disagrees that Takae teaches no part of these limitations; however, the rejection under 103 was made using the prior art of record. See pages 5-7 of the Final Rejection.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ELIJAH W. VAUGHAN, whose telephone number is (571)272-5424. The examiner can normally be reached on M-Th 0730-1730.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Aniss Chad, can be reached at (571)270-3832. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/E.W.V./               Examiner, Art Unit 3662          

/ANISS CHAD/               Supervisory Patent Examiner, Art Unit 3662