Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
The Office Action is in response to the claims filed 01/19/2022. Claims 1-3 and 5-34 are presently pending and are presented for examination. 

Response to Arguments
Applicant's arguments, see pages 10-12, filed 01/19/2022, regarding the rejection of claims 1 and 17 under 35 U.S.C. § 103, have been fully considered but they are not persuasive. Applicant argues on page 11 that Shin does not teach “initializing the dynamic environment based on environment parameters, the environment parameters comprising a graph having nodes and edges connecting different pairs of the nodes, wherein the edges correspond to pathways within the dynamic environment and the nodes define interactions between the pathways defined by the edges, and wherein each of the edges is associated with a weight representing a length of a corresponding pathway of the pathways”. Applicant argues that Shin does not teach this element because the language of Shin recited in the claim states “[t]he node is the location of the robot or the artificial landmark at a specific timing” and “[t]he edge is a relationship between two nodes, and represents the spatial constraint between two nodes.” However, a relationship between two nodes representing a spatial constraint can, in fact, be a pathway between two nodes, and is taught as such in FIG. 3 of Shin, which illustrates a simultaneous localization and mapping (SLAM) method as a graph model, where each edge represents a relationship between two nodes including the travel distance of the robot between the nodes [paragraph 278]. Edges can include other relationships, such as an odometry edge calculated between consecutive nodes [paragraph 38], and they can include error, but the disclosure still reads on the claim language “a graph having nodes and edges connecting different pairs of the nodes, wherein the edges correspond to pathways within the dynamic environment and the nodes define interactions between the pathways defined by the edges,” as each node can either represent turning points or continuation points along a path within the dynamic 
Applicant's arguments, see pages 12-16, filed 01/19/2022, regarding the rejection of claims 2-3, 5-16, and 18-29 under 35 U.S.C. § 103 have been fully considered but they are not persuasive. Applicant argues that claims 2-3, 5-16, and 18-29, which depend from claims 1 and 17, are patentable if claims 1 and 17 are patentable due to their dependency. However, since claims 1 and 17 are still rejected as explained above, claims 2-3, 5-16, and 18-29 are also still rejected for the reasons explained below. Similarly, Applicant's arguments, see pages 16-17, regarding the rejection of claims 31-34, are also not persuasive for the same reasons listed above.

Claim Rejections - 35 USC § 103
Claims 1-3 and 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over Balutis et al. US 20160174459 A1 (“Balutis”) in view of Jiang et al. US 20190217476 A1 (“Jiang”), Smid et al. US 20080262669 A1 (“Smid”), Shin et al. US 20180149753 A1 (“Shin”), and Johnson et al. US 20180060396 A1 (“Johnson”).
	Regarding Claim 1. Balutis teaches a method for adaptive path planning with respect to a dynamic environment, the method comprising: 
	determining, by global guidance logic executable by a processor of an adaptive path planning system, a planned path through one or more pathways of the dynamic environment from a start location to a selected destination for an agent automated vehicle (AV) (a method of operating robotic lawn mowers that operate on different lawns, which reads on a dynamic environment. A sensor system, including GPS, determines the location of the robot [paragraph 40]. The robot includes a controller that is coupled to a memory storage element that stores traversal routes [paragraph 31]. The controller as shown in FIG. 1C can process and execute at least some of the training data to move about the lawn areas and traversal regions and mow target lawn areas [paragraph 47]. A combination of manual training by a user or automatic path planning by subroutines programmed on the robot determines the routes, and the memory storage element can store data pertaining to the routes. A path planning algorithm can be used to find the shortest path from the robot’s current location to the destination [paragraph 70]); and
	controlling, by local planning logic executable by the processor of the adaptive path planning system of the local planning logic, dynamic interaction within the dynamic environment as the agent AV traverses at least a portion of the planned path (the sensor system of the robot can include a location estimation system [paragraph 38], which can use a time-of-flight measurement between the robot and a boundary marker. For example, boundary markers may be placed along the boundary of the lawn. While the time-of-flight system uses markers that have been described as boundary markers, the marker can also be placed within or near the lawn to aid localization of the robot lawnmower. The localization can use triangulation to determine the robot position within the boundary. The signals sent between the boundary markers and the robot positioned on the property allow the robot to estimate the angles and the distance by calculating time of flight to each of the boundary markers, and using trigonometry to calculate the robot's current position [paragraph 38]. The sensor system is connected to a controller shown in FIG. 1C, which can process and execute at least some of the training data to move about the lawn areas and traversal regions and mow target lawn areas [paragraph 47]. The robot can use a path planning algorithm such as A*, rapidly exploring random trees (RRTs), or probabilistic roadmaps to find the shortest path from the robot’s current location to the destination, and can use these algorithms to drive between one area and the other (e.g., determine regions free of obstacles) [paragraph 70], which reads on controlling dynamic interaction with the dynamic environment by local planning logic of an adaptive path planning system).
	Balutis does not teach:
	the controlling dynamic interaction within the dynamic environment is based at least in part on utilizing localized deep reinforcement learning (DRL) of the local planning logic utilizing localized map sequence information.
	However, Jiang teaches:
	the controlling dynamic interaction within the dynamic environment comprises utilizing localized deep reinforcement learning (DRL) of the local planning logic utilizing localized map sequence information (a system for tracking an object and navigating an object tracking robot that involves calculating positions of the robot and the object in an environment at multiple times, and the use of a deep reinforcement learning (DRL) network that has been trained as a function of an object tracking quality reward and a robot navigation path quality reward to calculate positions of the robot and the object at the multiple times and determine an action specifying movement of the object tracking robot via the DRL network [paragraph 23]. The DRL network facilitates simultaneous localization and mapping (SLAM) and moving object tracking [paragraph 47]. SLAM allows the robot to build and maintain a 2D/3D map of a known or unknown environment and, at the same time, localize itself (determine its location) in the built environment map [paragraph 5], which reads on utilizing localized DRL with respect to localized maps. The system updates the DRL network at a frequency of every K time stamps [paragraph 92]. The system stores transition sets into a replay memory R. The transition sets of paragraph 92 are transitions of coordinates, including Xt, at, rt, and Xt+1, where t is a time stamp, t+1 is the next time stamp in the sequence of transition sets, and X is an x coordinate on a map [paragraph 87]. The robotic device samples a number of transition sets from a replay memory to compute a target y coordinate [paragraph 93], which means the robot utilizes localized map sequence information as part of the deep reinforcement learning utilized within controlling dynamic interaction within the dynamic environment).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with the controlling dynamic interaction within the dynamic environment comprises utilizing localized deep reinforcement learning (DRL) of the local planning logic utilizing localized map sequence information as taught by Jiang so as to allow the robotic system to adapt its path planning according to changes within the dynamic environment.
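	For illustration only, the replay memory arrangement described for Jiang above (transition sets Xt, at, rt, Xt+1 stored in a replay memory R, sampled to update the network every K time stamps) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Jiang or from the application: the class name ReplayMemory, the tuple layout, the fixed capacity, and the helper should_update are all hypothetical.

```python
import random
from collections import deque
from typing import Deque, List, Tuple

# Hypothetical transition layout: (X_t, a_t, r_t, X_t+1).
# The state is assumed to be a tuple of map coordinates.
State = Tuple[float, ...]
Transition = Tuple[State, int, float, State]


class ReplayMemory:
    """Fixed-capacity store of transition sets (an assumed design)."""

    def __init__(self, capacity: int) -> None:
        # A deque with maxlen silently drops the oldest transition when full.
        self._buffer: Deque[Transition] = deque(maxlen=capacity)

    def store(self, transition: Transition) -> None:
        self._buffer.append(transition)

    def sample(self, batch_size: int, rng: random.Random) -> List[Transition]:
        # Sample without replacement, never more than what is stored.
        k = min(batch_size, len(self._buffer))
        return rng.sample(list(self._buffer), k)

    def __len__(self) -> int:
        return len(self._buffer)


def should_update(t: int, k: int) -> bool:
    """Return True at every K-th time stamp (t = K, 2K, ...)."""
    return t > 0 and t % k == 0
```

	In this sketch, a training loop would call store once per time stamp and call sample only when should_update returns True.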
	Balutis also does not teach:
	wherein the determining the planned path comprises: 
	determining an initial global path connecting the start location with the selected destination; and 
	revising the initial global path based on history information to produce a planned global guidance path for traversal by the agent AV.
	However, Smid teaches:
	wherein the determining the planned path comprises: 
	determining an initial global path connecting the start location with the selected destination; and 
	revising the initial global path based on history information to produce a planned global guidance path for traversal by the agent AV (an autonomous vehicle controller with software for path planning, including adaptive systems that allow the AVC to self-tune based on external parameters and conditions as gathered by sensors of a sensor array [paragraph 27]. The systems self-tune in real time. The environmental sensor array is included in the controller to collect data regarding the vehicle speed, compass heading, absolute position (e.g., from GPS), or relative position [paragraph 18]. The environmental sensor array may also include a central processing unit and a memory/storage [paragraph 9], which means that the sensor array stores the information it collects in the memory (forming a history) and uses this information to revise the global path and provide a planned path for traversal of the autonomous vehicle. The AVC includes programming of rules and programming for changing those rules [paragraph 28]. Programming of rules will typically include programming for following instructions provided by a user according to a protocol. Then, upon sensing of an external change of conditions, the AVC will typically include programming to change the protocol. Then, such changed protocol can be stored and used in the future such that the original rules or protocol has been changed. Original rules may map a path of waypoints to be directly followed by a vehicle having the AVC and, upon sensing of an obstacle in the direct path between waypoints, the original rule of following a direct path can be modified to allow the vehicle to follow a path around the obstacle).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with wherein the determining the planned path comprises: determining an initial global path connecting the start location with the selected destination; and revising the initial global path based on history information to produce a planned global guidance path for traversal by the agent AV as taught by Smid so as to allow the robot to adjust its path based on prior experiences, such as obstacle collision or issues with the travel region.
	Balutis also does not teach:
	initializing the dynamic environment based on environment parameters, the environment parameters comprising a graph having nodes and edges connecting different pairs of the nodes, wherein the edges correspond to pathways within the dynamic environment and the nodes define interactions between the pathways defined by the edges; and
	path planning is subsequent to initializing the dynamic environment.
	However, Shin teaches:
	initializing the dynamic environment based on environment parameters, the environment parameters comprising a graph having nodes and edges connecting different pairs of the nodes, wherein the edges correspond to pathways within the dynamic environment and the nodes define interactions between the pathways defined by the edges; and
	path planning is subsequent to initializing the dynamic environment (a robot with a global map estimating unit that estimates a global map and combines local maps to extend and update the global map [paragraph 289]. Some embodiments provide a map covering a wide region within a predetermined time under predetermined conditions (parameters) by generating a key frame to a node using scan information, calculating an odometry edge between continuous nodes and updating the key frames to estimate a local map, detecting a loop closure edge between non-continuous nodes relating to a set of updated key frames, and correcting the positions of the nodes based on the odometry edge and the loop closure edge to estimate a global map).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with initializing the dynamic environment based on environment parameters, the environment parameters comprising a graph having nodes and edges connecting different pairs of the nodes, wherein the edges correspond to pathways within the dynamic environment and the nodes define interactions between the pathways defined by the edges; and path planning is subsequent to initializing the dynamic environment as taught by Shin so as to allow the system to map out the travel paths of each robot and ensure that they comply with input parameters.
	Balutis also does not teach:
	wherein each of the edges is associated with a weight representing a length of a corresponding pathway of the pathways.
	However, Johnson teaches:
	wherein each of the edges is associated with a weight representing a length of a corresponding pathway of the pathways (a method of mapping pathways in a database, wherein a pathway may have a length of L, which may be equal to the number of edges in it [paragraph 57]. Johnson is intended to provide graphs with a means for weighting pathways among the edges of a network, and so it would be obvious to combine it with a dynamic environment graph consisting of nodes and edges so as to apply a weight value to the edges for finding the best route).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with wherein each of the edges is associated with a weight representing a length of a corresponding pathway of the pathways as taught by Johnson so as to allow the dynamic graph to determine the lengths and weight values of pathways to find the optimal route.
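	For illustration only, the claimed graph (nodes, edges corresponding to pathways, and an edge weight representing pathway length) and a shortest-path search over it can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from any cited reference: Dijkstra's algorithm is used here as one example of a static searching algorithm, and the adjacency-dictionary layout and the function name shortest_path are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

# Assumed layout: graph[node][neighbor] = length of the connecting pathway.
Graph = Dict[str, Dict[str, float]]


def shortest_path(graph: Graph, start: str, goal: str) -> Tuple[float, List[str]]:
    """Dijkstra's algorithm over non-negative pathway lengths.

    Returns (total length, node sequence), or (inf, []) if the goal
    cannot be reached from the start.
    """
    dist: Dict[str, float] = {start: 0.0}
    prev: Dict[str, str] = {}
    heap: List[Tuple[float, str]] = [(0.0, start)]
    visited = set()

    while heap:
        d, node = heapq.heappop(heap)
        if node in visited:
            continue  # stale heap entry
        visited.add(node)
        if node == goal:
            break
        for neighbor, length in graph.get(node, {}).items():
            candidate = d + length
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(heap, (candidate, neighbor))

    if goal not in dist:
        return float("inf"), []

    # Walk the predecessor links back from the goal to rebuild the path.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return dist[goal], path
```

	With edge weights standing in for pathway lengths, the returned total is the length of the planned route rather than a count of edges.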
	Regarding Claim 2. Balutis in combination with Jiang, Smid, Shin, and Johnson teaches the method of claim 1.
	Balutis also teaches:
	wherein the initial global path is determined using a static searching algorithm (the path planning algorithm can be used to find the shortest path from the robot’s current location to the destination, and these adaptive path planning algorithms can include static searching algorithms such as A*, rapidly exploring random trees, or probabilistic roadmaps [paragraph 70]).
	Regarding Claim 3. Balutis in combination with Jiang, Smid, Shin, and Johnson teaches the method of claim 2.
	Balutis also teaches:
	wherein the static searching algorithm is selected from the group of searching algorithms consisting of Dijkstra, A*, D*, rapidly-exploring random tree (RRT), particle swarm optimization (PSO), and ant-colony (the path planning algorithm can be used to find the shortest path from the robot’s current location to the destination, and these adaptive path planning algorithms can include A*, rapidly exploring random trees, or probabilistic roadmaps [paragraph 70]).
	Regarding Claim 17. Balutis teaches an adaptive path planning system configured for providing adaptive path planning with respect to a dynamic environment, the adaptive path planning system comprising:
	global guidance logic executable by a processor to determine a planned path through the dynamic environment from a start location to a selected destination for an agent automated vehicle (AV) (a method of operating robotic lawn mowers that operate on different lawns, which reads on a dynamic environment. A sensor system, including GPS, determines the location of the robot [paragraph 40]. The robot includes a controller that is coupled to a memory storage element that stores traversal routes [paragraph 31]. The controller as shown in FIG. 1C can process and execute at least some of the training data to move about the lawn areas and traversal regions and mow target lawn areas [paragraph 47]. A combination of manual training by a user or automatic path planning by subroutines programmed on the robot determines the routes, and the memory storage element can store data pertaining to the routes. A path planning algorithm can be used to find the shortest path from the robot’s current location to the destination [paragraph 70]), and
	local planning logic in communication with the global guidance logic and executable by the processor to control dynamic interaction within the dynamic environment, as the agent AV traverses at least a portion of the planned path (the sensor system of the robot can include a location estimation system [paragraph 38], which can use a time-of-flight measurement between the robot and a boundary marker. For example, boundary markers may be placed along the boundary of the lawn. While the time-of-flight system uses markers that have been described as boundary markers, the marker can also be placed within or near the lawn to aid localization of the robot lawnmower. The localization can use triangulation to determine the robot position within the boundary. The signals sent between the boundary markers and the robot positioned on the property allow the robot to estimate the angles and the distance by calculating time of flight to each of the boundary markers, and using trigonometry to calculate the robot's current position [paragraph 38]. The sensor system is connected to a controller shown in FIG. 1C, which can process and execute at least some of the training data to move about the lawn areas and traversal regions and mow target lawn areas [paragraph 47]. The robot can use a path planning algorithm such as A*, rapidly exploring random trees (RRTs), or probabilistic roadmaps to find the shortest path from the robot’s current location to the destination, and can use these algorithms to drive between one area and the other (e.g., determine regions free of obstacles) [paragraph 70], which reads on controlling dynamic interaction with the dynamic environment by local planning logic of an adaptive path planning system).
	Balutis does not teach:
	the controlling dynamic interaction within the dynamic environment is based at least in part on a deep reinforcement learning agent of the local planning logic utilizing localized map sequence information.
	However, Jiang teaches:
	the controlling dynamic interaction within the dynamic environment is based at least in part on a deep reinforcement learning agent of the local planning logic utilizing localized map sequence information (a system for tracking an object and navigating an object tracking robot that involves calculating positions of the robot and the object in an environment at multiple times, and the use of a deep reinforcement learning (DRL) network that has been trained as a function of an object tracking quality reward and a robot navigation path quality reward to calculate positions of the robot and the object at the multiple times and determine an action specifying movement of the object tracking robot via the DRL network [paragraph 23]. The DRL network facilitates simultaneous localization and mapping (SLAM) and moving object tracking [paragraph 47]. SLAM allows the robot to build and maintain a 2D/3D map of a known or unknown environment and, at the same time, localize itself (determine its location) in the built environment map [paragraph 5], which reads on utilizing localized DRL with respect to localized maps. The system updates the DRL network at a frequency of every K time stamps [paragraph 92]. The system stores transition sets into a replay memory R. The transition sets of paragraph 92 are transitions of coordinates, including Xt, at, rt, and Xt+1, where t is a time stamp, t+1 is the next time stamp in the sequence of transition sets, and X is an x coordinate on a map [paragraph 87]. The robotic device samples a number of transition sets from a replay memory to compute a target y coordinate [paragraph 93], which means the robot utilizes localized map sequence information as part of the deep reinforcement learning utilized within controlling dynamic interaction within the dynamic environment).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with the controlling dynamic interaction within the dynamic environment is based at least in part on a deep reinforcement learning agent of the local planning logic utilizing localized map sequence information as taught by Jiang so as to allow the robotic system to adapt its path planning according to changes within the dynamic environment.
	Balutis also does not teach:
	wherein the determining the planned path comprises: determining an initial global path connecting the start location with the selected destination; and
	revising the initial global path based on history information to produce a planned global guidance path for traversal by the agent AV.
	However, Smid teaches:
	wherein the determining the planned path comprises: determining an initial global path connecting the start location with the selected destination via one or more of the pathways; and
	revising the initial global path based on history information to produce a planned global guidance path for traversal by the agent AV (an autonomous vehicle controller with software for path planning, including adaptive systems that allow the AVC to self-tune based on external parameters and conditions as gathered by sensors of a sensor array [paragraph 27]. The systems self-tune in real time. The environmental sensor array is included in the controller to collect data regarding the vehicle speed, compass heading, absolute position (e.g., from GPS), or relative position [paragraph 18]. The environmental sensor array may also include a central processing unit and a memory/storage [paragraph 9], which means that the sensor array stores the information it collects in the memory (forming a history) and uses this information to revise the global path and provide a planned path for traversal of the autonomous vehicle. The AVC includes programming of rules and programming for changing those rules [paragraph 28]. Programming of rules will typically include programming for following instructions provided by a user according to a protocol. Then, upon sensing of an external change of conditions, the AVC will typically include programming to change the protocol. Then, such changed protocol can be stored and used in the future such that the original rules or protocol has been changed. Original rules may map a path of waypoints to be directly followed by a vehicle having the AVC and, upon sensing of an obstacle in the direct path between waypoints, the original rule of following a direct path can be modified to allow the vehicle to follow a path around the obstacle).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with wherein the determining the planned path comprises: determining an initial global path connecting the start location with the selected destination; and revising the initial global path based on history information to produce a planned global guidance path for traversal by the agent AV as taught by Smid so as to allow the robot to adjust its path based on prior experiences, such as obstacle collision or issues with the travel region.
	Balutis also does not teach:
	initializing the dynamic environment based on environment parameters, the environment parameters comprising a graph having nodes and edges connecting different pairs of the nodes, wherein the edges correspond to pathways within the dynamic environment and the nodes define interactions between the pathways defined by the edges; and
	path planning is subsequent to initializing the dynamic environment.
	However, Shin teaches:
	initializing the dynamic environment based on environment parameters, the environment parameters comprising a graph having nodes and edges connecting different pairs of the nodes, wherein the edges correspond to pathways within the dynamic environment and the nodes define interactions between the pathways defined by the edges; and
	path planning is subsequent to initializing the dynamic environment (a robot with a global map estimating unit that estimates a global map and combines local maps to extend and update the global map [paragraph 289]. Some embodiments provide a map covering a wide region within a predetermined time under predetermined conditions (parameters) by generating a key frame to a node using scan information, calculating an odometry edge between continuous nodes and updating the key frames to estimate a local map, detecting a loop closure edge between non-continuous nodes relating to a set of updated key frames, and correcting the positions of the nodes based on the odometry edge and the loop closure edge to estimate a global map).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with initializing the dynamic environment based on environment parameters, the environment parameters comprising a graph having nodes and edges connecting different pairs of the nodes, wherein the edges correspond to pathways within the dynamic environment and the nodes define interactions between the pathways defined by the edges; and path planning is subsequent to initializing the dynamic environment as taught by Shin so as to allow the system to map out the travel paths of each robot and ensure that they comply with input parameters.
	Balutis also does not teach:
	wherein each of the edges is associated with a weight representing a length of a corresponding pathway of the pathways.
	However, Johnson teaches:
	wherein each of the edges is associated with a weight representing a length of a corresponding pathway of the pathways (a method of mapping pathways in a database, wherein a pathway may have a length of L, which may be equal to the number of edges in it [paragraph 57]. Johnson is intended to provide graphs with a means for weighting pathways among the edges of a network, and so it would be obvious to combine it with a dynamic environment graph consisting of nodes and edges so as to apply a weight value to the edges for finding the best route).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with wherein each of the edges is associated with a weight representing a length of a corresponding pathway of the pathways as taught by Johnson so as to allow the dynamic graph to determine the lengths and weight values of pathways to find the optimal route.
	Regarding Claim 18. Balutis in combination with Jiang, Smid, Shin, and Johnson teaches the adaptive path planning system of claim 17.
	Balutis also teaches:
	wherein the global guidance logic and the local planning logic are executed by a control system implemented internally to the agent AV (the sensor system of the robot can include a location estimation system for local planning [paragraph 38] and a global positioning system for global guidance [paragraph 40]. These systems are controlled by a controller [paragraph 36], and this controller reads on a control system implemented internally to the agent AV).

Claims 5-11 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Balutis et al. US 20160174459 A1 (“Balutis”), Jiang et al. US 20190217476 A1 (“Jiang”), Smid et al. US 20080262669 A1 (“Smid”), Shin et al. US 20180149753 A1 (“Shin”), and Johnson et al. US 20180060396 A1 (“Johnson”) as applied to claim 1 above, and further in view of Erignac US 7606659 B2 (“Erignac”).
	Regarding Claim 5. Balutis in combination with Jiang, Smid, Shin, and Johnson teaches the method of claim 1.
	Balutis also teaches:
	wherein the history information comprises path overlay information (a memory storage element that stores data corresponding to points or segments along the lawn routes, traversal routes, and bypass routes [paragraph 31], which reads on path overlay information).
	Balutis does not teach:
	wherein the history information comprises pheromone information, wherein the pheromone information provides for the respective path overlay information decaying over time, and wherein the respective path overlay information decays over time according to a decay rate established based on a level of activity or movement within the dynamic environment.
	However, Erignac teaches:
	wherein the history information comprises pheromone information, wherein the pheromone information provides for the respective path overlay information decaying over time, and wherein the respective path overlay information decays over time according to a decay rate established based on a level of activity or movement within the dynamic environment (an exhaustive swarming search strategy using distributed pheromone maps. Unmanned aerial vehicles (UAVs) can operate in swarms and use pheromone maps as a common coordination mechanism between agents that emulate insect foraging behavior [Column 1, lines 25-33]. A digital pheromone map overlays a digital grid onto a geographic area. Aspects of the search system may mark searched cells of the digital pheromone map with an identifier of the search agent that searched the cell and the time that the cell was searched. This information can be used to detect areas that are stale and need to be searched again (e.g., for time-dependent searches), or can be used to detect areas that may need to be searched again when a particular agent is determined to be unreliable (e.g., such as when a faulty sensor is detected) [Column 4, lines 35-42]. The fact that this can be used for time-dependent searches indicates that a set decay rate can be established either as a built-in parameter, or as a result of a level of activity or movement within the dynamic environment).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with wherein the history information comprises pheromone information, wherein the pheromone information provides for the respective path overlay information decaying over time, and wherein the respective path overlay information decays over time according to a decay rate established based on a level of activity or movement within the dynamic environment as taught by Erignac so that a team of robots working together could detect the paths of cooperating robots and either avoid the other robots to prevent collisions, or keep track of where other robots have been so as to not cover the same region multiple times, or re-check areas where the pheromone has gone stale.
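	For illustration only, the claimed relationship between path overlay information, decay over time, and a decay rate tied to the level of activity can be sketched as follows. This is a minimal sketch under stated assumptions: neither Erignac nor the application is stated to use this formula. The exponential decay model, the linear scaling of the rate with activity, and the names decay_rate, decayed_pheromone, and is_stale are all hypothetical.

```python
import math


def decay_rate(base_rate: float, activity_level: float) -> float:
    """Assumed rule: busier environments age overlay information faster.

    activity_level is a hypothetical non-negative measure of movement
    within the environment (0.0 means no observed activity).
    """
    return base_rate * (1.0 + activity_level)


def decayed_pheromone(initial: float, elapsed: float, rate: float) -> float:
    """Exponential decay of a pheromone value after `elapsed` time units."""
    return initial * math.exp(-rate * elapsed)


def is_stale(value: float, threshold: float) -> bool:
    """A cell or path segment is treated as stale once below the threshold."""
    return value < threshold
```

	Under these assumptions, overlay information recorded in a busy region falls below the staleness threshold sooner than the same information recorded in a quiet region.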
	Regarding Claim 6. Balutis in combination with Jiang, Smid, Shin, Johnson, and Erignac teaches the method of claim 5.
	Balutis also teaches:
	wherein the path overlay information comprises information regarding routes of moving obstacles, temporal information regarding movement, information regarding movement velocity, priority information with respect to moving obstacle movement, obstacle volume, pathway width, or combinations thereof (a controller that includes obstacle detection and avoidance methods and behaviors implemented in response to sensor signals from an obstacle sensing system [paragraph 41]. Specifically, the controller operates the navigation system configured to maneuver the robot in a path or route stored in the memory storage element across the lawn areas and/or traversal regions. The navigation system is a behavior-based system executed on the controller. The navigation system communicates with the sensor system to determine and issue drive commands to the drive system. In particular, the controller includes obstacle detection and avoidance methods and behaviors implemented in response to sensor signals from the obstacle sensing system. The robot can use its proximity sensors to detect the general geometry of an obstacle in the general vicinity in front of the robot so that the robot can determine what direction to turn. The controller can determine when the robot is about to collide with an obstacle and communicate instructions to the navigation system and the drive system to avoid the obstacle. This reads on temporal information regarding movement. The path behavior data stored in the memory [paragraph 42] can set the drive system to a predetermined drive speed (speed and direction) as it makes difficult turns along a specific route, which reads on information regarding movement velocity).
	Regarding Claim 7. Balutis in combination with Jiang, Smid, Shin, Johnson, and Erignac teaches the method of claim 5.
	Balutis does not teach:
	wherein the pheromone information comprises information regarding observed recent obstacle movement information.
	However, Erignac teaches: 
	wherein the pheromone information comprises information regarding observed recent obstacle movement information (an avoidance behavior which is selected when the robots (search agents) [Column 3, lines 41-67]. Avoidance behavior can distribute agents in the swarm, thus increasing coverage of the network and reducing crowding of agents. In a particular embodiment, each pair of adjacent agents can select a course perpendicular to their mutual collision course. The agents are repelled by avoiding another agent against the boundary of the digital pheromone map, and FIG. 7C shows how two agents move close to each other but avoid collision as they enter avoidance range of each other on the pheromone map [FIG. 7C, Column 8, lines 49-65]. In effect, the robots treat each other like potential moving obstacles, and the pheromone information includes information regarding observed recent obstacle movement information).
	Regarding Claim 8. Balutis in combination with Jiang, Smid, Shin, Johnson, and Erignac teaches the method of claim 5.
	Balutis does not teach:
	wherein the pheromone information corresponds to respective instances of the path overlay information.
	However, Erignac teaches:
	wherein the pheromone information corresponds to respective instances of the path overlay information (FIGS. 7A-7C show how the pheromone map keeps track of where the search agents have been, and marks the searched cells with an “S” in the drawings. This reads on respective instances of path overlay information [FIGS. 7A-7C]).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with wherein the pheromone information corresponds to respective instances of the path overlay information as taught by Erignac so that a team of robots working together could keep track of each robot’s travel paths, and identify which areas in a region have already been worked on by a robot such as a lawn mower.
	Regarding Claim 9. Balutis in combination with Jiang, Smid, Shin, Johnson, and Erignac teaches the method of claim 8.
	Balutis does not teach:
	wherein the pheromone information provides for the respective path overlay information decaying over time.
	However, Erignac teaches:
	wherein the pheromone information provides for the respective path overlay information decaying over time (aspects of the search system may mark searched cells of the digital pheromone map with an identifier of the search agent that searched the cell and the time that the cell was searched. This information can be used to detect areas that are stale and need to be searched again (e.g., for time-dependent searches) [Column 4, lines 35-42], which reads on the overlay information decaying over time).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with wherein the pheromone information provides for the respective path overlay information decaying over time as taught by Erignac so that a team of robots working together could keep track of each robot’s travel paths, and identify areas that need to be reworked after a period of time.
	Regarding Claim 10. Balutis in combination with Jiang, Smid, Shin, Johnson, and Erignac teaches the method of claim 9.
	Balutis also teaches:
	further comprising: 
	generating a localized map sequence comprising a plurality of localized maps corresponding to positions of the agent AV traversing the planned path, wherein each localized map of the localized map sequence comprises a sub-portion of the dynamic environment centered about a respective position of the agent AV (the sensor system of the robot can include a location estimation system [paragraph 38], which can include a time-of-flight based system that uses a time-of-flight between a boundary marker and the robot. For example, boundary markers may be placed along the boundary of the lawn. While the time-of-flight system uses markers that have been described as boundary markers, the marker can also be placed within or near the lawn to aid localization of the robot lawnmower. The localization can use triangulation to determine the robot position within the boundary. The signals sent between the boundary markers and the robot positioned on the property allow the robot to estimate the angles and the distance by calculating time of flight to each of the boundary markers, and using trigonometry to calculate the robot's current position [paragraph 38]).
	Regarding Claim 11. Balutis in combination with Jiang, Smid, Shin, Johnson, and Erignac teaches the method of claim 10.
	Balutis does not teach:
	wherein the controlling dynamic interaction within the dynamic environment comprises: utilizing localized deep reinforcement learning (DRL) with respect to localized maps of the localized map sequence to determine actions for the agent AV within the dynamic environment.
	However, Jiang teaches: 
	wherein the controlling dynamic interaction within the dynamic environment comprises: utilizing localized deep reinforcement learning (DRL) with respect to localized maps of the localized map sequence to determine actions for the agent AV within the dynamic environment (a system for tracking an object and navigating an object tracking robot that involves calculating positions of the robot and the object in an environment at multiple times, and the use of a deep reinforcement learning (DRL) network that has been trained as a function of an object tracking quality reward and a robot navigation path quality reward to calculate positions of the robot and the object at the multiple times and determine an action specifying movement of the object tracking robot via the DRL network [paragraph 23]. The DRL network facilitates simultaneous localization and mapping (SLAM) and moving object tracking [paragraph 47]. SLAM allows the robot to build and maintain a 2D/3D map of a known or unknown environment and at the same time, localizes (determines its location) itself in the built environment map [paragraph 5], which reads on utilizing localized DRL with respect to localized maps).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with wherein the controlling dynamic interaction within the dynamic environment comprises: utilizing localized deep reinforcement learning (DRL) with respect to localized maps of the localized map sequence to determine actions for the agent AV within the dynamic environment as taught by Jiang so that the robot can better adapt its behavior, similar to how Balutis uses adaptive planning algorithms. 
	Regarding Claim 13. Balutis in combination with Jiang, Smid, Shin, Johnson, and Erignac teaches the method of claim 11.
	Balutis does not teach:
	wherein the localized DRL comprises a DRL agent configured as a localized planner for directing interactions of the AV in the dynamic environment.
	However, Jiang teaches:
	wherein the localized DRL comprises a DRL agent configured as a localized planner for directing interactions of the AV in the dynamic environment (the DRL network facilitates simultaneous localization and mapping (SLAM) and moving object training, allowing the robot to train for interactions with moving objects in a simulated or virtual environment [paragraph 47]).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with wherein the localized DRL comprises a DRL agent configured as a localized planner for directing interactions of the AV in the dynamic environment as taught by Jiang so that the robot can better adapt its behavior to the dynamic environment. 

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Balutis et al. US 20160174459 A1 (“Balutis”), Jiang et al. US 20190217476 A1 (“Jiang”), Smid et al. US 20080262669 A1 (“Smid”), Shin et al. US 20180149753 A1 (“Shin”), Johnson et al. US 20180060396 A1 (“Johnson”), and Erignac US 7606659 B2 (“Erignac”), as applied to claim 11 above, and further in view of Cella et al. US 20190121350 A1 (“Cella”).
	Regarding Claim 12. Balutis in combination with Jiang, Smid, Shin, Johnson, and Erignac teaches the method of claim 11.
	Balutis does not teach:
	wherein the localized DRL comprises a convolutional neural network (CNN) and recurrent neural network (RNN) configured to provide a modeled representation of the dynamic environment from the localized map sequence.
	However, Jiang teaches:
	wherein the localized DRL comprises a convolutional neural network (CNN) configured to provide a modeled representation of the dynamic environment from the localized map sequence (the system for neural network training uses an artificial neural network (ANN) [paragraph 52], and the ANN may be implemented as a module and used in conjunction with the combined reward functions [paragraph 58]. Example modules specifically include a convolutional neural network (CNN), and other neural networks can also be used. According to one aspect of the disclosure, the DRL network can track an object and navigate a robot in an environment, which includes receiving tracking sensor input representing the object and the environment at multiple times [paragraph 17]. A compute reward module, which is part of the DRL, computes rewards by modeling navigation paths to find the most effective path [paragraphs 78-79], so the neural network is configured to provide a modeled representation of the dynamic environment).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with wherein the localized DRL comprises a convolutional neural network (CNN) configured to provide a modeled representation of the dynamic environment from the localized map sequence as taught by Jiang so that the robot can better adapt its behavior, similar to how Balutis uses adaptive planning algorithms. 
	Balutis in combination with Jiang does not teach: 
	the localized DRL comprises a recurrent neural network.
	However, Cella teaches:
	the localized DRL comprises a recurrent neural network (a system and method for learning data outcomes that may use a neural set, and a neural set should be understood to encompass a wide range of different types of neural networks, which can include a variety of recurrent neural networks (RNN) [paragraph 442], and in some embodiments, a convolutional neural network (CNN) [paragraph 467]).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis in combination with Jiang with the localized DRL comprises a recurrent neural network as taught by Cella because Jiang already teaches that other types of neural networks can be used in their invention, and applying a recurrent neural network would allow the robot to better adapt its behavior.

Claim 14 is rejected under 35 U.S.C. 103 as being unpatentable over Balutis et al. US 20160174459 A1 (“Balutis”), Jiang et al. US 20190217476 A1 (“Jiang”), Smid et al. US 20080262669 A1 (“Smid”), Shin et al. US 20180149753 A1 (“Shin”), Johnson et al. US 20180060396 A1 (“Johnson”), and Erignac US 7606659 B2 (“Erignac”), as applied to claim 13 above, and further in view of O’Malia et al. US 20190108448 A1 (“O’Malia”) and Luciw US 20190061147 A1 (“Luciw”).
	Regarding Claim 14. Balutis in combination with Jiang, Smid, Shin, Johnson, and Erignac teaches the method of claim 13.
	Balutis does not teach:
	wherein the DRL agent comprises double deep Q learning (DDQN), Rainbow, Proximal Policy Optimization (PPO), Asynchronous Advantage Actor Critic (A3C), or a combination thereof.
	However, O’Malia teaches:
	wherein the DRL agent comprises Rainbow, Proximal Policy Optimization (PPO), Asynchronous Advantage Actor Critic (A3C), or a combination thereof (an artificial intelligence framework that can use reinforcement learning algorithms such as PPO (proximal policy optimization), Rainbow, various Deep Q-Learning (DQN) variants [paragraph 16], and A3C [paragraph 68]. O’Malia does not specifically refer to Double deep Q learning).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with wherein the DRL agent comprises Rainbow, Proximal Policy Optimization (PPO), Asynchronous Advantage Actor Critic (A3C), or a combination thereof as taught by O’Malia so as to allow the system to employ a more advanced form of deep reinforcement learning, such as A3C.
	Balutis in combination with O’Malia does not teach:
	wherein the DRL agent comprises double deep Q learning (DDQN).
	However, Luciw teaches:
	wherein the DRL agent comprises double deep Q learning (DDQN) (Luciw teaches an improved version of DQN, called Double DQN [paragraph 44]).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis in combination with O’Malia with wherein the DRL agent comprises double deep Q learning (DDQN) as taught by Luciw so as to allow the system to employ a more advanced form of deep reinforcement learning, such as DDQN.

Claims 15 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Balutis et al. US 20160174459 A1 (“Balutis”), Jiang et al. US 20190217476 A1 (“Jiang”), Smid et al. US 20080262669 A1 (“Smid”), Shin et al. US 20180149753 A1 (“Shin”), Johnson et al. US 20180060396 A1 (“Johnson”), Erignac US 7606659 B2 (“Erignac”), O’Malia et al. US 20190108448 A1 (“O’Malia”), and Luciw US 20190061147 A1 (“Luciw”), as applied to claim 14 above, and further in view of Zimmerman US 20080077511 A1 (“Zimmerman”).
	Regarding Claim 15. Balutis in combination with Jiang, Smid, Shin, Johnson, Erignac, O’Malia, and Luciw teaches the method of claim 14.
	Balutis does not teach:
	wherein the dynamic environment comprises a multivehicle environment in which a plurality of moving obstacles are being operated.
	However, Zimmerman teaches:
	wherein the dynamic environment comprises a multivehicle environment in which a plurality of moving obstacles are being operated (a mobile inventory robot that can operate in an environment where multiple moving objects, such as people and shopping carts, are moving about the environment at once [paragraph 59]).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with wherein the dynamic environment comprises a multivehicle environment in which a plurality of moving obstacles are being operated as taught by Zimmerman so that the method can be applied to a multivehicle environment with multiple moving obstacles.
	Regarding Claim 16. Balutis in combination with Jiang, Smid, Shin, Johnson, Erignac, O’Malia, Luciw, and Zimmerman teaches the method of claim 15.
	Balutis does not teach:
	wherein the multivehicle environment is an environment selected from the group consisting of a warehouse, a factory, and a city street grid.
	However, Smid teaches:
	wherein the multivehicle environment is an environment selected from the group consisting of a warehouse, a factory, and a city street grid (an autonomous vehicle controller that is intended to work with a variety of vehicles, including forklifts, tractors, golf ball collection vehicles, warehouse vehicles, and the like [paragraph 4]. Forklifts are often employed in factories, and so this reference reads on the claim language for a warehouse and a factory. In another embodiment, the path planning may identify portions of a digitized geospatial representation that are likely to indicate a road or street suitable for the vehicle to travel on [paragraph 42]).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with wherein the multivehicle environment is an environment selected from the group consisting of a warehouse, a factory, and a city street grid as taught by Smid so that the method can be applied to robots working in a warehouse, factory, street, or similar environment.

Claims 19-21 are rejected under 35 U.S.C. 103 as being unpatentable over Balutis et al. US 20160174459 A1 (“Balutis”), Jiang et al. US 20190217476 A1 (“Jiang”), Smid et al. US 20080262669 A1 (“Smid”), Shin et al. US 20180149753 A1 (“Shin”), and Johnson et al. US 20180060396 A1 (“Johnson”), as applied to claim 18 above, and further in view of Goldman-Shenhar et al. US 20130219294 A1 (“Goldman-Shenhar”).
	Regarding Claim 19. Balutis in combination with Jiang, Smid, Shin, and Johnson teaches the adaptive path planning system of claim 18.
	Balutis does not teach:
	wherein the control system comprises a control system selected from the group consisting of a vehicle control unit (VCU), an electronic control unit (ECU), and an on-board computer (OBC).
	However, Goldman-Shenhar teaches:
	wherein the control system comprises a control system selected from the group consisting of a vehicle control unit (VCU), an electronic control unit (ECU), and an on-board computer (OBC) (a vehicle with a system that can be a sub-system of the vehicle, and the computerized aspects can be a primary vehicle unit, such as a vehicle electronic control unit [paragraph 30]. The primary computing unit of the vehicle can be an on-board computer [paragraph 42]).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with wherein the control system comprises a control system selected from the group consisting of a vehicle control unit (VCU), an electronic control unit (ECU), and an on-board computer (OBC) as taught by Goldman-Shenhar so that the robot controller can be integrated within the robot itself. 
	Regarding Claim 20. Balutis in combination with Jiang, Smid, Shin, Johnson, and Goldman-Shenhar teaches the adaptive path planning system of claim 19.
	Balutis also teaches:
	wherein the global guidance logic comprises a static searching algorithm utilized to determine the initial global path connecting a start location with the selected destination (a path planning algorithm can be used to find the shortest path from the robot’s current location to the destination, and these adaptive path planning algorithms can include A*, rapidly exploring random trees, or probabilistic roadmaps [paragraph 70]).
	Regarding Claim 21. Balutis in combination with Jiang, Smid, Shin, Johnson, and Goldman-Shenhar teaches the adaptive path planning system of claim 20.
	Balutis also teaches:
	further comprising: 
	a database storing history information for traversal by the agent AV (the controller is coupled to a memory storage element that stores the behavior of the robot during various operations and the schedules for mowing discontiguous lawn areas [paragraph 31]).
	Balutis does not teach:
	the history information is used to revise the initial global path and provide the planned global path providing the planned path for traversal by the agent AV.
	However, Smid teaches:
	a database storing history information to revise the initial global path and provide the planned global path providing the planned path for traversal by the agent AV (an autonomous vehicle controller that includes software for path planning, which may include adaptive systems that allow the AVC to self-tune based on external parameters and conditions as gathered by sensors of a sensor array [paragraph 27]. Preferably, the systems include the ability to self-tune in real time. The environmental sensor array is included in the controller to collect data regarding the vehicle speed, compass heading, absolute position (e.g., from GPS), or relative position [paragraph 18]. The environmental sensor array may also include a central processing unit and a memory/storage [paragraph 9], which means that the sensor array stores the information it collects in the memory (forming a history) and uses this information to revise the global path and provide a planned path for traversal of the autonomous vehicle).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with a database storing history information to revise the initial global path and provide the planned global path providing the planned path for traversal by the agent AV as taught by Smid so as to allow the robot to revise its initial global path based on the information stored within the memory. 

Claims 22-26 and 28 are rejected under 35 U.S.C. 103 as being unpatentable over Balutis et al. US 20160174459 A1 (“Balutis”), Jiang et al. US 20190217476 A1 (“Jiang”), Smid et al. US 20080262669 A1 (“Smid”), Shin et al. US 20180149753 A1 (“Shin”), Johnson et al. US 20180060396 A1 (“Johnson”), and Goldman-Shenhar et al. US 20130219294 A1 (“Goldman-Shenhar”) as applied to claim 21 above, and further in view of Erignac US 7606659 B2 (“Erignac”).
	Regarding Claim 22. Balutis in combination with Jiang, Smid, Shin, Johnson, and Goldman-Shenhar teaches the adaptive path planning system of claim 21.
	Balutis also teaches:
	wherein the history information comprises path overlay information (a memory storage element that stores data corresponding to points or segments along the lawn routes, traversal routes, and bypass routes [paragraph 31], which reads on path overlay information).
	Balutis does not teach:
	wherein the history information comprises pheromone information.
	However, Erignac teaches:
	wherein the history information comprises pheromone information (an exhaustive swarming search strategy using distributed pheromone maps. Unmanned aerial vehicles (UAVs) can operate in swarms, and use pheromone maps as a common coordination mechanism between agents that emulate insect foraging behavior [Column 1, lines 25-33]. A digital pheromone map overlays a digital grid onto a geographic area).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with wherein the history information comprises pheromone information as taught by Erignac so that a team of robots working together could detect the paths of cooperating robots and either avoid the other robots to prevent collisions, or keep track of where other robots have been so as to not cover the same region multiple times, such as a team of lawn mowers traveling over a lawn to cut grass.
	Regarding Claim 23. Balutis in combination with Jiang, Smid, Shin, Johnson, Goldman-Shenhar, and Erignac teaches the adaptive path planning system of claim 22.
	Balutis also teaches:
	wherein the path overlay information comprises information regarding routes of moving obstacles, temporal information regarding movement, information regarding movement velocity, priority information with respect to moving obstacle movement, obstacle volume, pathway width, or combinations thereof (a controller that includes obstacle detection and avoidance methods and behaviors implemented in response to sensor signals from an obstacle sensing system [paragraph 41]. Specifically, the controller operates the navigation system configured to maneuver the robot in a path or route stored in the memory storage element across the lawn areas and/or traversal regions. The navigation system is a behavior-based system executed on the controller. The navigation system communicates with the sensor system to determine and issue drive commands to the drive system. In particular, the controller includes obstacle detection and avoidance methods and behaviors implemented in response to sensor signals from the obstacle sensing system. The robot can use its proximity sensors to detect the general geometry of an obstacle in the general vicinity in front of the robot so that the robot can determine what direction to turn. The controller can determine when the robot is about to collide with an obstacle and communicate instructions to the navigation system and the drive system to avoid the obstacle. This reads on temporal information regarding movement. The path behavior data stored in the memory [paragraph 42] can set the drive system to a predetermined drive speed (speed and direction) as the robot makes difficult turns along a specific route, which reads on information regarding movement velocity).
	Regarding Claim 24. Balutis in combination with Jiang, Smid, Shin, Johnson, Goldman-Shenhar, and Erignac teaches the adaptive path planning system of claim 22.
	Balutis does not teach:
	wherein the pheromone information corresponds to respective instances of the path overlay information,
	and wherein the pheromone information provides for the respective path overlay information decaying over time.
	However, Erignac teaches:
	wherein the pheromone information corresponds to respective instances of the path overlay information (FIGS. 7A-7C show how the pheromone map keeps track of where the search agents have been, and marks the searched cells with an “S” in the drawings. This reads on respective instances of path overlay information [FIGS. 7A-7C]), 
	and wherein the pheromone information provides for the respective path overlay information decaying over time (aspects of the search system may mark searched cells of the digital pheromone map with an identifier of the search agent that searched the cell and the time that the cell was searched. This information can be used to detect areas that are stale and need to be searched again (e.g., for time-dependent searches) [Column 4, lines 35-42], which reads on the overlay information decaying over time).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with wherein the pheromone information corresponds to respective instances of the path overlay information, and wherein the pheromone information provides for the respective path overlay information decaying over time as taught by Erignac so as to allow the robots to avoid the pathways of other robots, and to recognize when those pathways have cleared with the passage of time. 
	Regarding Claim 25. Balutis in combination with Jiang, Smid, Shin, Johnson, Goldman-Shenhar, and Erignac teaches the adaptive path planning system of claim 24.
	Balutis also teaches:
	further comprising: 
	localized map generation logic configured to generate a localized map sequence comprising a plurality of localized maps corresponding to positions of the agent AV traversing the planned path, wherein each localized map of the localized map sequence comprises a sub-portion of the dynamic environment centered about a respective position of the agent AV (the sensor system of the robot can include a location estimation system [paragraph 38], which can include a time-of-flight based system that uses a time-of-flight between a boundary marker and the robot. For example, boundary markers may be placed along the boundary of the lawn. While the time-of-flight system uses markers that have been described as boundary markers, the marker can also be placed within or near the lawn to aid localization of the robot lawnmower. The localization can use triangulation to determine the robot position within the boundary. The signals sent between the boundary markers and the robot positioned on the property allow the robot to estimate the angles and the distance by calculating time of flight to each of the boundary markers, and using trigonometry to calculate the robot's current position [paragraph 38]).
	Regarding Claim 26. Balutis in combination with Jiang, Smid, Shin, Johnson, Goldman-Shenhar, and Erignac teaches the adaptive path planning system of claim 25.
	Balutis does not teach:
	wherein the local planning logic is configured to utilize localized deep reinforcement learning (DRL) with respect to localized maps of the localized map sequence to determine actions for the agent AV within the dynamic environment.
	However, Jiang teaches:
	wherein the local planning logic is configured to utilize localized deep reinforcement learning (DRL) with respect to localized maps of the localized map sequence to determine actions for the agent AV within the dynamic environment (a system for tracking an object and navigating an object tracking robot that involves calculating positions of the robot and the object in an environment at multiple times, and the use of a deep reinforcement learning (DRL) network that has been trained as a function of an object tracking quality reward and a robot navigation path quality reward to calculate positions of the robot and the object at the multiple times and determine an action specifying movement of the object tracking robot via the DRL network [paragraph 23]. The DRL network facilitates simultaneous localization and mapping (SLAM) and moving object tracking [paragraph 47]. SLAM allows the robot to build and maintain a 2D/3D map of a known or unknown environment and at the same time, localizes (determines its location) itself in the built environment map [paragraph 5], which reads on utilizing localized DRL with respect to localized maps).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with wherein the local planning logic is configured to utilize localized deep reinforcement learning (DRL) with respect to localized maps of the localized map sequence to determine actions for the agent AV within the dynamic environment as taught by Jiang so that the robot can better adapt its behavior, similar to how Balutis uses adaptive planning algorithms. 
	Regarding Claim 28. Balutis in combination with Jiang, Smid, Shin, Johnson, Goldman-Shenhar, and Erignac teaches the adaptive path planning system of claim 26.
	Balutis does not teach:
	wherein the localized DRL comprises a DRL agent configured as a localized planner for directing interactions of the AV in the dynamic environment.
	However, Jiang teaches:
	wherein the localized DRL comprises a DRL agent configured as a localized planner for directing interactions of the AV in the dynamic environment (the DRL network facilitates simultaneous localization and mapping (SLAM) and moving object training, allowing the robot to train for interactions with moving objects in a simulated or virtual environment [paragraph 47]).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with wherein the localized DRL comprises a DRL agent configured as a localized planner for directing interactions of the AV in the dynamic environment as taught by Jiang so that the robot can better adapt its behavior to the dynamic environment. 

Claim 27 is rejected under 35 U.S.C. 103 as being unpatentable over Balutis et al. US 20160174459 A1 (“Balutis”), Jiang et al. US 20190217476 A1 (“Jiang”), Smid et al. US 20080262669 A1 (“Smid”), Shin et al. US 20180149753 A1 (“Shin”), Johnson et al. US 20180060396 A1 (“Johnson”), Goldman-Shenhar et al. US 20130219294 A1 (“Goldman-Shenhar”), and Erignac US 7606659 B2 (“Erignac”) as applied to claim 26 above, and further in view of Cella et al. US 20190121350 A1 (“Cella”).
	Regarding Claim 27. Balutis in combination with Jiang, Smid, Shin, Johnson, Goldman-Shenhar, and Erignac teaches the adaptive path planning system of claim 26.
	Balutis does not teach:
	wherein the localized DRL is configured to utilize a convolutional neural network (CNN) and recurrent neural network (RNN) to provide a modeled representation of the dynamic environment from the localized map sequence.
	However, Jiang teaches:
	wherein the localized DRL is configured to utilize a convolutional neural network (CNN) to provide a modeled representation of the dynamic environment from the localized map sequence (the system for neural network training uses an artificial neural network (ANN) [paragraph 52], and the ANN may be implemented as a module and used in conjunction with the combined reward functions [paragraph 58]. Example modules specifically include a convolutional neural network (CNN), and other neural networks can also be used. According to one aspect of the disclosure, the DRL network can track an object and navigate a robot in an environment, which includes receiving tracking sensor input representing the object and the environment at multiple times [paragraph 17]. A compute reward module, which is part of the DRL, computes rewards by modeling navigation paths to find the most effective path [paragraphs 78-79], so the neural network is configured to provide a modeled representation of the dynamic environment).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with wherein the localized DRL is configured to utilize a convolutional neural network (CNN) to provide a modeled representation of the dynamic environment from the localized map sequence as taught by Jiang so that the robot can better adapt its behavior, similar to how Balutis uses adaptive planning algorithms. 
	Balutis in combination with Jiang does not teach: 
	the localized DRL comprises a recurrent neural network.
	However, Cella teaches:
	the localized DRL comprises a recurrent neural network (a system and method for learning data outcomes that may use a neural set, and a neural set should be understood to encompass a wide range of different types of neural networks, which can include a variety of recurrent neural networks (RNN) [paragraph 442], and in some embodiments, a convolutional neural network (CNN) [paragraph 467]).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis in combination with Jiang with the localized DRL comprises a recurrent neural network as taught by Cella because Jiang already teaches that other types of neural networks can be used in their invention, and applying a recurrent neural network would allow the robot to better adapt its behavior.

Claim 29 is rejected under 35 U.S.C. 103 as being unpatentable over Balutis et al. US 20160174459 A1 (“Balutis”), Jiang et al. US 20190217476 A1 (“Jiang”), Smid et al. US 20080262669 A1 (“Smid”), Shin et al. US 20180149753 A1 (“Shin”), Johnson et al. US 20180060396 A1 (“Johnson”), Goldman-Shenhar et al. US 20130219294 A1 (“Goldman-Shenhar”), and Erignac US 7606659 B2 (“Erignac”), as applied to claim 28 above, and further in view of O’Malia et al. US 20190108448 A1 (“O’Malia”) and Luciw US 20190061147 A1 (“Luciw”).
	Regarding Claim 29. Balutis in combination with Jiang, Smid, Shin, Johnson, Goldman-Shenhar, and Erignac teaches the adaptive path planning system of claim 28.
	Balutis does not teach:
	wherein the DRL agent comprises double deep Q learning (DDQN), Rainbow, Proximal Policy Optimization (PPO), Asynchronous Advantage Actor Critic (A3C), or a combination thereof.
	However, O’Malia teaches:
	wherein the DRL agent comprises Rainbow, Proximal Policy Optimization (PPO), Asynchronous Advantage Actor Critic (A3C), or a combination thereof (O’Malia teaches an artificial intelligence framework that can use reinforcement learning algorithms such as PPO (proximal policy optimization), Rainbow, various Deep Q-Learning (DQN) variants [paragraph 16], and A3C [paragraph 68]. O’Malia does not specifically refer to Double deep Q learning).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with wherein the DRL agent comprises Rainbow, Proximal Policy Optimization (PPO), Asynchronous Advantage Actor Critic (A3C), or a combination thereof as taught by O’Malia so as to allow the system to employ a more advanced form of deep reinforcement learning, such as A3C.
	Balutis in combination with O’Malia does not teach:
	wherein the DRL agent comprises double deep Q learning (DDQN).
	However, Luciw teaches:
	wherein the DRL agent comprises double deep Q learning (DDQN) (Luciw teaches an improved version of DQN, called Double DQN [paragraph 44]).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis in combination with O’Malia with wherein the DRL agent comprises double deep Q learning (DDQN) as taught by Luciw so as to allow the system to employ a more advanced form of deep reinforcement learning, such as DDQN.

Claims 30-32 are rejected under 35 U.S.C. 103 as being unpatentable over Balutis et al. US 20160174459 A1 (“Balutis”), Jiang et al. US 20190217476 A1 (“Jiang”), Zimmerman US 20080077511 A1 (“Zimmerman”), Smid et al. US 20080262669 A1 (“Smid”), Shin et al. US 20180149753 A1 (“Shin”), and Johnson et al. US 20180060396 A1 (“Johnson”).
	Regarding Claim 30. Balutis teaches a method for adaptive path planning, the method comprising:
	defining, by a processor, a start and a destination of a path for an agent vehicle through the multivehicle environment;
	planning, by the processor using a static searching algorithm, an initial path within the environment connecting the start and the destination via the pathways (a path planning algorithm can be used to find the shortest path from the robot’s current location to the destination, and these adaptive path planning algorithms can include A*, rapidly exploring random trees, or probabilistic roadmaps [paragraph 70]);
	Balutis does not teach:
	generating, by the processor, a localized map sequence comprising a plurality of localized maps corresponding to positions of the agent vehicle traversing at least a portion of the planned global guidance path; and 
	providing, by deep reinforcement learning agent using one or more localized map of the localized map sequence, a direction of a next movement of the agent vehicle;
	However, Jiang teaches:
	generating, by the processor, a localized map sequence comprising a plurality of localized maps corresponding to positions of the agent vehicle traversing at least a portion of the planned global guidance path (a system for tracking an object and navigating an object tracking robot that involves calculating positions of the robot and the object in an environment at multiple times, and the use of a deep reinforcement learning (DRL) network that has been trained as a function of an object tracking quality reward and a robot navigation path quality reward to calculate positions of the robot and the object at the multiple times and determine an action specifying movement of the object tracking robot via the DRL network [paragraph 23]. The DRL network facilitates simultaneous localization and mapping (SLAM) and moving object tracking [paragraph 47]. SLAM allows the robot to build and maintain a 2D/3D map of a known or unknown environment and at the same time, localizes (determines its location) itself in the built environment map [paragraph 5], which reads on utilizing localized DRL with respect to localized maps. The system updates the DRL network at a frequency of every K time stamps [paragraph 92]. The system stores transition sets into a replay memory R. The transition sets of paragraph 92 are transitions of coordinates, including Xt, at, rt, and Xt+1, where t is a time stamp, t+1 is the next time stamp in the sequence of transition sets, and X is an x coordinate on a map [paragraph 87]. The robotic device samples a number of transition sets from a replay memory to compute a target y coordinate [paragraph 93], which means the robot utilizes localized map sequence information as part of the dynamic reinforcement learning utilized within controlling dynamic interaction within the dynamic environment); and 
	providing, by deep reinforcement learning agent using one or more localized map of the localized map sequence, a direction of a next movement of the agent vehicle (a system for tracking an object and navigating an object tracking robot that involves calculating positions of the robot and the object in an environment at multiple times, and the use of a deep reinforcement learning (DRL) network that has been trained as a function of an object tracking quality reward and a robot navigation path quality reward to calculate positions of the robot and the object at the multiple times and determine an action specifying movement of the object tracking robot via the DRL network, which reads on providing a direction of the next movement of the agent vehicle [paragraph 23]).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with generating, by the processor, a localized map sequence comprising a plurality of localized maps corresponding to positions of the agent vehicle traversing at least a portion of the planned global guidance path; and providing, by deep reinforcement learning agent using one or more localized map of the localized map sequence, a direction of a next movement of the agent vehicle as taught by Jiang so that the robot can better adapt its behavior, similar to how Balutis uses adaptive planning algorithms, and can adapt its path planning according to changes within the dynamic environment.
	Balutis also does not teach:
	The environment is a multivehicle environment 
	However, Zimmerman teaches:
	The environment is a multivehicle environment (a mobile inventory robot that can operate in an environment where multiple moving objects, such as people and shopping carts, are moving about the environment at once [paragraph 59]).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with the environment is a multivehicle environment as taught by Zimmerman so as to allow the invention to work with a team of vehicles, or to simply work in an environment where other vehicles are present.
	Balutis also does not teach:
	revising, by the processor, from the initial path based on history information to produce a planned global guidance path corresponding to a planned path for the agent vehicle to travel from the start to the destination; and
	analyzing the movement of the agent vehicle and providing feedback to the deep reinforcement learning agent.
	However, Smid teaches:
	revising, by the processor, from the initial path based on history information to produce a planned global guidance path corresponding to a planned path for the agent vehicle to travel from the start to the destination (an autonomous vehicle controller (AVC) that includes software for path planning, which may include adaptive systems that allow the AVC to self-tune based on external parameters and conditions as gathered by sensors of a sensor array [paragraph 27]. Preferably, the systems include the ability to self-tune in real time. The environmental sensor array is included in the controller to collect data regarding the vehicle speed, compass heading, absolute position (e.g., from GPS), or relative position [paragraph 18]. The environmental sensor array may also include a central processing unit and a memory/storage [paragraph 9], which means that the sensor array stores the information it collects in the memory (forming a history) and uses this information to revise the global path and provide a planned path for traversal of the autonomous vehicle. The AVC includes an intelligent design such that the AVC includes programming of rules and programming for changing those rules [paragraph 28]. Programming of rules will typically include programming for following instructions provided by a user according to a protocol. Then, upon sensing of an external change of conditions, the AVC will typically include programming to change the protocol. Then, such changed protocol can be stored and used in the future such that the original rules or protocol has been changed. As an example, original rules may map a path of waypoints to be directly followed by a vehicle having the AVC and, upon sensing of an obstacle in the direct path between waypoints, the original rule of following a direct path can be modified to allow the vehicle to follow a path around the obstacle. In another example, a rule can be used to change a parameter value of a drive control rule to adapt to terrain variations. This means that Smid teaches a means for making an initial path and revising the initial path based on information stored and used in the future (history information) to produce a planned global guidance path); and
	analyzing the movement of the agent vehicle and providing feedback to the deep reinforcement learning agent (a learning technique may be applied by obtaining information on each operation when the moving object moves up a slope or climbs up a threshold or onto a carpet to distinguish whether the moving robot is moving on a slope or threshold [paragraph 273]. Smid does not teach that the learning technique is deep reinforcement learning, but it does not teach away from using DRL as the learning technique).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with revising, by the processor, from the initial path based on history information to produce a planned global guidance path corresponding to a planned path for the agent vehicle to travel from the start to the destination; and analyzing the movement of the agent vehicle and providing feedback to the deep reinforcement learning agent as taught by Smid so as to allow the robot to adjust its path based on prior experiences, such as obstacle collision or issues with the travel region, and so that the robot can better adapt its behavior, similar to how Balutis uses adaptive planning algorithms. 
	Balutis also does not teach:
	initializing the dynamic environment based on environment parameters, the environment parameters comprising a graph having nodes and edges connecting different pairs of the nodes, wherein the edges correspond to pathways within the dynamic environment and the nodes define interactions between the pathways defined by the edges; and
	path planning is subsequent to initializing the dynamic environment.
	However, Shin teaches:
	initializing the dynamic environment based on environment parameters, the environment parameters comprising a graph having nodes and edges connecting different pairs of the nodes, wherein the edges correspond to pathways within the dynamic environment and the nodes define interactions between the pathways defined by the edges; and
	path planning is subsequent to initializing the dynamic environment (a robot with a global map estimating unit that estimates a global map and combines local maps to extend and update the global map [paragraph 289]. Some embodiments provide a map covering a wide region within a predetermined time under predetermined conditions (parameters) by generating a key frame to a node using scan information, calculating an odometry edge between continuous nodes and updating the key frames to estimate a local map, detecting a loop closure edge between non-continuous nodes relating to a set of updated key frames, and correcting the positions of the nodes based on the odometry edge and the loop closure edge to estimate a global map).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with initializing the dynamic environment based on environment parameters, the environment parameters comprising a graph having nodes and edges connecting different pairs of the nodes, wherein the edges correspond to pathways within the dynamic environment and the nodes define interactions between the pathways defined by the edges; and path planning is subsequent to initializing the dynamic environment as taught by Shin so as to allow the system to map out the travel paths of each robot and ensure that they comply with input parameters.
	Balutis also does not teach:
	wherein each of the edges is associated with a weight representing a length of a corresponding pathway of the pathways.
	However, Johnson teaches:
	wherein each of the edges is associated with a weight representing a length of a corresponding pathway of the pathways (a method of mapping pathways in a database, wherein a pathway may have a length of L, which may be equal to the number of edges in it [paragraph 57]. This invention is intended to give graphs a means for weighting pathways in a network among edges, and so it would be obvious to combine with a dynamic environment graph consisting of nodes and edges so as to apply a weight value to the edges for finding the best route).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with wherein each of the edges is associated with a weight representing a length of a corresponding pathway of the pathways as taught by Johnson so as to allow the dynamic graph to determine the lengths and weight values of pathways to find the optimal route.
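	For illustration only, the claimed graph of nodes and edges, with each edge weighted by the length of its pathway, can be sketched as follows. This sketch is not drawn from Balutis, Shin, Johnson, or any other cited reference. The names `GRAPH` and `shortest_path`, the node labels, and the lengths are hypothetical, and Dijkstra's algorithm is used here as a simple stand-in for a static searching algorithm such as A*.

```python
import heapq

def shortest_path(graph, start, goal):
    """Dijkstra search over a dict-of-dicts graph.

    graph[u][v] is the length of the pathway (edge) between nodes u and v.
    Returns (list_of_nodes, total_length), or (None, inf) if unreachable.
    """
    dist = {start: 0.0}          # best known distance to each node
    prev = {}                    # predecessor on the best known path
    queue = [(0.0, start)]       # min-heap of (distance, node)
    visited = set()
    while queue:
        d, node = heapq.heappop(queue)
        if node in visited:
            continue
        visited.add(node)
        if node == goal:
            break
        for neighbor, length in graph.get(node, {}).items():
            candidate = d + length
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                prev[neighbor] = node
                heapq.heappush(queue, (candidate, neighbor))
    if goal not in dist:
        return None, float("inf")
    # Walk predecessors back from the goal to recover the path.
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    return path[::-1], dist[goal]

# Hypothetical environment: nodes are pathway intersections,
# edge weights are pathway lengths (arbitrary units).
GRAPH = {
    "A": {"B": 4.0, "C": 1.0},
    "B": {"A": 4.0, "C": 2.0, "D": 1.0},
    "C": {"A": 1.0, "B": 2.0, "D": 5.0},
    "D": {"B": 1.0, "C": 5.0},
}

if __name__ == "__main__":
    print(shortest_path(GRAPH, "A", "D"))
```

	In this hypothetical graph the direct-looking route A, B, D has length 5.0, while the route A, C, B, D has length 4.0, so the length weights change which route is selected.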
	Regarding Claim 31. Balutis in combination with Zimmerman, Smid, Shin, Johnson and Jiang teaches the method of claim 30.
	Balutis does not teach:
further comprising:
	Generating a new localized map for the localized map sequence for a position of the agent vehicle after movement by the agent vehicle based upon the direction provided by the deep reinforcement learning agent.
	However, Jiang teaches:
further comprising:
	Generating a new localized map for the localized map sequence for a position of the agent vehicle after movement by the agent vehicle based upon the direction provided by the deep reinforcement learning agent (a system for tracking an object and navigating an object tracking robot that involves calculating positions of the robot and the object in an environment at multiple times, and the use of a deep reinforcement learning (DRL) network that has been trained as a function of an object tracking quality reward and a robot navigation path quality reward to calculate positions of the robot and the object at the multiple times and determine an action specifying movement of the object tracking robot via the DRL network [paragraph 23]. The DRL network facilitates simultaneous localization and mapping (SLAM) [paragraph 47]. SLAM technology enables a robot to navigate through a known or unknown environment by allowing the robot to build and maintain a 2D or 3D map of a known or unknown environment and localize itself in the built environment map [paragraph 5]).
	Regarding Claim 32. Balutis in combination with Zimmerman, Smid, Shin, Johnson and Jiang teaches the method of claim 31.
	Balutis does not teach:
further comprising:
	Updating the localized map sequence after the movement of the agent vehicle by inserting the new localized map and deleting an oldest localized map of the localized map sequence.
	However, Jiang teaches:
further comprising:
	Updating the localized map sequence after the movement of the agent vehicle by inserting the new localized map and deleting an oldest localized map of the localized map sequence (a system for tracking an object and navigating an object tracking robot that involves calculating positions of the robot and the object in an environment at multiple times, and the use of a deep reinforcement learning (DRL) network that has been trained as a function of an object tracking quality reward and a robot navigation path quality reward to calculate positions of the robot and the object at the multiple times and determine an action specifying movement of the object tracking robot via the DRL network [paragraph 23]. The DRL network facilitates simultaneous localization and mapping (SLAM) and moving object tracking [paragraph 47]. SLAM allows the robot to build and maintain a 2D/3D map of a known or unknown environment and at the same time, localizes (determines its location) itself in the built environment map [paragraph 5], which reads on utilizing localized DRL with respect to localized maps. Jiang also teaches that the system updates the DRL network at a frequency of every K time stamps [paragraph 92]. The system stores transition sets into a replay memory R. The transition sets of paragraph 92 are transitions of coordinates, including Xt, at, rt, and Xt+1, where t is a time stamp, t+1 is the next time stamp in the sequence of transition sets, and X is an x coordinate on a map [paragraph 87]. The robotic device samples a number of transition sets from a replay memory to compute a target y coordinate [paragraph 93], which means the robot utilizes localized map sequence information as part of the dynamic reinforcement learning utilized within controlling dynamic interaction within the dynamic environment. The replay memory normally has a storage limit, and once the storage limit has been reached, the oldest transition set will be replaced by the newly added transition set. This means that the oldest localized map data is deleted and replaced with the newest map in the localized map sequence).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with further comprising: updating the localized map sequence after the movement of the agent vehicle by inserting the new localized map and deleting an oldest localized map of the localized map sequence as taught by Jiang so that the memory of the system can remove outdated maps to free up memory space for new map data.
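	For illustration only, the insert-newest and delete-oldest behavior discussed above can be sketched as a fixed-capacity sequence. This sketch is not code from Jiang or from the application. The class name `MapSequence`, its methods, and the capacity value are hypothetical, and the stored items are placeholder strings rather than actual map data.

```python
from collections import deque

class MapSequence:
    """Fixed-capacity sequence: adding to a full sequence drops the oldest entry."""

    def __init__(self, capacity):
        # deque with maxlen discards items from the opposite end when full.
        self._maps = deque(maxlen=capacity)

    def push(self, local_map):
        """Insert the newest map; the oldest is removed if capacity is reached."""
        self._maps.append(local_map)

    def as_list(self):
        """Return the stored maps ordered from oldest to newest."""
        return list(self._maps)

if __name__ == "__main__":
    seq = MapSequence(capacity=3)
    for name in ["m0", "m1", "m2", "m3"]:
        seq.push(name)
    print(seq.as_list())  # "m0" has been replaced by the newer entries
```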

Claim 33 is rejected under 35 U.S.C. 103 as being unpatentable over Balutis et al. US 20160174459 A1 (“Balutis”), Jiang et al. US 20190217476 A1 (“Jiang”), Zimmerman US 20080077511 A1 (“Zimmerman”), Smid et al. US 20080262669 A1 (“Smid”), Shin et al. US 20180149753 A1 (“Shin”), and Johnson et al. US 20180060396 A1 (“Johnson”) as applied to claim 31 above, and in further view of CHIBA US 20200042018 A1 (“CHIBA”).
	Regarding Claim 33. Balutis in combination with Jiang, Zimmerman, Smid, Shin, and Johnson teaches the method of claim 31.
	Balutis does not teach:
	determining if the agent vehicle has arrived at the destination, wherein if it is determined that the agent vehicle has not arrived at the destination updating the localized map sequence with the new localized map and repeating the providing the direction of a next movement of the agent vehicle and the analyzing the movement of the agent vehicle.
	However, CHIBA teaches:
	determining if the agent vehicle has arrived at the destination, wherein if it is determined that the agent vehicle has not arrived at the destination updating the localized map sequence with the new localized map and repeating the providing the direction of a next movement of the agent vehicle and the analyzing the movement of the agent vehicle (a multi-agent robotics management system which involves mapping and localization by the robots shown in FIG. 9. In FIG. 6, the method of vehicle movement and object avoidance is shown, where the robot determines at step 624 whether the robot has arrived at its target destination. If it has not, the apparatus loops back to step 616, where the robot determines whether the next virtual station is farther than 10 meters away from the vehicle and whether there is an obstacle within the next 5 meters on the vehicle’s path to the next virtual station [paragraph 77]. This reads on determining if the agent vehicle has arrived at the destination, and updating a localized map if it is determined that the vehicle has not arrived at the destination).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with determining if the agent vehicle has arrived at the destination, wherein if it is determined that the agent vehicle has not arrived at the destination updating the localized map sequence with the new localized map and repeating the providing the direction of a next movement of the agent vehicle and the analyzing the movement of the agent vehicle as taught by CHIBA so that the robot can determine when it has reached its destination and if it should continue updating its localized map sequence.
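	For illustration only, the claimed loop of checking for arrival, updating the localized map sequence, and repeating the direction step can be sketched as follows. This sketch is not taken from CHIBA or any other cited reference. The functions `local_map`, `choose_direction`, and `navigate` are hypothetical, the grid coordinates are arbitrary, and the greedy `choose_direction` is only a stand-in for a trained deep reinforcement learning agent.

```python
from collections import deque

def local_map(position, radius=1):
    """Hypothetical localized map: a bounding box around the current position."""
    x, y = position
    return (x - radius, y - radius, x + radius, y + radius)

def choose_direction(position, destination):
    """Stand-in for the DRL agent: step greedily along x, then along y."""
    dx = destination[0] - position[0]
    dy = destination[1] - position[1]
    if dx != 0:
        return (1 if dx > 0 else -1, 0)
    return (0, 1 if dy > 0 else -1)

def navigate(start, destination, window=4, max_steps=1000):
    """Repeat: pick a direction, move, update the map sequence, check arrival."""
    position = start
    # Bounded sequence: appending to a full deque drops the oldest map.
    sequence = deque([local_map(position)], maxlen=window)
    steps = 0
    while position != destination and steps < max_steps:
        dx, dy = choose_direction(position, destination)
        position = (position[0] + dx, position[1] + dy)
        sequence.append(local_map(position))
        steps += 1
    return position, steps, list(sequence)

if __name__ == "__main__":
    print(navigate((0, 0), (2, 3)))
```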

Claim 34 is rejected under 35 U.S.C. 103 as being unpatentable over Balutis et al. US 20160174459 A1 (“Balutis”), Jiang et al. US 20190217476 A1 (“Jiang”), Zimmerman US 20080077511 A1 (“Zimmerman”), Smid et al. US 20080262669 A1 (“Smid”), Shin et al. US 20180149753 A1 (“Shin”), and CHIBA US 20200042018 A1 (“CHIBA”) as applied to claim 33 above, and in further view of Erignac US 7606659 B2 (“Erignac”).
	Regarding Claim 34. Balutis in combination with Jiang, Zimmerman, Smid, and CHIBA teaches the method of claim 33.
	Balutis also teaches:
	wherein the history information comprises path overlay information (a memory storage element that stores data corresponding to points or segments along the lawn routes, traversal routes, and bypass routes [paragraph 31], which reads on path overlay information).
	Balutis does not teach:
	wherein the history information comprises pheromone information.
	However, Erignac teaches:
	wherein the history information comprises pheromone information (an exhaustive swarming search strategy using distributed pheromone maps. Unmanned aerial vehicles (UAVs) can operate in swarms and use pheromone maps as a common coordination mechanism between agents, emulating insect foraging behavior [Column 1, lines 25-33]. A digital pheromone map overlays a digital grid onto a geographic area).
	It would have been obvious to one of ordinary skill in the art at the time to modify the invention of Balutis with wherein the history information comprises pheromone information as taught by Erignac so that a team of robots working together could detect the paths of cooperating robots and either avoid the other robots to prevent collisions, or keep track of where other robots have been so as to not cover the same region multiple times, such as a team of lawn mowers traveling over a lawn to cut grass.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AARON G CAIN whose telephone number is (571)272-7009. The examiner can normally be reached Monday to Friday, 7:30am - 4:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Khoi Tran, can be reached at (571)272-6919. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/A.G.C./Examiner, Art Unit 3664
/KHOI H TRAN/Supervisory Patent Examiner, Art Unit 3664