DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Response to Amendment

This Office Action is responsive to Applicant’s remarks received on October 05, 2021.  Claims 1-20 are pending.


Claim Rejections - 35 USC § 101

The previous 101 rejections have been withdrawn in light of Applicant’s amendment.


Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4-10, and 12-20 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Gutierrez et al. (US 2020/0023842), Herman (US 2019/0124232) and Adeli-Mosabbeb et al. (US 2021/0103742).
Regarding claim 1, Gutierrez et al. discloses a method, comprising: 
obtaining, by a computer system of a vehicle equipped with one or more sensors, a sequence of video frames corresponding to a scene external to the vehicle, wherein the sequence of video frames is captured using at least one of the one or more sensors (“As discussed above, the autonomous driving stack of a vehicle may utilize a variety of sensor data (e.g., 1358) generated by various sensors provided on and external to the vehicle. As an example, a vehicle 1205 may possess an array of sensors 1325 to collect various information relating to the exterior of the vehicle and the surrounding environment, vehicle system status, conditions within the vehicle, and other information usable by the modules of the vehicle's processing system 1310. For instance, such sensors 1325 may include global positioning (GPS) sensors 1368, light detection and ranging (LIDAR) sensors 1370, two-dimensional (2D) cameras 1372, three-dimensional (3D) or stereo cameras 1374” at paragraph 0103, line 1); 
detecting, by the computer system, one or more vulnerable road users in the sequence of video frames, wherein the detecting comprises estimating pose of each of 
generating, by the computer system, a segmentation map of the scene using one or more of the video frames (“Various embodiments may include one or more of the following features: a neural network to detect, track and classify road users, a segmentation module to detect the traversable space for road users as well as their designated areas” at paragraph 0026, line 1); 
estimating, by the computer system, one or more intention probabilities using estimated pose of the one or more VRUs and the segmentation map, each intention probability corresponding to one of the detected one or more VRUs (“The predicted trajectories 330 are passed to the intent prediction module 312. Road user intent prediction module 312 may use the path planning as a generative model for hypothesis testing to identify likelihoods over possible trajectories. Thus, the intent prediction module 312 may utilize an observed trajectory of a road user to assign likelihoods to a plurality of goal positions and trajectories to those goal positions. In various embodiments, module 312 may utilize Bayesian inference to evaluate the likelihoods, where Bayesian inference is a method of statistical inference in which the probability of a prediction is updated as additional evidence becomes available. Thus, at each timestep of the module 312, the module 312 may update the predictions for the trajectories received” at paragraph 0064); and
adjusting, by the computer system, one or more automated driving actions based on the estimated one or more intention probabilities (“If the one or more selected trajectories for a road user are in conflict with a trajectory of a vehicle, these trajectories may be communicated to the I2V communication module 314 to warn the vehicle based on the predicted trajectory information, such that the trajectory of the vehicle may be altered to reduce the risk of collision. For example, the velocity and/or direction of the vehicle may be changed responsive to the warning” at paragraph 0082). 
Gutierrez et al. does not explicitly disclose a full resolution segmentation map.
Herman teaches a method in the same field of endeavor of vehicle image processing, comprising:
generating, by the computer system, a full resolution segmentation map of the scene using one or more of the video frames (“An example of a convolutional neural network that can segment color video data of traffic scenes into is a full-resolution residual network (FRRN). FRRNs maintain a full resolution version of the input video data in parallel to a pooling/unpooling data stream in a convolutional neural network that includes feedback loops, that communicates with the full-resolution data stream to determine a limited number of segments in full-resolution video data. Configurations of convolutional neural networks can determine segments in input video data within natural region boundaries in color video data. Natural region boundaries include color boundaries, shape boundaries, and occlusion boundaries, for example. Boundary data can be combined with segment data at each layer of the full-resolution residual network to form high-resolution, segmented output video image, for example” at paragraph 0045, line 1).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize a network as taught by Herman to generate the segmentation map for the system of Gutierrez et al. to “improve the performance of the convolutional neural networks over using non-color-corrected color video images for image understanding tasks related to vehicle 110 navigation” (Herman at paragraph 0045, last sentence).

The Gutierrez et al. and Herman combination does not explicitly disclose estimating one or more intention probabilities based on physical interactions and causality between current behaviors and different elements of the scene.
Adeli-Mosabbeb et al. teaches a method in the same field of endeavor of pedestrian intent detection, comprising:
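estimating, by the computer system, one or more intention probabilities using estimated pose of the one or more VRUs and the segmentation map, each intention probability corresponding to one of the detected one or more VRUs based on physical interactions and causality between current behaviors and different elements of the scene (“To model pedestrian actions and interactions with others through time, we connect pedestrian and context nodes between consecutive frames to further reasoning about the temporal relations” at paragraph 0018, line 19; “For example, an object to which the pedestrian is giving high attention should be associated with a heavier edge weight. As discussed above, a separate graph is generated for each pedestrian identified during the parsing stage” at paragraph 0033, last sentence; “pedestrian interactions with other objects in the scene, such as other pedestrians, vehicles, traffic signs, lights, and environmental factors like zebra-crossings. We argue that such factors can be uncovered through reasoning over the relationship across different objects in the scene. Therefore, we explore graph-based spatiotemporal modeling of the relationships in a recurrent framework” at paragraph 0017, line 6).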
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the intention modeling as taught by Adeli-Mosabbeb et al. in the system of the Gutierrez et al. and Herman combination because such modeling “facilitates realistic visual reasoning to infer the intents, even in complex scenes” (Adeli-Mosabbeb et al. at paragraph 0023, last sentence).
Regarding claim 2, Gutierrez et al. discloses a method further comprising: 
estimating, by the computer system, one or more behavior states based at least on the estimated pose of the VRUs, each behavior state corresponding to one of the detected VRUs (“In some embodiments, the road user intent prediction method may use path planning as a generative model for hypothesis testing to identify likelihoods over possible trajectories” at paragraph 0024, line 4); and 
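estimating, by the computer system, future trajectories of the one or more VRUs using the estimated one or more behavior states (“The predicted trajectories 330 are passed to the intent prediction module 312. Road user intent prediction module 312 may use the path planning as a generative model for hypothesis testing to identify likelihoods over possible trajectories. Thus, the intent prediction module 312 may utilize an observed trajectory of a road user to assign likelihoods to a plurality of goal positions and trajectories to those goal positions. In various embodiments, module 312 may utilize Bayesian inference to evaluate the likelihoods, where Bayesian inference is a method of statistical inference in which the probability of a prediction is updated as additional evidence becomes available. Thus, at each timestep of the module 312, the module 312 may update the predictions for the trajectories received” at paragraph 0064).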
Regarding claim 4, Gutierrez et al. disclose a method wherein estimating the one or more intention probabilities comprises: 
estimating the one or more behavior states based at least on the estimated pose of the one or more VRUs using a multi-task model (“In some embodiments, the road user intent prediction method may use path planning as a generative model for hypothesis testing to identify likelihoods over possible trajectories” at paragraph 0024, line 4; “multitasking and parallel processing may be advantageous” at paragraph 0163, line 10); and 
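estimating the one or more intention probabilities based on the estimated one or more behavior states (“The predicted trajectories 330 are passed to the intent prediction module 312. Road user intent prediction module 312 may use the path planning as a generative model for hypothesis testing to identify likelihoods over possible trajectories. Thus, the intent prediction module 312 may utilize an observed trajectory of a road user to assign likelihoods to a plurality of goal positions and trajectories to those goal positions. In various embodiments, module 312 may utilize Bayesian inference to evaluate the likelihoods, where Bayesian inference is a method of statistical inference in which the probability of a prediction is updated as additional evidence becomes available. Thus, at each timestep of the module 312, the module 312 may update the predictions for the trajectories received” at paragraph 0064).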
Regarding claim 5, Gutierrez et al. disclose a method wherein estimating the one or more behavior states comprises: 
applying a neural network architecture to a continuous sequence of poses of each of the one or more VRUs to select a behavior state for the VRU among a plurality of predefined behavior states (“In various embodiments, for scenarios that remain relatively constant, a neural network may be trained to generate the trajectories, which would further accelerate the trajectory planning. A neural network approach may provide for high accuracy, efficiency, and speed of generating trajectories within a generative model” at paragraph 0061, line 1).
Regarding claim 6, Gutierrez et al. disclose a method wherein generating the segmentation map comprises: 
applying a neural network architecture to one or more of the video frames to classify each pixel in the video frames as one of a plurality of predefined classes, each class corresponding to a segment in the segmentation map (“a neural network to detect, track and classify road users” at paragraph 0026, line 2).
Regarding claim 7, Gutierrez et al. disclose a method further comprising: 
selecting, by the computer system, at least one high-risk VRU from among the one or more VRUs based on the predicted behavior state and proximity of each VRU to the vehicle (“In some embodiments, a computing system of the vehicle 204 or a 
Regarding claim 8, the Gutierrez et al., Herman and Adeli-Mosabbeb et al. combination discloses a method further comprising: 
estimating, by the computer system, the one or more intention probabilities by analyzing social interactions between the one or more VRUs and one or more classes corresponding to objects in the segmentation map (“To model pedestrian actions and interactions with others through time, we connect pedestrian and context nodes between consecutive frames to further reasoning about the temporal relations” Adeli-Mosabbeb et al. at paragraph 0018, line 19; “For example, an object to which the pedestrian is giving high attention should be associated with a heavier edge weight. As discussed above, a separate graph is generated for each pedestrian identified during the parsing stage” Adeli-Mosabbeb et al. at paragraph 0033, last sentence; “pedestrian interactions with other objects in the scene, such as other pedestrians, vehicles, traffic signs, lights, and environmental factors like zebra-crossings. We argue that such factors can be uncovered through reasoning over the relationship across different objects in the scene. Therefore, we explore graph-based spatiotemporal modeling of the relationships in a recurrent framework” Adeli-Mosabbeb et al. at paragraph 0017, line 6).
Regarding claim 9, Gutierrez et al. discloses a computer system comprising: 
at least one processor (“As an example, a module, engine, block, unit, model, system, or logic may include one or more hardware components, such as a micro-controller or processor” at paragraph 0091, line 9); 

obtain a sequence of video frames corresponding to a scene external to a vehicle captured by one or more sensors (“As discussed above, the autonomous driving stack of a vehicle may utilize a variety of sensor data (e.g., 1358) generated by various sensors provided on and external to the vehicle. As an example, a vehicle 1205 may possess an array of sensors 1325 to collect various information relating to the exterior of the vehicle and the surrounding environment, vehicle system status, conditions within the vehicle, and other information usable by the modules of the vehicle's processing system 1310. For instance, such sensors 1325 may include global positioning (GPS) sensors 1368, light detection and ranging (LIDAR) sensors 1370, two-dimensional (2D) cameras 1372, three-dimensional (3D) or stereo cameras 1374” at paragraph 0103, line 1);
detect one or more vulnerable road users in the sequence of video frames, wherein the detecting comprises estimating pose of each of the detected one or more VRUs (“Any suitable processes for road user detection and tracking may be implemented by circuitry of module 308, such as an open source or propriety framework that provides road user detection and tracking of a scene based on the sensor data 316. For example, Spencer people tracking (e.g., as provided at www.spencer.eu), OpenPose (e.g., as described in the paper "OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields" by Cao et al.), or other suitable tracking processes may be utilized by road user detection and tracking module 308. In various 
generate a segmentation map of the scene using one or more of the video frames (“Various embodiments may include one or more of the following features: a neural network to detect, track and classify road users, a segmentation module to detect the traversable space for road users as well as their designated areas” at paragraph 0026, line 1); 
estimate one or more intention probabilities using estimated pose of the one or more VRUs and the segmentation map, each intention probability corresponding to one of the detected one or more VRUs (“The predicted trajectories 330 are passed to the intent prediction module 312. Road user intent prediction module 312 may use the path planning as a generative model for hypothesis testing to identify likelihoods over possible trajectories. Thus, the intent prediction module 312 may utilize an observed trajectory of a road user to assign likelihoods to a plurality of goal positions and trajectories to those goal positions. In various embodiments, module 312 may utilize Bayesian inference to evaluate the likelihoods, where Bayesian inference is a method of statistical inference in which the probability of a prediction is updated as additional evidence becomes available. Thus, at each timestep of the module 312, the module 312 may update the predictions for the trajectories received” at paragraph 0064); and 
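adjust one or more automated driving actions based on the estimated one or more intention probabilities (“If the one or more selected trajectories for a road user are in conflict with a trajectory of a vehicle, these trajectories may be communicated to the I2V communication module 314 to warn the vehicle based on the predicted trajectory information, such that the trajectory of the vehicle may be altered to reduce the risk of collision. For example, the velocity and/or direction of the vehicle may be changed responsive to the warning” at paragraph 0082).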
Gutierrez et al. does not explicitly disclose a full resolution segmentation map.
Herman teaches a computer system in the same field of endeavor of vehicle image processing, comprising:
at least one processor configured to:
generate a full resolution segmentation map of the scene using one or more of the video frames (“An example of a convolutional neural network that can segment color video data of traffic scenes into is a full-resolution residual network (FRRN). FRRNs maintain a full resolution version of the input video data in parallel to a pooling/unpooling data stream in a convolutional neural network that includes feedback loops, that communicates with the full-resolution data stream to determine a limited number of segments in full-resolution video data. Configurations of convolutional neural networks can determine segments in input video data within natural region boundaries in color video data. Natural region boundaries include color boundaries, shape boundaries, and occlusion boundaries, for example. Boundary data can be combined with segment data at each layer of the full-resolution residual network to form high-resolution, segmented output video image, for example” at paragraph 0045, line 1).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize a network as taught by Herman to generate the segmentation map for the system of Gutierrez et al. to “improve the performance of the convolutional neural networks over using non-color-corrected color video images for image understanding tasks related to vehicle 110 navigation” (Herman at paragraph 0045, last sentence).

The Gutierrez et al. and Herman combination does not explicitly disclose estimating one or more intention probabilities based on physical interactions and causality between current behaviors and different elements of the scene.
Adeli-Mosabbeb et al. teaches a computer system in the same field of endeavor of pedestrian intent detection, comprising:
at least one processor configured to:
estimate one or more intention probabilities using estimated pose of the one or more VRUs and the segmentation map, each intention probability corresponding to one of the detected one or more VRUs based on physical interactions and causality between current behaviors and different elements of the scene (“To model pedestrian actions and interactions with others through time, we connect pedestrian and context nodes between consecutive frames to further reasoning about the temporal relations” at paragraph 0018, line 19; “For example, an object to which the pedestrian is giving high attention should be associated with a heavier edge weight. As discussed above, a separate graph is generated for each pedestrian identified during the parsing stage” at paragraph 0033, last sentence; “pedestrian interactions with other objects in the scene, such as other pedestrians, vehicles, traffic signs, lights, and environmental factors like zebra-crossings. We argue that such factors can be uncovered through reasoning over the relationship across different objects in the scene. Therefore, we explore graph-based spatiotemporal modeling of the relationships in a recurrent framework” at paragraph 0017, line 6).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize the intention modeling as taught by Adeli-Mosabbeb et al. in the system of the Gutierrez et al. and Herman combination because such modeling “facilitates realistic visual reasoning to infer the intents, even in complex scenes” (Adeli-Mosabbeb et al. at paragraph 0023, last sentence).
Regarding claim 10, Gutierrez et al. discloses a system wherein the at least one processor is further configured to: 
estimate one or more behavior states based at least on the estimated pose of the VRUs, each behavior state corresponding to one of the detected VRUs (“In some embodiments, the road user intent prediction method may use path planning as a generative model for hypothesis testing to identify likelihoods over possible trajectories” at paragraph 0024, line 4); and 
estimate future trajectories of the one or more VRUs using the estimated one or more behavior states (“The predicted trajectories 330 are passed to the intent prediction module 312. Road user intent prediction module 312 may use the path planning as a generative model for hypothesis testing to identify likelihoods over possible trajectories. Thus, the intent prediction module 312 may utilize an observed trajectory of a road user to assign likelihoods to a plurality of goal positions and trajectories to those goal positions. In various embodiments, module 312 may utilize Bayesian inference to evaluate the likelihoods, where Bayesian inference is a method of statistical inference in which the probability of a prediction is updated as additional evidence becomes available. Thus, at each timestep of the module 312, the module 312 may update the predictions for the trajectories received” at paragraph 0064).
Regarding claim 12, Gutierrez et al. disclose a system wherein the at least one processor is further configured to: 
estimate the one or more intention probabilities by:
estimating the one or more behavior states based at least on the estimated pose of the one or more VRUs using a multi-task model (“In some embodiments, the road user intent prediction method may use path planning as a generative model for hypothesis testing to identify likelihoods over possible trajectories” at paragraph 0024, line 4; “multitasking and parallel processing may be advantageous” at paragraph 0163, line 10); and 
estimating the one or more intention probabilities based on the estimated one or more behavior states (“The predicted trajectories 330 are passed to the intent prediction module 312. Road user intent prediction module 312 may use the path planning as a generative model for hypothesis testing to identify likelihoods over possible trajectories. Thus, the intent prediction module 312 may utilize an observed trajectory of a road user to assign likelihoods to a plurality of goal positions and trajectories to those goal positions. In various embodiments, module 312 may utilize Bayesian inference to evaluate the likelihoods, where Bayesian inference is a method of statistical inference in which the probability of a prediction is updated as additional evidence becomes available. Thus, at each timestep of the module 312, the module 312 may update the predictions for the trajectories received” at paragraph 0064).
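For illustration, the Bayesian updating described in the passage quoted from Gutierrez et al. at paragraph 0064 can be sketched as follows. This is a minimal sketch only and is not taken from any cited reference: the function name update_goal_posterior, the goal labels, the goal direction vectors, and the Gaussian likelihood with spread sigma are all assumptions chosen to show how a probability over candidate goals is revised as each new observation arrives.

```python
import math

def update_goal_posterior(prior, observed_step, goal_directions, sigma=0.5):
    """Apply one Bayesian update to the goal probabilities.

    prior: dict mapping goal name -> probability (hypothetical goals).
    observed_step: (dx, dy) displacement of the road user in one timestep.
    goal_directions: dict mapping goal name -> expected (dx, dy) per timestep.
    sigma: assumed spread of the Gaussian likelihood (illustrative value).
    """
    unnormalized = {}
    for goal, p in prior.items():
        ex, ey = goal_directions[goal]
        dx = observed_step[0] - ex
        dy = observed_step[1] - ey
        # Likelihood of the observation if the road user is heading to this goal.
        likelihood = math.exp(-(dx * dx + dy * dy) / (2.0 * sigma ** 2))
        unnormalized[goal] = p * likelihood
    total = sum(unnormalized.values())
    # Normalize so the posterior probabilities sum to 1.
    return {goal: v / total for goal, v in unnormalized.items()}

# Hypothetical example: two candidate goals with equal prior probability.
prior = {"cross_road": 0.5, "stay_on_sidewalk": 0.5}
goal_directions = {"cross_road": (0.0, 1.0), "stay_on_sidewalk": (1.0, 0.0)}

posterior = prior
for step in [(0.1, 0.9), (0.0, 1.0), (0.2, 0.8)]:
    # The posterior after one timestep becomes the prior for the next.
    posterior = update_goal_posterior(posterior, step, goal_directions)
```

In this sketch each observed displacement points mostly toward the hypothetical "cross_road" goal, so the probability assigned to that goal rises with every timestep.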
Regarding claim 13, Gutierrez et al. disclose a system wherein the at least one processor is further configured to: 
estimate the one or more behavior states by applying a neural network architecture to a continuous sequence of poses of each of the one or more VRUs to select a behavior state for the VRU among a plurality of predefined behavior states (“In various embodiments, for scenarios that remain relatively constant, a neural network may be trained to generate the trajectories, which would further accelerate the trajectory planning. A neural network approach may provide for high accuracy, efficiency, and speed of generating trajectories within a generative model” at paragraph 0061, line 1).
Regarding claim 14, Gutierrez et al. disclose a system wherein the at least one processor is further configured to: 
generate the segmentation map by applying a neural network architecture to one or more of the video frames to classify each pixel in the video frames as one of a plurality of predefined classes, each class corresponding to a segment in the segmentation map (“a neural network to detect, track and classify road users” at paragraph 0026, line 2).
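For illustration, the claimed per-pixel classification can be sketched as selecting, for every pixel, the class with the highest score, so that the output map has the same resolution as the input. This is a minimal sketch only: the class list, the function name segmentation_map, and the score values are assumptions and do not come from the claims or the cited references, and the per-class scores stand in for the output of whatever neural network is used.

```python
# Hypothetical predefined classes; each class labels one kind of segment.
CLASSES = ["road", "sidewalk", "pedestrian"]

def segmentation_map(scores):
    """Return one class label per pixel.

    scores[row][col] is a list of per-class scores for that pixel,
    assumed here to be produced by some neural network.
    """
    labeled = []
    for row in scores:
        labeled_row = []
        for pixel_scores in row:
            # Index of the highest-scoring class for this pixel.
            best = max(range(len(pixel_scores)), key=pixel_scores.__getitem__)
            labeled_row.append(CLASSES[best])
        labeled.append(labeled_row)
    return labeled

# A 2x2 frame with made-up scores.
scores = [
    [[0.8, 0.1, 0.1], [0.2, 0.7, 0.1]],
    [[0.1, 0.2, 0.7], [0.6, 0.3, 0.1]],
]
seg = segmentation_map(scores)
```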
Regarding claim 15, Gutierrez et al. disclose a system wherein the at least one processor is further configured to: 
select at least one high-risk VRU from among the one or more VRUs based on the predicted behavior state and proximity of each VRU to the vehicle (“In some embodiments, a computing system of the vehicle 204 or a computing system connected to the vehicle 204 may superimpose the distribution of the predicted trajectory 202 and 
Regarding claim 16, the Gutierrez et al., Herman and Adeli-Mosabbeb et al. combination discloses a computer system wherein the at least one processor is further configured to: 
estimate the one or more intention probabilities by analyzing social interactions between the one or more VRUs and one or more classes corresponding to objects in the segmentation map (“To model pedestrian actions and interactions with others through time, we connect pedestrian and context nodes between consecutive frames to further reasoning about the temporal relations” Adeli-Mosabbeb et al. at paragraph 0018, line 19; “For example, an object to which the pedestrian is giving high attention should be associated with a heavier edge weight. As discussed above, a separate graph is generated for each pedestrian identified during the parsing stage” Adeli-Mosabbeb et al. at paragraph 0033, last sentence; “pedestrian interactions with other objects in the scene, such as other pedestrians, vehicles, traffic signs, lights, and environmental factors like zebra-crossings. We argue that such factors can be uncovered through reasoning over the relationship across different objects in the scene. Therefore, we explore graph-based spatiotemporal modeling of the relationships in a recurrent framework” Adeli-Mosabbeb et al. at paragraph 0017, line 6).
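For illustration, the idea quoted from Adeli-Mosabbeb et al. at paragraph 0033, that an object receiving higher attention from the pedestrian is associated with a heavier edge weight, can be sketched as follows. This is a minimal sketch only: the softmax normalization, the function name edge_weights, the object names, and the attention scores are assumptions and are not stated in the reference.

```python
import math

def edge_weights(attention_scores):
    """Turn raw attention scores into edge weights that sum to 1.

    attention_scores: dict mapping a scene object (graph node) to a raw
    attention score for one pedestrian. Softmax is an assumed choice.
    """
    highest = max(attention_scores.values())
    # Subtract the maximum before exponentiating for numerical stability.
    exps = {obj: math.exp(s - highest) for obj, s in attention_scores.items()}
    total = sum(exps.values())
    return {obj: e / total for obj, e in exps.items()}

# One graph per pedestrian: edges from the pedestrian node to context objects.
pedestrian_graph = edge_weights(
    {"zebra_crossing": 2.0, "traffic_light": 1.0, "parked_car": -1.0}
)
```

In this sketch the hypothetical zebra crossing has the highest attention score and therefore receives the heaviest edge in the pedestrian's graph.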
Regarding claim 17, Gutierrez et al. discloses a computer-readable product including programmed instructions embodied therein and stored on a non-transitory computer readable medium (“As an example, a module, engine, block, unit, model, system, or logic may include one or more hardware components, such as a micro-
obtain a sequence of video frames corresponding to a scene external to the vehicle, wherein the sequence of video frames is captured using at least one of the one or more sensors (“As discussed above, the autonomous driving stack of a vehicle may utilize a variety of sensor data (e.g., 1358) generated by various sensors provided on and external to the vehicle. As an example, a vehicle 1205 may possess an array of sensors 1325 to collect various information relating to the exterior of the vehicle and the surrounding environment, vehicle system status, conditions within the vehicle, and other information usable by the modules of the vehicle's processing system 1310. For instance, such sensors 1325 may include global positioning (GPS) sensors 1368, light detection and ranging (LIDAR) sensors 1370, two-dimensional (2D) cameras 1372, three-dimensional (3D) or stereo cameras 1374” at paragraph 0103, line 1); 
detect one or more vulnerable road users in the sequence of video frames, wherein the detecting comprises estimating pose of each of the detected one or more VRUs (“Any suitable processes for road user detection and tracking may be implemented by circuitry of module 308, such as an open source or propriety framework that provides road user detection and tracking of a scene based on the sensor data 316. For example, Spencer people tracking (e.g., as provided at www.spencer.eu), OpenPose (e.g., as described in the paper "OpenPose: Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields" by Cao et al.), or other suitable tracking processes may be utilized by road user detection and tracking module 308. In various 
generate a segmentation map of the scene using one or more of the video frames (“Various embodiments may include one or more of the following features: a neural network to detect, track and classify road users, a segmentation module to detect the traversable space for road users as well as their designated areas” at paragraph 0026, line 1); 
estimate one or more intention probabilities using estimated pose of the one or more VRUs and the segmentation map, each intention probability corresponding to one of the detected one or more VRUs (“The predicted trajectories 330 are passed to the intent prediction module 312. Road user intent prediction module 312 may use the path planning as a generative model for hypothesis testing to identify likelihoods over possible trajectories. Thus, the intent prediction module 312 may utilize an observed trajectory of a road user to assign likelihoods to a plurality of goal positions and trajectories to those goal positions. In various embodiments, module 312 may utilize Bayesian inference to evaluate the likelihoods, where Bayesian inference is a method of statistical inference in which the probability of a prediction is updated as additional evidence becomes available. Thus, at each timestep of the module 312, the module 312 may update the predictions for the trajectories received” at paragraph 0064); and 
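adjust one or more automated driving actions based on the estimated one or more intention probabilities (“If the one or more selected trajectories for a road user are in conflict with a trajectory of a vehicle, these trajectories may be communicated to the I2V communication module 314 to warn the vehicle based on the predicted trajectory information, such that the trajectory of the vehicle may be altered to reduce the risk of collision. For example, the velocity and/or direction of the vehicle may be changed responsive to the warning” at paragraph 0082).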
Gutierrez et al. does not explicitly disclose a full resolution segmentation map.
Herman teaches a method in the same field of endeavor of vehicle image processing, comprising:
generating, by the computer system, a full resolution segmentation map of the scene using one or more of the video frames (“An example of a convolutional neural network that can segment color video data of traffic scenes into is a full-resolution residual network (FRRN). FRRNs maintain a full resolution version of the input video data in parallel to a pooling/unpooling data stream in a convolutional neural network that includes feedback loops, that communicates with the full-resolution data stream to determine a limited number of segments in full-resolution video data. Configurations of convolutional neural networks can determine segments in input video data within natural region boundaries in color video data. Natural region boundaries include color boundaries, shape boundaries, and occlusion boundaries, for example. Boundary data can be combined with segment data at each layer of the full-resolution residual network to form high-resolution, segmented output video image, for example” at paragraph 0045, line 1).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize a network as taught by Herman to generate the segmentation map for the system of Gutierrez et al. to “improve the performance of the convolutional neural networks over using non-color-corrected color video images for image understanding tasks related to vehicle 110 navigation” (Herman at paragraph 0045, last sentence).

The Gutierrez et al. and Herman combination does not explicitly disclose estimating one or more intention probabilities based on physical interactions and causality between current behaviors and different elements of the scene.
Adeli-Mosabbeb et al. teaches a method in the same field of endeavor of pedestrian intent detection, comprising:
estimating, by the computer system, one or more intention probabilities using estimated pose of the one or more VRUs and the segmentation map, each intention probability corresponding to one of the detected one or more VRUs based on physical interactions and causality between current behaviors and different elements of the scene (“To model pedestrian actions and interactions with others through time, we connect pedestrian and context nodes between consecutive frames to further reasoning about the temporal relations” at paragraph 0018, line 19; “For example, an object to which the pedestrian is giving high attention should be associated with a heavier edge weight. As discussed above, a separate graph is generated for each pedestrian identified during the parsing stage” at paragraph 0033, last sentence; “pedestrian interactions with other objects in the scene, such as other pedestrians, vehicles, traffic signs, lights, and environmental factors like zebra-crossings. We argue that such factors can be uncovered through reasoning over the relationship across different objects in the scene. Therefore, we explore graph-based spatiotemporal modeling of the relationships in a recurrent framework” at paragraph 0017, line 6).

Regarding claim 18, Gutierrez et al. discloses a computer program product wherein the instructions further cause the one or more processors to: 
estimate one or more behavior states based at least on the estimated pose of the VRUs, each behavior state corresponding to one of the detected VRUs (“In some embodiments, the road user intent prediction method may use path planning as a generative model for hypothesis testing to identify likelihoods over possible trajectories” at paragraph 0024, line 4); and 
estimate future trajectories of the one or more VRUs using the estimated one or more behavior states (“The predicted trajectories 330 are passed to the intent prediction module 312. Road user intent prediction module 312 may use the path planning as a generative model for hypothesis testing to identify likelihoods over possible trajectories. Thus, the intent prediction module 312 may utilize an observed trajectory of a road user to assign likelihoods to a plurality of goal positions and trajectories to those goal positions. In various embodiments, module 312 may utilize Bayesian inference to evaluate the likelihoods, where Bayesian inference is a method of statistical inference in which the probability of a prediction is updated as additional evidence becomes available. Thus, at each timestep of the module 312, the module 312 may update the predictions for the trajectories received” at paragraph 0064).
Regarding claim 19, Gutierrez et al. discloses a computer program product wherein the instructions further cause the one or more processors to: 
estimate the one or more behavior states based at least on the estimated pose of the one or more VRUs using a multi-task model (“In some embodiments, the road user intent prediction method may use path planning as a generative model for hypothesis testing to identify likelihoods over possible trajectories” at paragraph 0024, line 4; “multitasking and parallel processing may be advantageous” at paragraph 0163, line 10); and 
estimate the one or more intention probabilities based on the estimated one or more behavior states (“The predicted trajectories 330 are passed to the intent prediction module 312. Road user intent prediction module 312 may use the path planning as a generative model for hypothesis testing to identify likelihoods over possible trajectories. Thus, the intent prediction module 312 may utilize an observed trajectory of a road user to assign likelihoods to a plurality of goal positions and trajectories to those goal positions. In various embodiments, module 312 may utilize Bayesian inference to evaluate the likelihoods, where Bayesian inference is a method of statistical inference in which the probability of a prediction is updated as additional evidence becomes available. Thus, at each timestep of the module 312, the module 312 may update the predictions for the trajectories received” at paragraph 0064).
Regarding claim 20, Gutierrez et al. discloses a computer program product wherein the instructions further cause the one or more processors to: 
apply a neural network architecture to one or more of the video frames to classify each pixel in the video frames as one of a plurality of predefined classes, each class .

Claims 3 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Gutierrez et al., Herman and Adeli-Mosabbeb et al. as applied to claims 1 and 9 above, and further in view of Zhang et al. (US 2019/0171871).
Regarding claim 3, the Gutierrez et al., Herman and Adeli-Mosabbeb et al. combination discloses a method wherein the steps of detecting, generating and estimating are performed using a neural network model by sequentially correlating the estimated pose of the one or more VRUs and their corresponding behavior state with a segmented scene in the vicinity of each of the VRUs (“a neural network to detect, track and classify road users” Gutierrez et al. at paragraph 0026, line 2; “In various embodiments, for scenarios that remain relatively constant, a neural network may be trained to generate the trajectories, which would further accelerate the trajectory planning. A neural network approach may provide for high accuracy, efficiency, and speed of generating trajectories within a generative model” Gutierrez et al. at paragraph 0061, line 1).
The Gutierrez et al., Herman and Adeli-Mosabbeb et al. combination does not explicitly disclose that the neural network is a holistic deep learning neural network.
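Zhang et al. teaches a method in the same field of endeavor of pose estimation, wherein the steps of detecting, generating and estimating are performed using a holistic deep learning neural network model by sequentially correlating the estimated pose of the one or more persons and their corresponding behavior state with a segmented scene in the vicinity of each of the people (“Although in the example above the keypoints are selected based on probability scores for the particular segment type connecting the joints of interest (e.g., the right femur connecting the right hip and right knee), the keypoint selection process, in particular embodiments, may also take a more holistic view and consider the probability scores of other connecting segments and joints. For example, to determine the location of the person's right knee, the selection process may consider the probability scores associated with the person's right tibia/fibula in addition to the right femur (both the right femur and the right tibia/fibula share the right knee as a common joint), as well as any other connected portions of the pose model (e.g., the left femur, spine, right shoulder, etc.)” at paragraph 0123, line 1).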
It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize a holistic neural network as taught by Zhang et al. in the pose estimation of the Gutierrez et al., Herman and Adeli-Mosabbeb et al. combination as “Considering the probability scores of other connecting segments is helpful because the end goal is to predict a pose structure in certain embodiments, rather than a joint location in isolation” (paragraph 0123, line 14).
Regarding claim 11, the Gutierrez et al., Herman and Adeli-Mosabbeb et al. combination discloses a system wherein the steps of detecting, generating and estimating are performed using a neural network model by sequentially correlating the estimated pose of the one or more VRUs and their corresponding behavior state with a segmented scene in the vicinity of each of the VRUs (“a neural network to detect, track and classify road users” Gutierrez et al. at paragraph 0026, line 2; “In various embodiments, for scenarios that remain relatively constant, a neural network may be trained to generate the trajectories, which would further accelerate the trajectory planning. A neural network approach may provide for high accuracy, efficiency, and speed of generating trajectories within a generative model” Gutierrez et al. at paragraph 0061, line 1).
The Gutierrez et al., Herman and Adeli-Mosabbeb et al. combination does not explicitly disclose that the neural network is a holistic deep learning neural network.
Zhang et al. teaches a system in the same field of endeavor of pose estimation, wherein the steps of detecting, generating and estimating are performed using a holistic deep learning neural network model by sequentially correlating the estimated pose of the one or more persons and their corresponding behavior state with a segmented scene in the vicinity of each of the people (“Although in the example above the keypoints are selected based on probability scores for the particular segment type connecting the joints of interest (e.g., the right femur connecting the right hip and right knee), the keypoint selection process, in particular embodiments, may also take a more holistic view and consider the probability scores of other connecting segments and joints. For example, to determine the location of the person's right knee, the selection process may consider the probability scores associated with the person's right tibia/fibula in addition to the right femur (both the right femur and the right tibia/fibula share the right knee as a common joint), as well as any other connected portions of the pose model (e.g., the left femur, spine, right shoulder, etc.)” at paragraph 0123, line 1). 
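It would have been obvious to one of ordinary skill in the art before the effective filing date of the invention to utilize a holistic neural network as taught by Zhang et al. in the pose estimation of the Gutierrez et al., Herman and Adeli-Mosabbeb et al. combination as “Considering the probability scores of other connecting segments is helpful because the end goal is to predict a pose structure in certain embodiments, rather than a joint location in isolation” (paragraph 0123, line 14).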


Response to Arguments

	Summary of Remarks (@ response page labeled 11): “However, Gutierrez remains silent with respect to estimating, using a full-resolution segmentation map, each intention probability that corresponds to a VRU based on physical interactions and causality between current behaviors and different elements of the scene according to the combination of limitations (i) and (ii).”

	Examiner’s Response: This argument is moot in view of the newly cited Herman and Adeli-Mosabbeb et al. references.

	Summary of Remarks (@ response page labeled 14): “Rosman fails to cure the inadequacies of Gutierrez. In particular, Rosman relies entirely on correlating a location of an object to a latent map, and only captures “raw sensor data, such as object trajectories, poses, or position/location data (e.g., spatial coordinates).” See Rosman, paragraph [0028]. Thus, because Rosman only captures the spatial information of the object, and not information on the behaviors of the object, Rosman cannot anticipate estimating, using a full-resolution segmentation map, each intention probability 

	Examiner’s Response: This argument is moot in view of the newly cited Herman and Adeli-Mosabbeb et al. references.


Conclusion

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to KATRINA R FUJITA whose telephone number is (571)270-1574. The examiner can normally be reached Monday - Friday 9:30-5:30 pm ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sumati Lefkowitz, can be reached at (571)272-3638. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/KATRINA R FUJITA/Primary Examiner, Art Unit 2662