DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
2.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
3.	Claims 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Aseem Behl et al. “Bounding boxes, Segmentation and Object Co-ordinates: How important is recognition for 3D .
	Regarding claim 19, Behl discloses a computing system comprising: one or more processors; one or more tangible non-transitory computer readable media that store a machine-learned scene flow model, the machine-learned scene flow model comprising:
one or more estimates based at least in part on a plurality of representations of an environment (I. Introduction, We compute these cues using CNNs trained on a newly annotated dataset of stereo images and integrate them into a CRF-based model for robust 3D scene flow estimation - an approach we term Instance Scene Flow. We analyze the importance of each recognition cue in an ablation study and observe that the instance segmentation cue is by far the strongest in our setting; here, the stereo images are the plurality of representations of an environment), wherein the one or more estimates comprise one or more object instance segment estimates, one or more optical flow motion estimates, and one or more stereo motion estimates (Fig. 2, the instance segmentation and bounding boxes; 3. Method, right column, we train a CNN to predict object coordinates for car instances. Finally, we integrate the bounding box, instance and object coordinates cues into a slanted plane formulation and analyze the importance of each cue for the scene flow estimation task; 3.2, We use a modified version of the encoder-decoder style CNN proposed in [32] for estimating the object coordinates at each pixel; Fig. 2, Work flow for our approach. Note that each intermediate step uses as input all of the previous results. Given the four RGB input images (t/t+1, left/right) we compute 3D points (XYZ) for each pixel. For each of the four RGB, XYZ image-blocks we obtain instance segmentations, alongside bounding boxes. The M instances are processed individually to obtain object coordinates for each instance, using our object coordinates CNN.
Finally, all this information is integrated into our Instance Scene Flow method (ISF) to produce the final output); and one or more machine-learned inference models configured to generate one or more three-one or more machine-learned inference models are configured to minimize energy associated with at least one of a photometric error, a rigid motion error, and a flow consistency (3.3. Scene flow model, right column, Energy model: Given the left and right input images of two consecutive frames (Fig. 3), our goal is to infer the 3D geometry of each super pixel in the reference view, the association to objects and the rigid body motion of each object. We formulate the scene flow estimation task as an energy minimization problem comprising data, smoothness and instance terms as shown in equation 1; Fig. 5, scene flow error; 4.1. Effect of recognition Granularity, we study the impact of different levels of recognition granularity for estimating the 3D scene flow of dynamic (i.e., foreground) objects. In addition to the recognition cues, we use sparse optical flow from sparse Discrete Flow correspondences [23] and dense disparity maps from SPS-stereo [45] for both rectified frames. We obtain the super pixel boundaries using StereoSLIC [44]. Table 1 provides disparity, flow (Fl) and scene flow (SF) error averaged over the validation set for optical scene flow). Behl does not explicitly disclose a plurality of machine-learned models to generate the estimates. However, in the same field of endeavor, Tran teaches learning good features for visual odometry, in which Fig. 1 shows a plurality of machine learning models, namely the optical flow CNN 122 and the semantic segmentation CNN 126, which generate the optical flow 123 and the semantic segmentation 127.
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Tran with those of Behl, as a whole, to use the plurality of machine-learned models to generate the multiple visual cues using optical flow and semantic segmentation. The motivation is to effectively estimate the optical scene flow.
	Regarding claim 20, Behl further discloses the computing system, wherein the photometric error is based at least in part on a similarity of the plurality of representations over time (3.3. Scene flow model, The data cost compares the appearance at pixel p in reference image with the appearance at pixel q in the target view v ∈ V. In our experiments, we use Census descriptors [46] which are robust to simple photometric variations), the rigid motion error is based at least in part on one or more differences between the one or more optical flow motion estimates and the one or more stereo motion estimates (3.3. Scene flow model, Energy model: Given the left and right input images of two consecutive frames (Fig. 3), our goal is to infer the 3D geometry of each super pixel in the reference view, the association to objects and the rigid body motion of each object. We formulate the scene flow estimation task as an energy minimization problem comprising data, smoothness and instance terms. To guide the optimization process and overcome local minima we additionally add a robust loss. This loss measures the difference with respect to sparse Discrete Flow correspondences [23] for the flow terms (v = 2, 3) and depth estimates from SPS-stereo [45] for the stereo term (v = 1)), and the flow consistency is based at least in part on one or more two-dimensional rigid flow estimates with respect to the one or more optical flow motion estimates (3.3. Scene flow model, The data cost compares the appearance at pixel p in reference image with the appearance at pixel q in the target view v ∈ V. In our experiments, we use Census descriptors [46] which are robust to simple photometric variations [22, 41, 45]. To guide the optimization process and overcome local minima we additionally add a robust loss. This loss measures the difference with respect to sparse Discrete Flow correspondences [23] for the flow terms (v = 2, 3) and depth estimates from SPS-stereo [45] for the stereo term (v = 1)).

Allowable Subject Matter
4.	Claims 1-18 are allowed. 
 	The closest prior arts, as a whole, do not disclose motion flow estimation, comprising: accessing, by a computing system comprising one or more computing devices, scene data comprising a plurality of representations of an environment over a first set of time intervals; generating, by the computing system, a plurality of extracted visual cues based at least in part on 
	The closest prior art references, taken as a whole, do not disclose accessing training data comprising a plurality of representations of an environment over a first set of time intervals, wherein the plurality of representations comprises a first plurality of representations of the environment from a first perspective and a second plurality of representations from a second perspective; generating a plurality of extracted visual cues based at least in part on the plurality of representations and a plurality of machine-learned feature extraction models; encoding the plurality of extracted visual cues using a plurality of energy functions; determining one or more three-dimensional motion estimates of one or more object instances over a second set of time intervals that are subsequent to the first set of time intervals based at least in part on the plurality of energy functions and one or more machine-learned inference models; determining a loss associated with one or more comparisons of the one or more three-dimensional motion estimates of the one or more object instances relative to one or more ground-truth locations of the one or more object instances; and adjusting one or more parameters of the plurality of machine-learned feature extraction models based at least in part on the loss. These features render claim 12 allowable over the prior art.
Conclusion
5.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Taylor et al. (US 9,679,227) discloses a system and method for detecting features in aerial imaging using disparity mapping and segmentation techniques.
Levkova et al. (US 2018/0157918) discloses a system and method for estimating vehicular motion based on monocular video data.
Luo et al. (US 2019/0145765) discloses three-dimensional object detection.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DHAVAL V PATEL whose telephone number is (571)270-1818. The examiner can normally be reached Monday to Friday (8:00am-4:30pm).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Shuwang Liu, can be reached at 571-272-3036. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information 





/DHAVAL V PATEL/
Primary Examiner, Art Unit 2631
1/18/2022