Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1, 3-10, and 12-19 are rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 3-10, and 12-19 of U.S. Patent No. 11,257,231 (see the table below). Although the claims at issue are not identical, they are not patentably distinct from each other because:
-- Claims 1, 10, and 17 of the instant application recite subject matter in common with patent claims 1, 10, and 17;
-- Claims 1, 10, and 17 of the instant application recite the open-ended transitional phrase “comprising” and therefore do not preclude the additional elements recited by patent claims 1, 10, and 17; and
-- The elements of claims 1, 10, and 17 of the instant application are fully anticipated by patent claims 1, 10, and 17.
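Claim comparison (for each claim number, the claim of the instant application is listed first, followed by the corresponding claim of U.S. Patent No. 11,257,231):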
1. A method for monocular depth/pose estimation in a camera agnostic network, the method comprising: 

projecting lifted 3D points onto an image plane according to a predicted ray vector based on a monocular depth model, a monocular pose model, and a camera center according to the camera agnostic network; and 

predicting a warped target image from a predicted depth map of the monocular depth model, a ray surface of the predicted ray vector, and a projection of the lifted 3D points according to the camera agnostic network.
1. A method for monocular depth/pose estimation in a camera agnostic network, the method comprising: 

training a monocular depth model and a monocular pose model to learn monocular depth estimation and monocular pose estimation based on a target image and one or more context images from monocular video captured by the camera agnostic network; 

lifting 3D points from image pixels of the target image according to the one or more context images; 

projecting the lifted 3D points onto an image plane according to a predicted ray vector based on the monocular depth model, the monocular pose model, and a camera center according to the camera agnostic network; and 

predicting a warped target image from a predicted depth map of the monocular depth model, a ray surface of the predicted ray vector, and a projection of the lifted 3D points according to the camera agnostic network.
3. The method of claim 2, in which training comprises self-supervised learning of an ego-motion model to predict rigid transformations between the target image and the one or more context images captured by the camera agnostic network, in which the one or more context images represent adjacent frames of monocular video.
3. The method of claim 1, in which training comprises self-supervised learning of an ego-motion model to predict rigid transformations between the target image and the one or more context images captured by the camera agnostic network, in which the one or more context images represent adjacent frames of monocular video.
4. The method of claim 2, in which training comprises incorporating intrinsic parameters and extrinsic parameters regarding a plurality of different cameras into a camera agnostic configuration of the monocular depth model and the monocular pose model.
4. The method of claim 1, in which training comprises incorporating intrinsic parameters and extrinsic parameters regarding the plurality of different cameras into a camera agnostic configuration of the monocular depth model and the monocular pose model.
5. The method of claim 4, in which the extrinsic parameters define a location and orientation of the plurality of different cameras with respect to a world frame, and the intrinsic parameters enable a mapping between camera coordinates and pixel coordinates in an image frame.
5. The method of claim 4, in which the extrinsic parameters define a location and orientation of the plurality of different cameras with respect to a world frame, and the intrinsic parameters enable a mapping between camera coordinates and pixel coordinates in an image frame.
6. The method of claim 1, further comprising estimating a pose transformation from a target image to one or more context images based on predicted rigid transformations between the target image and the one or more context images captured by the camera agnostic network, in which the one or more context images represent adjacent frames of monocular video.
6. The method of claim 1, further comprising estimating a pose transformation from the target image to the one or more context images based on predicted rigid transformations between the target image and the one or more context images captured by the camera agnostic network, in which the one or more context images represent adjacent frames of monocular video.
7. The method of claim 1, further comprising estimating a trajectory of an ego vehicle based on the warped target image.
7. The method of claim 1, further comprising estimating a trajectory of an ego vehicle based on the warped target image.
8. The method of claim 7, further comprising planning a trajectory of the ego vehicle according to an estimated trajectory of the ego vehicle.
8. The method of claim 7, further comprising planning a trajectory of the ego vehicle according to an estimated trajectory of the ego vehicle.
9. The method of claim 1, further comprising: scaling the predicted ray vector according to a predicted depth; and offsetting the predicted ray vector by the camera center.
9. The method of claim 1, further comprising: scaling the predicted ray vector according to a predicted depth; and offsetting the predicted ray vector by the camera center.
10. A non-transitory computer-readable medium having program code recorded thereon for monocular depth/pose estimation in a camera agnostic network, the program code being executed by a processor and comprising: 

program code to project lifted 3D points onto an image plane according to a predicted ray vector based on a monocular depth model, a monocular pose model, and a camera center according to the camera agnostic network; and 

program code to predict a warped target image from a predicted depth map of the monocular depth model, a ray surface of the predicted ray vector, and a projection of the lifted 3D points according to the camera agnostic network.

10. A non-transitory computer-readable medium having program code recorded thereon for monocular depth/pose estimation in a camera agnostic network, the program code being executed by a processor and comprising: 

program code to train a monocular depth model and a monocular pose model to learn monocular depth estimation and monocular pose estimation based on a target image and one or more context images from monocular video captured by the camera agnostic network; 

program code to lift 3D points from image pixels of the target image according to the one or more context images; 

program code to project the lifted 3D points onto an image plane according to a predicted ray vector based on the monocular depth model, the monocular pose model, and a camera center according to the camera agnostic network; and 

program code to predict a warped target image from a predicted depth map of the monocular depth model, a ray surface of the predicted ray vector, and a projection of the lifted 3D points according to the camera agnostic network.

12. The non-transitory computer-readable medium of claim 11, in which training comprises self-supervised learning of an ego-motion model to predict rigid transformations between the target image and the one or more context images captured by the camera agnostic network, in which the one or more context images represent adjacent frames of monocular video.
12. The non-transitory computer-readable medium of claim 10, in which training comprises self-supervised learning of an ego-motion model to predict rigid transformations between the target image and the one or more context images captured by the camera agnostic network, in which the one or more context images represent adjacent frames of monocular video.
13. The non-transitory computer-readable medium of claim 11, in which the program code to train comprises program code to incorporate intrinsic parameters and extrinsic parameters regarding a plurality of different cameras into a camera agnostic configuration of the monocular depth model and the monocular pose model, and in which the extrinsic parameters define a location and orientation of the plurality of different cameras with respect to a world frame, and the intrinsic parameters enable a mapping between camera coordinates and pixel coordinates in an image frame.
13. The non-transitory computer-readable medium of claim 10, in which the program code to train comprises program code to incorporate intrinsic parameters and extrinsic parameters regarding the plurality of different cameras into a camera agnostic configuration of the monocular depth model and the monocular pose model, and in which the extrinsic parameters define a location and orientation of the plurality of different cameras with respect to a world frame, and the intrinsic parameters enable a mapping between camera coordinates and pixel coordinates in an image frame.
14. The non-transitory computer-readable medium of claim 10, further comprising estimating a pose transformation from a target image to one or more context images based on predicted rigid transformations between the target image and the one or more context images captured by the camera agnostic network, in which the one or more context images represent adjacent frames of monocular video.
14. The non-transitory computer-readable medium of claim 10, further comprising estimating a pose transformation from the target image to the one or more context images based on predicted rigid transformations between the target image and the one or more context images captured by the camera agnostic network, in which the one or more context images represent adjacent frames of monocular video.
15. The non-transitory computer-readable medium of claim 10, further comprising: program code to estimate a trajectory of an ego vehicle based on the warped target image; and program code to plan a trajectory of the ego vehicle according to an estimated trajectory of the ego vehicle.
15. The non-transitory computer-readable medium of claim 10, further comprising: program code to estimate a trajectory of an ego vehicle based on the warped target image; and program code to plan a trajectory of the ego vehicle according to an estimated trajectory of the ego vehicle.
16. The non-transitory computer-readable medium of claim 10, further comprising: program code to scale the predicted ray vector according to a predicted depth; and program code to offset the predicted ray vector by the camera center.
16. The non-transitory computer-readable medium of claim 10, further comprising: program code to scale the predicted ray vector according to a predicted depth; and program code to offset the predicted ray vector by the camera center.

17. A system for monocular depth/pose estimation in a camera agnostic network, the system comprising: 

a pose network to project lifted 3D points onto an image plane according to a predicted ray vector based on a monocular depth model, a monocular pose model, and a camera center according to the camera agnostic network; and 

a view synthesis block to predict a warped target image from a predicted depth map of the monocular depth model, a ray surface of the predicted ray vector, and a projection of the lifted 3D points according to the camera agnostic network.

17. A system for monocular depth/pose estimation in a camera agnostic network, the system comprising: 

a depth network to train a monocular depth model and a monocular pose model to learn monocular depth estimation and monocular pose estimation based on a target image and one or more context images from monocular video captured by the camera agnostic network; 

a pose network to lift 3D points from image pixels of the target image according to the one or more context images and 

to project the lifted 3D points onto an image plane according to a predicted ray vector based on the monocular depth model, the monocular pose model, and a camera center according to the camera agnostic network; and 

a view synthesis block to predict a warped target image from a predicted depth map of the monocular depth model, a ray surface of the predicted ray vector, and a projection of the lifted 3D points according to the camera agnostic network.

18. The system of claim 17, further comprising an ego perception module to estimate a trajectory of an ego vehicle based on the warped target image.
18. The system of claim 17, further comprising an ego perception module to estimate a trajectory of an ego vehicle based on the warped target image.
19. The system of claim 18, further comprising a planner module to plan a trajectory of the ego vehicle according to an estimated trajectory of the ego vehicle.
19. The system of claim 18, further comprising a planner module to plan a trajectory of the ego vehicle according to an estimated trajectory of the ego vehicle.
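
For illustration only: claim 9 in the comparison above recites scaling the predicted ray vector according to a predicted depth and offsetting it by the camera center. The Python sketch below shows that lifting step under stated assumptions. The function names (lift_point, project_to_ray_surface), the toy ray surface, and the nearest-ray search by cosine similarity are hypothetical choices made here; the claims do not specify how the projection onto the image plane is computed.

```python
import math


def lift_point(ray, depth, center):
    """Lift a pixel to 3D: scale its ray vector by the depth, then offset by the camera center."""
    return tuple(c + depth * r for r, c in zip(ray, center))


def normalize(vector):
    """Return the unit-length version of a 3D vector."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero-length vector")
    return tuple(x / norm for x in vector)


def project_to_ray_surface(point, center, ray_surface):
    """Return the (row, col) of the pixel whose ray best matches the direction to `point`.

    ASSUMPTION: the match is a brute-force search for the highest cosine
    similarity. This is one illustrative option, not the claimed method.
    """
    direction = normalize(tuple(p - c for p, c in zip(point, center)))
    best_pixel, best_score = None, -2.0
    for row, rays in enumerate(ray_surface):
        for col, ray in enumerate(rays):
            score = sum(d * r for d, r in zip(direction, normalize(ray)))
            if score > best_score:
                best_pixel, best_score = (row, col), score
    return best_pixel


# Toy 1x3 ray surface (hypothetical values): one ray vector per pixel.
ray_surface = [[(-0.5, 0.0, 1.0), (0.0, 0.0, 1.0), (0.5, 0.0, 1.0)]]
center = (1.0, 0.0, 0.0)

# Lift the right-most pixel with a depth of 2, then project the point back.
point = lift_point(ray_surface[0][2], 2.0, center)
pixel = project_to_ray_surface(point, center, ray_surface)
```

Because the lifting step uses only a per-pixel ray and a camera center, the sketch needs no intrinsic matrix, which is the sense in which such a formulation does not depend on a particular camera model.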


Allowable Subject Matter
Claims 1, 3-10, and 12-19 would be allowable if amended, or upon the timely filing of a terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d), to overcome the nonstatutory double patenting rejection set forth in this Office action.


The following is a statement of reasons for the indication of allowable subject matter:  
-- Claims 1, 10, and 17 are allowable over the prior art of record.
-- Claims 2-9 are allowable in view of their dependency from claim 1.
-- Claims 11-16 are allowable in view of their dependency from claim 10.
-- Claims 18-20 are allowable in view of their dependency from claim 17.

With respect to claim 1, the prior art of record, alone or in reasonable combination, does not teach or suggest the following limitation(s) (in consideration of the claim as a whole):
“projecting the lifted 3D points onto an image plane according to a predicted ray vector based on the monocular depth model, the monocular pose model, and a camera center according to the camera agnostic network; and predicting a warped target image from a predicted depth map of the monocular depth model, a ray surface of the predicted ray vector, and a projection of the lifted 3D points according to the camera agnostic network”

The closest prior art of record, Shang et al. (“Unsupervised Learning of Depth and Pose Estimation based on Continuous Frame Window,” 2018 International Joint Conference on Neural Networks (IJCNN), IEEE, pages 1-8), discloses a method for monocular depth/pose estimation in a camera (see at least: Figures 2 and 3), comprising:
training a monocular depth model and a monocular pose model to learn monocular depth estimation and monocular pose estimation based on a target image and one or more context images from monocular video captured by the camera (see at least: Page 3, left-hand column, Fig. 2, where the depth CNN corresponds to the monocular depth model and the pose CNN corresponds to the monocular pose model; the depth network takes only the first image (Ii) [i.e., the target image] as input and outputs a per-pixel depth map (Dx), and the continuous frame window takes images (e.g., (Ii), (Ii+1), (Ii+2)) as input [i.e., one or more context images from monocular video] and, through the pose network, outputs the relative camera pose matrices of adjacent frames. Note that the system shown in Fig. 2 corresponds to the camera network);
projecting 3D points onto an image plane according to a predicted depth and camera pose based on the monocular depth model, the monocular pose model, and a camera model (see at least: section III-B, Page 4, left-hand column, “Formula 2”, and Fig. 4: the camera model is the pinhole model, so K denotes the camera intrinsic matrix; the transformation matrices of the two images to the world coordinates are denoted T(1→w) and T(2→w), and the homogeneous coordinates of a pixel in the first image are denoted x1 [i.e., 3D point(s)]. The projected coordinates x2 of x1 onto the second image are obtained by x2 ∼ K T(2→w) T^(−1)(2→w) K^(−1) D̂1(x1) x1. Using Eq. 2, the pixels of the target image can be projected onto other images. Further, Fig. 4 discloses projecting each point (e.g., x1) in the first image onto the other image based on the predicted depth and camera pose and then using bilinear interpolation to obtain the value of the warped image (Î2) at location (x1) [i.e., projecting the 3D points onto an image plane according to the predicted depth and camera pose based on the monocular depth model and the monocular pose model, and implicitly according to the camera model, the “pinhole model”]); and
predicting a warped target image from a predicted depth map of the monocular depth model and a projection of the 3D points according to the camera model (see at least: section III-B, Page 4, left-hand column, where the outputs of both models are then used to inverse warp images to reconstruct the target image, and the photometric reconstruction loss is used for training the CNNs; and section III-C, Page 4, right-hand column, Fig. 4, where each point (e.g., x1) in the first image is projected onto the other image based on the predicted depth and camera pose, and bilinear interpolation is then used to obtain the value of the warped image (Î2) at location (x1) [i.e., predicting a warped target image from a predicted depth map of the monocular depth model and a projection of the 3D points according to the camera]).
However, while disclosing projecting 3D points onto the other image based on the predicted depth and camera pose and then using bilinear interpolation to obtain the value of the warped image (Î2) at location (x1), Shang et al. fails to teach or suggest, either alone or in combination with the other cited references, “projecting the lifted 3D points onto an image plane according to a predicted ray vector based on the monocular depth model, the monocular pose model, and a camera center according to the camera agnostic network; and predicting a warped target image from a predicted depth map of the monocular depth model, a ray surface of the predicted ray vector, and a projection of the lifted 3D points according to the camera agnostic network.”
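
For illustration only: the pinhole-model warping attributed to Shang et al. above (back-project a pixel with its predicted depth, apply the camera pose, re-project with the intrinsics, then sample by bilinear interpolation) can be sketched in Python as follows. This is a minimal sketch, not the reference's implementation. It assumes a single relative pose from the first camera to the second given as a rotation matrix and a translation vector, intrinsics given as (fx, fy, cx, cy), clamp-to-border sampling, and points in front of the second camera; all function names are hypothetical.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Pixel (u, v) with a depth value -> 3D point in first-camera coordinates."""
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)


def transform(point, rotation, translation):
    """Apply a rigid transformation: rotation @ point + translation."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


def project(point, fx, fy, cx, cy):
    """3D point in camera coordinates -> pixel coordinates (assumes z > 0)."""
    x, y, z = point
    return (fx * x / z + cx, fy * y / z + cy)


def bilinear_sample(image, u, v):
    """Sample a grayscale image (list of rows) at a fractional column u and row v."""
    height, width = len(image), len(image[0])
    # ASSUMPTION: coordinates outside the image are clamped to the border.
    u = min(max(u, 0.0), width - 1.0)
    v = min(max(v, 0.0), height - 1.0)
    u0, v0 = int(u), int(v)
    u1, v1 = min(u0 + 1, width - 1), min(v0 + 1, height - 1)
    du, dv = u - u0, v - v0
    top = image[v0][u0] * (1 - du) + image[v0][u1] * du
    bottom = image[v1][u0] * (1 - du) + image[v1][u1] * du
    return top * (1 - dv) + bottom * dv


def warp_pixel(u, v, depth, image2, intrinsics, rotation, translation):
    """Value of the warped image at pixel (u, v), sampled from the second image."""
    fx, fy, cx, cy = intrinsics
    point1 = backproject(u, v, depth, fx, fy, cx, cy)
    point2 = transform(point1, rotation, translation)
    u2, v2 = project(point2, fx, fy, cx, cy)
    return bilinear_sample(image2, u2, v2)


# Toy data (hypothetical): 2x2 image, unit intrinsics, half-unit shift along x.
image2 = [[0.0, 10.0], [20.0, 30.0]]
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
value = warp_pixel(0, 0, 1.0, image2, (1.0, 1.0, 0.0, 0.0), identity, (0.5, 0.0, 0.0))
```

Every step here depends on the intrinsic parameters of a pinhole camera, in contrast to the ray-vector and camera-center formulation recited in the quoted limitation.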

A further prior art reference of record, Furukawa et al. (U.S. Patent No. 8,260,007), discloses a system comprising an alignment module configured to map each pixel of a depth map to a corresponding pixel of a map tile [i.e., one or more context images], and a transform module for adjusting the depth value of each pixel of the depth map based on a corresponding terrain elevation value (see at least: col. 1, lines 33-38). The system may also comprise a reconstruction module configured to generate a 3D point for each pixel of the depth tile and to generate an elevation of each of these 3D points [i.e., lifting 3D points] based on an adjusted depth value of a corresponding pixel of the depth tile [i.e., according to the one or more context images]. For example, an adjusted depth value may provide the elevation of a corresponding 3D point [i.e., lifting 3D points from image pixels of the target image according to the one or more context images] (see at least: col. 6, lines 39-51, and Fig. 5A). However, Furukawa et al. fails to teach or suggest, either alone or in combination with the other cited references, the above limitations (as combined with the other claimed limitations).

Another prior art reference of record, Xu et al. (“Unsupervised Ego-Motion and Dense Depth Estimation with Monocular Video,” 2018 18th IEEE International Conference on Communication Technology, pages 1306-1310), discloses training, with monocular images, the neural networks DepthNet (for monocular depth estimation) and PoseNet (for ego-motion estimation) (see at least: Abstract, and section III, Fig. 1), but fails to teach or suggest, either alone or in combination with the other cited references, the above limitations (as combined with the other claimed limitations).

Regarding claim 10, claim 10 recites substantially similar limitations as set forth in claim 1. As such, claim 10 is in condition for allowance, for at least similar reasons, as stated above.

Regarding claim 17, claim 17 recites substantially similar limitations as set forth in claim 1. As such, claim 17 is in condition for allowance, for at least similar reasons, as stated above.

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AMARA ABDI whose telephone number is (571)272-0273. The examiner can normally be reached 9:00am-5:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vu Le, can be reached at (571) 272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/AMARA ABDI/
Primary Examiner, Art Unit 2668
12/09/2022