Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Response to Arguments
Applicant’s arguments with respect to claims 1, 8, and 15 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. The examiner relies on US 2018/0268519 A1 to Liebenow et al. (hereinafter “Liebenow”) to teach the amended claim limitations. Accordingly, THIS ACTION IS MADE FINAL.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3, 5-10, 12-17, 19 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over US 2020/0160546 A1 to Gu et al. (hereinafter “Gu”) in view of US 2018/0268519 A1 to Liebenow et al. (hereinafter “Liebenow”).
Claim 1. A method, comprising: generating a first warped image based on a pose and a depth estimated from a current image and a previous image in a sequence of images captured by a camera of the agent; Gu [0005] discloses a system and method for estimating depth from image frames captured with a monocular image sensor (e.g., RGB).

Gu [0006] teaches a measured DPV is generated for the reference frame by warping the extracted features from the source frame to the reference frame for each candidate depth and matching the warped source frame features with the reference frame features.

Gu [0007] teaches the warp function applied to the source frame features is based on relative camera pose information related to a difference between a first position of the image sensor associated with the reference frame and a second position of the image sensor associated with the particular source frame. Generating the measured DPV includes applying a softmax function to a sum of differences between features from the reference frame and warped features from each of the neighboring source frames.
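The softmax step cited from Gu [0007] can be pictured with a minimal sketch for a single pixel. The function name `measured_dpv`, the use of an L1 difference, and the negation of the summed difference before the softmax are assumptions made for illustration; they are not taken from Gu.

```python
import math


def measured_dpv(ref_feat, warped_feats_per_depth):
    """Return a probability over candidate depths for one pixel.

    ref_feat: feature vector of the reference frame at this pixel.
    warped_feats_per_depth: for each candidate depth, a list of warped
        source-frame feature vectors (one per neighboring source frame).
    """
    # Cost per candidate depth: L1 difference summed over all source frames.
    costs = []
    for warped_feats in warped_feats_per_depth:
        cost = sum(
            abs(r - w)
            for feat in warped_feats
            for r, w in zip(ref_feat, feat)
        )
        costs.append(cost)

    # Assumption: softmax over the negated cost, so the best match
    # receives the highest probability.
    logits = [-c for c in costs]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


ref = [1.0, 2.0]
candidates = [
    [[3.0, 0.0]],  # candidate depth 0: poor match
    [[1.0, 2.0]],  # candidate depth 1: exact match
    [[1.5, 2.5]],  # candidate depth 2: near match
]
dpv = measured_dpv(ref, candidates)
```

In this toy input the second candidate depth matches the reference feature exactly, so it receives the largest share of the probability mass.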

Gu [0165] teaches the warp function 712 generates a warped version of the features 708 for a corresponding source frame by sampling the features 204 for the source frame based on the relative camera pose information 706. See also Gu [0172-0180].
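As a rough illustration of sampling source-frame features based on relative camera pose, the sketch below reprojects one reference pixel into a source frame. The pinhole camera model, the intrinsics `fx`, `fy`, `cx`, `cy`, nearest-neighbor sampling, and the helper names `reproject` and `sample_nearest` are all assumptions for illustration and are not drawn from Gu.

```python
def reproject(u, v, depth, fx, fy, cx, cy, rotation, translation):
    """Map reference pixel (u, v) at a candidate depth into the source frame."""
    # Back-project the pixel to a 3D point in the reference camera frame.
    x = (u - cx) / fx * depth
    y = (v - cy) / fy * depth
    z = depth
    # Apply the relative camera pose (rotation matrix, then translation).
    point = [
        sum(rotation[i][j] * c for j, c in enumerate((x, y, z))) + translation[i]
        for i in range(3)
    ]
    # Project the transformed point onto the source image plane.
    return (fx * point[0] / point[2] + cx, fy * point[1] / point[2] + cy)


def sample_nearest(features, u, v):
    """Nearest-neighbor lookup; returns None outside the feature map."""
    col, row = round(u), round(v)
    if 0 <= row < len(features) and 0 <= col < len(features[0]):
        return features[row][col]
    return None


IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
# With no relative motion the pixel maps onto itself.
same = reproject(60, 40, 2.0, 100.0, 100.0, 50.0, 40.0, IDENTITY, [0.0, 0.0, 0.0])
# A sideways translation shifts the pixel; the shift shrinks as depth grows.
moved = reproject(60, 40, 2.0, 100.0, 100.0, 50.0, 40.0, IDENTITY, [0.5, 0.0, 0.0])
```

Repeating the reprojection for each candidate depth and sampling the source features at the resulting coordinates yields one warped feature map per candidate depth.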

estimating a motion of dynamic object between the previous image and the target image; Gu [0149] teaches the system 200 can be adapted, when processing a video stream, to treat the features extracted at time t as hidden state that can be updated as subsequent image frames are captured in the sequence of image frames… if the camera motion can be tracked, a measured DPV for a next frame can be predicted based on the current state of the measured DPV.

and controlling an action of an agent based on the second warped image. Gu [0205] teaches autonomous vehicles can benefit from more accurate depth estimation in order to improve object avoidance algorithms.

Gu fails to explicitly teach updating the first warped image based on the estimated motion to generate a second warped image. However, Liebenow, in the field of three-dimensional scene reconstruction, teaches updating the first warped image based on the estimated motion to generate a second warped image. Liebenow [0022] teaches estimating a first pose for a first warp of the application frame at a first estimated display time. The method further includes performing a first warp of the application frame using the application pose and the estimated first pose to generate a first warped frame. Moreover, the method includes estimating a second pose for a second warp of the first warped frame at a second estimated display time. In addition, the method includes performing a second warp of the first warped frame using the estimated second pose to generate a second warped frame.
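The two-stage sequence cited from Liebenow [0022] can be sketched as two successive warps. In this toy version each pose is reduced to a horizontal offset in pixels and the warp is a plain pixel shift; both simplifications, and the name `shift_warp`, are assumptions for illustration and do not reflect how Liebenow computes a warp.

```python
def shift_warp(frame, dx, fill=0):
    """Shift every row of a 2D frame right by dx pixels (left if negative)."""
    width = len(frame[0])
    warped = []
    for row in frame:
        new_row = [fill] * width
        for x, value in enumerate(row):
            nx = x + dx
            if 0 <= nx < width:
                new_row[nx] = value
        warped.append(new_row)
    return warped


# Hypothetical poses expressed as horizontal offsets in pixels.
application_pose = 0
first_pose = 1   # estimated for the first display time
second_pose = 3  # estimated for the second display time

application_frame = [[1, 2, 0, 0, 0]]
# First warp: from the application pose to the first estimated pose.
first_warped = shift_warp(application_frame, first_pose - application_pose)
# Second warp: applied to the first warped frame, using the second pose.
second_warped = shift_warp(first_warped, second_pose - first_pose)
```

The second warp operates on the output of the first warp rather than on the original frame, which is the ordering the cited paragraph describes.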

Liebenow [0052] teaches the display system 204 presents a sequence of frames at high frequency that provides the perception of a single coherent scene.

Liebenow [0053] teaches the display system 204 may be monocular or binocular.

Hence the prior art includes each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference. Thus, it would have been obvious to one of ordinary skill in the art to modify Gu's generating of a first warped image based on a pose and a depth estimated from a current image and a previous image in a sequence of images captured by a camera of the agent with Liebenow’s teaching of updating the first warped image based on the estimated motion to generate a second warped image. One would have been motivated to make this combination because it alleviates visual artifacts, anomalies, and glitches that can detract from the immersiveness and realism of MR systems (Liebenow [0003-0015]). In combination, Gu is not altered, in that Gu continues to warp frames to reconstruct a three-dimensional scene. Liebenow's teachings perform the same function in combination as they do separately, namely reconstructing a three-dimensional scene (virtual/mixed reality).
Therefore one of ordinary skill in the art, such as an individual working in the field of three-dimensional scene reconstruction, could have combined the elements as claimed by known methods, and in combination each element merely performs the same function as it does separately. It is for at least the aforementioned reasons that the Examiner has reached a conclusion of obviousness with respect to claim 1.

Claim 2. The method of claim 1, in which the camera comprises a monocular camera. Gu [0005] discloses a system and method for estimating depth from image frames captured with a monocular image sensor (e.g., RGB).

Gu [0032] teaches techniques described herein utilizing neural networks to estimate depth information from a sequence of images captured using a monocular image sensor.

Claim 3. The method of claim 2, in which the pose corresponds to an ego-motion of the monocular camera. Gu [0012] teaches the system further includes an image sensor configured to capture the sequence of input image data. In an embodiment, the system also includes a positional sensing subsystem configured to generate the relative camera pose information. In some embodiments, the positional sensing subsystem includes an inertial measurement unit.  

Claim 5. The method of claim 1, in which each image of the sequence of images is a two-dimensional image.  Gu [0032] teaches techniques described herein utilizing neural networks to estimate depth information from a sequence of images captured using a monocular image sensor.

Claim 6. The method of claim 1, in which the first warped image and the second warped image are three-dimensional images.  Gu [0205] teaches a user with a common smart phone with a single camera can capture and reconstruct a 3D model (e.g., a point cloud) simply by capturing a video of the environment around the user.

Claim 7. The method of claim 1, further comprising: determining a first photometric loss between the target image and the first warped image; determining a second photometric loss between the target image and the second warped image; training a scene reconstruction system based on the first photometric loss and the second photometric loss. Gu [0033] teaches the system is composed of three neural network modules: D-Net, K-Net, and R-Net. The negative log-likelihood (NLL) loss over the depth is used to train the entire network in end-to-end fashion. The first neural network module, D-Net, can be used to extract image features from a single image frame. The extracted image features can be used to directly estimate a DPV for the image frame corresponding to a non-parametric volume represented by a frustum composed of voxels originating at the image sensor. However, improved confidence in the estimate can be realized by combining the extracted features for a reference frame and corresponding extracted features for at least one source frame neighboring the reference frame, the features of each source frame warped by a warping function to match intrinsic parameters of the reference frame. The extracted features for the reference frame and the warped features for the at least one source frame are filtered using a softmax function to generate a measured DPV for the reference frame that is based on a plurality of image frames within a time interval rather than a single image frame (or a stereo image frame) captured at a particular instant in time.
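To make the claim 7 language concrete, the sketch below computes one photometric loss per warped image and combines them into a single training signal. The mean absolute intensity difference and the unweighted sum are assumptions for illustration; neither the claim nor the cited passage of Gu specifies these forms, and the function name `photometric_loss` is hypothetical.

```python
def photometric_loss(target, warped):
    """Mean absolute intensity difference between two same-sized images."""
    total = 0.0
    count = 0
    for target_row, warped_row in zip(target, warped):
        for t, w in zip(target_row, warped_row):
            total += abs(t - w)
            count += 1
    return total / count


target = [[0.0, 1.0], [1.0, 0.0]]
first_warped = [[0.0, 0.5], [1.0, 0.5]]   # before the motion update
second_warped = [[0.0, 1.0], [1.0, 0.5]]  # after the motion update

first_loss = photometric_loss(target, first_warped)
second_loss = photometric_loss(target, second_warped)
# Assumption: the training objective is the unweighted sum of both losses.
training_loss = first_loss + second_loss
```

In this toy input the second warped image is closer to the target, so its loss is smaller than that of the first warped image.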

Gu [0178] teaches that one problem with directly applying Bayesian filtering is that both correct and incorrect information are propagated over time. Thus, artifacts such as specular highlights in one frame can propagate error introduced by the measured DPV 710 over multiple frames. In another example, occlusions or dis-occlusions could cause the estimated depth at object boundaries to change abruptly from frame to frame. One solution is to utilize damping to reduce the weight of the predicted DPV 1010 to reduce the propagation of incorrect information from previous frames.
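The damping idea cited from Gu [0178] can be sketched as a Bayesian update in which the predicted distribution is raised to a power between 0 and 1 before it is multiplied with the measured distribution. This exponent form, the parameter name `damping`, and the function name `damped_update` are assumptions for illustration; the cited paragraph states only that damping reduces the weight of the predicted DPV.

```python
def damped_update(predicted, measured, damping=0.8):
    """Fuse predicted and measured depth probabilities for one pixel.

    damping = 1.0 gives a plain Bayesian update; damping = 0.0 ignores
    the prediction and returns the normalized measurement.
    """
    fused = [(p ** damping) * m for p, m in zip(predicted, measured)]
    total = sum(fused)
    return [f / total for f in fused]


predicted = [0.7, 0.2, 0.1]  # carried over from the previous frame
measured = [0.2, 0.6, 0.2]   # computed from the current frame

undamped = damped_update(predicted, measured, damping=1.0)
ignored = damped_update(predicted, measured, damping=0.0)
```

Lowering `damping` moves the fused result toward the current measurement, which limits how far an error from an earlier frame can propagate.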

Claim 8. It differs from claim 1 in that it is an apparatus performing the method of claim 1. Therefore claim 8 has been analyzed and reviewed in the same way as claim 1. See the above analysis. 

Claim 9. It differs from claim 2 in that it is an apparatus performing the method of claim 2. Therefore claim 9 has been analyzed and reviewed in the same way as claim 2. See the above analysis.

Claim 10. It differs from claim 3 in that it is an apparatus performing the method of claim 3. Therefore claim 10 has been analyzed and reviewed in the same way as claim 3. See the above analysis. 

Claim 12. It differs from claim 5 in that it is an apparatus performing the method of claim 5. Therefore claim 12 has been analyzed and reviewed in the same way as claim 5. See the above analysis. 

Claim 13. It differs from claim 6 in that it is an apparatus performing the method of claim 6. Therefore claim 13 has been analyzed and reviewed in the same way as claim 6. See the above analysis. 

Claim 14. It differs from claim 7 in that it is an apparatus performing the method of claim 7. Therefore claim 14 has been analyzed and reviewed in the same way as claim 7. See the above analysis. 

Claim 15. It differs from claim 1 in that it is a non-transitory computer-readable medium having program code recorded thereon, the program code executed by a processor performing the method of claim 1. Therefore claim 15 has been analyzed and reviewed in the same way as claim 1. See the above analysis. 

Claim 16. It differs from claim 2 in that it is a non-transitory computer-readable medium having program code recorded thereon, the program code executed by a processor performing the method of claim 2. Therefore claim 16 has been analyzed and reviewed in the same way as claim 2. See the above analysis. 

Claim 17. It differs from claim 3 in that it is a non-transitory computer-readable medium having program code recorded thereon, the program code executed by a processor performing the method of claim 3. Therefore claim 17 has been analyzed and reviewed in the same way as claim 3. See the above analysis. 

Claim 19. It differs from claim 5 in that it is a non-transitory computer-readable medium having program code recorded thereon, the program code executed by a processor performing the method of claim 5. Therefore claim 19 has been analyzed and reviewed in the same way as claim 5. See the above analysis. 

Claim 20. It differs from claim 6 in that it is a non-transitory computer-readable medium having program code recorded thereon, the program code executed by a processor performing the method of claim 6. Therefore claim 20 has been analyzed and reviewed in the same way as claim 6. See the above analysis. 

Claims 4, 11 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over US 2020/0160546 A1 to Gu et al. (hereinafter “Gu”) in view of US 2018/0268519 A1 to Liebenow et al. (hereinafter “Liebenow”) and in further view of US 2018/0205941 A1 to Kopf et al. (hereinafter “Kopf”).
Claim 4. Gu and Liebenow fail to explicitly teach generating the first warped image based on an inverse warp of the current image and the previous image. However, Kopf, in the same field of three-dimensional scene reconstruction, teaches generating the first warped image based on an inverse warp of the current image and the previous image. Kopf [0004] teaches that an image reconstruction system generates a three-dimensional image from a plurality of two-dimensional input images. A plurality of input images of a scene is received in which the input images are taken from different vantage points. The input images may have varying amounts of overlap with each other and varying camera orientations. The plurality of input images is processed to generate a sparse reconstruction representation of the scene. The sparse reconstruction representation includes a sparse point cloud specifying locations of a plurality of points that correspond to three-dimensional locations of surfaces of objects in the scene… each of the respective dense reconstruction representations includes a respective depth image for a corresponding input image in which the depth image includes both color and depth information. Front surfaces of the depth images are projected using a forward depth test to generate a plurality of front-warped images. Back surfaces of the depth images are projected using an inverted depth test to generate a plurality of back-warped images.

Hence the prior art includes each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference. Thus, it would have been obvious to one of ordinary skill in the art to modify Gu's generating of a first warped image based on a pose and a depth estimated from a current image and a previous image in a sequence of images captured by a camera of the agent with Kopf’s teaching of generating the first warped image based on an inverse warp of the current image and the previous image. One would have been motivated to make this combination because it allows one to accurately avoid obstacles by moving straight or straight backwards after colliding with an obstacle. In combination, Gu is not altered, in that Gu continues to warp frames to reconstruct a three-dimensional scene. Kopf's teachings perform the same function in combination as they do separately, namely reconstructing a three-dimensional scene.
Therefore one of ordinary skill in the art, such as an individual working in the field of three-dimensional scene reconstruction, could have combined the elements as claimed by known methods, and in combination each element merely performs the same function as it does separately. It is for at least the aforementioned reasons that the Examiner has reached a conclusion of obviousness with respect to claim 4.

Claim 11. It differs from claim 4 in that it is an apparatus performing the method of claim 4. Therefore claim 11 has been analyzed and reviewed in the same way as claim 4. See the above analysis. 

Claim 18. It differs from claim 4 in that it is a non-transitory computer-readable medium having program code recorded thereon, the program code executed by a processor performing the method of claim 4. Therefore claim 18 has been analyzed and reviewed in the same way as claim 4. See the above analysis. 

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. Jo et al., “Deep Video Super-Resolution Network Using Dynamic Upsampling Filters Without Explicit Motion Compensation,” teaches in its Abstract: “While many deep learning based VSR methods have been proposed, most of them rely heavily on the accuracy of motion estimation and compensation. We introduce a fundamentally different framework for VSR in this paper. We propose a novel end-to-end deep neural network that generates dynamic upsampling filters and a residual image, which are computed depending on the local spatio-temporal neighborhood of each pixel to avoid explicit motion compensation. With our approach, an HR image is reconstructed directly from the input image using the dynamic upsampling filters, and the fine details are added through the computed residual. Our network with the help of a new data augmentation technique can generate much sharper HR videos with temporal consistency, compared with the previous methods.” The caption of Figure 1 states: “×4 VSR for the scene ferriswheel. To visualize the temporal consistency in 2D, we also plot the transition of the dotted orange horizontal scanline over time in the orange box with the x−t axis. We can observe that our method produces much sharper and temporally consistent HR frames compared with VSRnet [16].”
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DELOMIA L GILLIARD whose telephone number is (571)272-1681. The examiner can normally be reached 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph, can be reached at (571) 272-8243. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DELOMIA L GILLIARD/Primary Examiner, Art Unit 2661