DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
2.	This Office action is in response to U.S. Patent Application No. 17/085,081, filed on 10/30/2020 with an effective filing date of 8/30/18. Claims 1-17 are pending.
Claim Rejections - 35 USC § 103
3.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

4.	The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
5.	Claims 1-4, 8-12, & 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Shen et al. US 2018/0286061 A1 in view of Godard et al. US 2019/0213481 A1. 
Per claims 1 & 16, Shen et al. discloses a method for generating a depth map, comprising: training a convolutional neural network for depth map estimation (para: 35, e.g. the estimates of planarity and estimates of edges are determined using models trained using images of other objects having known planes and edges; determine 3D geometry values (e.g., depths and normals) of pixels in the common planar regions based on a planar region); and generating a depth map for an image using the trained convolutional neural network (para: 36, e.g. evaluate potential depth/normal combinations for all pixels of the image and select the best combination; use of techniques of the invention can significantly improve the predicted depths and normals (e.g., depth maps and normal maps) provided by existing 3D geometry estimation techniques). 
Shen et al. fails to explicitly disclose wherein the convolutional neural network is trained using a common loss function which is obtained based on left and right images of stereo pair images, reconstructed left and right images, disparity maps for the left and right images, reconstructed disparity maps for the left and right images, and auxiliary left and right images.
Godard et al. however in the same field of endeavor teaches wherein the convolutional neural network is trained using a common loss function which is obtained based on left and right images of stereo pair images (para: 36, e.g. the training module 13 trains the convolutional neural network (CNN) module 11 based on binocular stereo pairs of images 15; the training module 13 optimises a loss function implemented by a loss module 19 of the CNN module 11 and as a result, trains the disparity predictor 9 to accurately and efficiently generate the predicted binocular disparity map directly from colour pixel values of a single source image), reconstructed left and right images, disparity maps for the left and right images, reconstructed disparity maps for the left and right images, and auxiliary left and right images (para: 41, e.g. an additional optimization goal of the training module 13 is to train the CNN 11 to reconstruct the corresponding left and right views by learning the disparity maps that can shift the pixels to minimize an image reconstruction error. In this way, given training images from a calibrated pair of binocular cameras, the image processing system 3 learns a function that is able to reconstruct an image given the other view, and in so doing, generates a trained model (i.e. the CNN 11) that enables prediction or estimation of the shape of the scene that is being imaged). 
Therefore, in view of the disclosures by Godard et al., it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Shen et al. and Godard et al. in order to train the module to optimize a loss function implemented by a loss module of the CNN module.
Per claims 2 & 10, Shen et al. discloses the method of claim 1, wherein the disparity maps for the left and right images are obtained by applying an affine transform to inverse depth maps for the left and right images based on parameters of a camera which is used for obtaining the stereo pair images (para: 85, e.g. FIG. 14 is a block diagram illustrating bilateral affinity).
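For illustration only, the claimed relationship between inverse depth and disparity can be sketched as a per-pixel affine map d = gain * (1/z) + bias. The sketch below is not taken from Shen et al., Godard et al., or the application; the function name and the camera values (focal length, baseline) are hypothetical. It assumes a rectified stereo pair, for which the standard relation disparity = focal length * baseline / depth gives gain = focal length * baseline and bias = 0.

```python
import numpy as np

def inverse_depth_to_disparity(inv_depth, gain, bias=0.0):
    """Apply the affine map d = gain * inv_depth + bias to every pixel."""
    inv_depth = np.asarray(inv_depth, dtype=np.float64)
    return gain * inv_depth + bias

# Hypothetical camera parameters (assumptions, not values from the record).
focal_px = 720.0    # focal length in pixels
baseline_m = 0.5    # distance between the two cameras in metres

# A 2x2 depth map in metres, converted to inverse depth and then to disparity.
depth_m = np.array([[2.0, 4.0], [8.0, 16.0]])
disparity_px = inverse_depth_to_disparity(1.0 / depth_m, gain=focal_px * baseline_m)
```

Under these assumptions, nearer pixels (smaller depth) receive larger disparities, which is the behaviour the affine mapping is meant to capture.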
Per claims 3 & 11, Shen et al. discloses the method of claim 2, wherein the parameters of the camera are obtained by processing high-level feature maps for the left and right images based on the convolutional neural network (para: 88, e.g. FIG. 15 illustrates selection of the top 6 Eigen vectors as the feature f for each pixel based on an affinity map).
Per claims 4 & 12, Godard et al. further teaches the method of claim 1, wherein the reconstructed left and right images, the reconstructed disparity maps for the left and right images, and the auxiliary left and right images are obtained by performing bilinear-interpolation sampling for the left and right images based on the disparity maps for the left and right images (para: 11 & 47, e.g. the cost function may further include a reconstructed appearance matching component to minimize an image reconstruction error between the reconstructed image and the corresponding input image. Sampling may comprise bilinear interpolation). 
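For illustration only, the sampling and reconstruction-error steps referred to above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of Godard et al. or of the application: the function names are hypothetical, the images are single-channel, the stereo pair is assumed rectified (so bilinear interpolation reduces to linear interpolation along each row), and the sign convention for the disparity shift is chosen arbitrarily.

```python
import numpy as np

def warp_horizontal(image, disparity):
    """Reconstruct a view by sampling `image` at x - disparity along each row.

    Sample positions are clamped to the image width, and values between two
    neighbouring pixels are linearly interpolated.
    """
    image = np.asarray(image, dtype=np.float64)
    h, w = image.shape
    xs = np.arange(w, dtype=np.float64)[None, :] - disparity  # sample positions
    xs = np.clip(xs, 0.0, w - 1.0)
    x0 = np.floor(xs).astype(int)          # left neighbour
    x1 = np.minimum(x0 + 1, w - 1)         # right neighbour
    frac = xs - x0                         # interpolation weight
    rows = np.arange(h)[:, None]
    return (1.0 - frac) * image[rows, x0] + frac * image[rows, x1]

def reconstruction_error(target, reconstructed):
    """Mean absolute difference between an input image and its reconstruction."""
    return float(np.mean(np.abs(np.asarray(target) - np.asarray(reconstructed))))

# Hypothetical one-row "right" image and a constant half-pixel disparity map.
right = np.array([[0.0, 1.0, 2.0, 3.0, 4.0, 5.0]])
disparity = np.full((1, 6), 0.5)
left_reconstructed = warp_horizontal(right, disparity)
```

A training procedure of the kind described would compare such a reconstruction against the actual opposite view using `reconstruction_error` (or a similar term) as one component of the cost function.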
Per claims 8 & 17, Shen et al. discloses a method for generating a depth map, comprising: training a convolutional neural network for depth map estimation (para: 35, e.g. the estimates of planarity and estimates of edges are determined using models trained using images of other objects having known planes and edges; determine 3D geometry values (e.g., depths and normals) of pixels in the common planar regions based on a planar region); and generating a depth map of an image using the trained convolutional neural network (para: 36, e.g. evaluate potential depth/normal combinations for all pixels of the image and select the best combination; use of techniques of the invention can significantly improve the predicted depths and normals (e.g., depth maps and normal maps) provided by existing 3D geometry estimation techniques). 
Shen et al. fails to explicitly disclose wherein the convolutional neural network is trained using a common loss function which is obtained based on corrected left and right images for left and right images of stereo pair images, reconstructed left and right images, correction maps for the left and right images, disparity maps for the left and right images, reconstructed disparity maps for the left and right images, and auxiliary left and right images.
Godard et al. however in the same field of endeavor teaches wherein the convolutional neural network is trained using a common loss function which is obtained based on corrected left and right images for left and right images of stereo pair images (para: 36, e.g. the training module 13 trains the convolutional neural network (CNN) module 11 based on binocular stereo pairs of images 15; the training module 13 optimises a loss function implemented by a loss module 19 of the CNN module 11 and as a result, trains the disparity predictor 9 to accurately and efficiently generate the predicted binocular disparity map directly from colour pixel values of a single source image), reconstructed left and right images, correction maps for the left and right images (para: 45, e.g. the CNN 11 is trained to find a correspondence field, which in this embodiment is the predicted left-to-right disparity map (d.sup.r), that when applied to the left view image 15a enables a right view projector 415a of the CNN 11 to reconstruct a projected right view image (or vice versa)), disparity maps for the left and right images, reconstructed disparity maps for the left and right images, and auxiliary left and right images (para: 41, e.g. an additional optimization goal of the training module 13 is to train the CNN 11 to reconstruct the corresponding left and right views by learning the disparity maps that can shift the pixels to minimize an image reconstruction error. In this way, given training images from a calibrated pair of binocular cameras, the image processing system 3 learns a function that is able to reconstruct an image given the other view, and in so doing, generates a trained model (i.e. the CNN 11) that enables prediction or estimation of the shape of the scene that is being imaged).
Therefore, in view of the disclosures by Godard et al., it would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine Shen et al. and Godard et al. in order to train the module to optimize a loss function implemented by a loss module of the CNN module.
Per claim 9, Godard et al. further teaches the method of claim 8, wherein the corrected left and right images are obtained by using the correction maps for the left and right images (para: 41).

Allowable Subject Matter
6.	Claims 5-7 & 13-15 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion
7.	The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
	Yang et al. US 2019/0387209 A1, e.g. the system includes a deep virtual stereo odometry module that receives the camera data from the monocular camera and the depth map from the stacked architecture. The calculation module initializes a keyframe of the camera data using the depth map and determines a photometric error based on a set of observation.
	Socher US 2018/0096219 A1, e.g. convolutional neural network is trained against the 

	Bhardwaj et al. US 9,275,078 B2, e.g. The machine calculates visual descriptors and corresponding depth descriptors from this information. The machine then generates a mapping that correlates these visual descriptors with their corresponding depth descriptors.

8.	Any inquiry concerning this communication or earlier communications from the examiner should be directed to IRFAN HABIB whose telephone number is (571)270-7325.  The examiner can normally be reached on Mon-Th 9AM-7PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jay Patel, can be reached at (571)272-2988. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Irfan Habib/ Examiner, Art Unit 2485