DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-2, 4, 6, 9-10, 12 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Gruber et al. (US 20150262412 A1) in view of Karam et al. (US 20150269723 A1).
Regarding claims 1 and 9, Gruber et al. (US 20150262412 A1) discloses a cross reality system (figure 1; [0035] User Equipment (UE) 100 capable of implementing computer vision applications, including augmented reality effects), comprising: 
a head-mounted display device having a display system ([0036] a head mounted display (HMD), which may be used to display live and/or real world images); 
a computing system in operable communication with the head-mounted display ([0036] UE 100 may take the form of a wearable computing device); 
a plurality of camera sensors in operable communication with the computing system ([0039] UE 100 may include one or more cameras or image sensors 110 (hereinafter referred to as "camera(s) 110")); 
wherein the computing system is configured to estimate depths of features in a scene from a plurality of multi-view images captured by the camera sensors ([0039] UE 100 may include one or more cameras or image sensors 110; [0045] camera(s) 110 may include depth sensors, which may provide "depth information"; [0046] a depth sensor may take the form of a passive stereo vision sensor, which may use two or more cameras to obtain depth information for a scene) by a process comprising: 
obtaining multi-view images, including an anchor image of the scene and a set of reference images of a scene within a field of view of the camera sensors, from the camera sensors ([0047] images received from camera(s) 110; [0046] a depth sensor may take the form of a passive stereo vision sensor, which may use two or more cameras to obtain depth information for a scene); 
passing the anchor image and reference images through a shared RGB encoder and descriptor decoder which (i) outputs a respective descriptor field of descriptors for the anchor image and each reference image ([0048] feature extraction from images; [0051] detect salient feature patches in one or more captured image frames; the "feature" of Gruber corresponds to the claimed "descriptor"), (ii) detects interest points in the anchor image in conjunction with relative poses ([0051] detect salient feature patches in one or more captured image frames) to determine a search space in the reference images from alternate viewpoints ([0048] feature correspondence between images; a search space is inherently determined in order to find the correspondence), and (iii) outputs intermediate feature maps ([0048] feature correspondence between images); 
sampling the respective descriptors in the search space of each reference image to determine descriptors in the search space and matching the identified descriptors with descriptors for the interest points in the anchor image, such matched descriptors referred to as matched keypoints ([0048] feature correspondence between images; [0046] The pixel coordinates of points common to cameras in a captured scene; [0077] the "projective space" points; features in the images are inherently sampled to obtain the points); 
triangulating the matched keypoints to output 3D points ([0046] The pixel coordinates of points common to cameras in a captured scene may be used along with camera pose information and/or triangulation techniques to obtain per-pixel depth information); 
passing the 3D points through a sparse depth encoder to create a sparse depth image from the 3D points ([0075] a depth image 505 may be received from a depth camera, depth sensor coupled to a color camera, a stereo camera and/or obtained from a depth estimation algorithm) and output feature maps ([0049] use the camera pose and per-pixel depth information to create and/or update a 3D model or representation of the scene; [0050] the 3D model may take the form of a textured 3D mesh, a volumetric data set, a CAD model etc.; [0113] perform 3D reconstruction and/or provide/update 3D models of the scene); and 
a depth decoder generating a dense depth image based on the output feature maps for the sparse depth encoder and the intermediate feature maps from the RGB encoder ([0068] a live color+depth (e.g. RGB-D) image stream (shown as color image (e.g. RGB) 550 and depth image (D) 505 in FIG. 5) may be obtained and used as input for geometry processing module 510, geometry processing module 510 may estimate the camera pose and perform 3D reconstruction; [0087] the filter may be applied to depth image D 505 to fill-in holes and smooth the depth image while respecting boundaries indicated by the intensity gradients in color image 550; [0094] The end result image 780 (FIG. 7E) is a smoothed and filled depth image which better matches the contours in the color image 550).
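For illustration only, the sampling-and-matching step recited above can be sketched in a few lines. Neither the quoted claim language nor the cited Gruber paragraphs specify a similarity measure, so this minimal sketch assumes cosine similarity between L2-normalized descriptors and keeps the best-scoring candidate in the search space; the names `l2_normalize` and `match_in_search_space` and the (pixel, descriptor) data layout are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[int, int]


def l2_normalize(desc: Sequence[float]) -> List[float]:
    """Scale a descriptor to unit length so that dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in desc)) or 1.0  # guard against an all-zero descriptor
    return [x / norm for x in desc]


def match_in_search_space(
    anchor_desc: Sequence[float],
    candidates: Sequence[Tuple[Pixel, Sequence[float]]],
) -> Tuple[Optional[Pixel], float]:
    """Return (pixel, score) of the candidate descriptor most similar to the anchor descriptor.

    `candidates` holds hypothetical (pixel, descriptor) pairs sampled from a reference
    image's descriptor field inside the search space.
    """
    anchor = l2_normalize(anchor_desc)
    best_pixel: Optional[Pixel] = None
    best_score = -math.inf
    for pixel, desc in candidates:
        score = sum(a * b for a, b in zip(anchor, l2_normalize(desc)))
        if score > best_score:
            best_pixel, best_score = pixel, score
    return best_pixel, best_score
```

For example, an anchor descriptor [1, 0, 0] matched against candidates at pixels (10, 5), (11, 5) and (12, 5) with descriptors [0, 1, 0], [0.9, 0.1, 0] and [-1, 0, 0] selects pixel (11, 5).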
Karam et al. (US 20150269723 A1) discloses 
triangulating the matched keypoints using singular value decomposition (SVD) to output 3D points ([0070] the coordinates of the 3D point corresponding to the matched 2D points in the images are computed using linear triangulation based on singular value decomposition (SVD)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the inventions of Gruber and Karam, to use singular value decomposition (SVD) for the triangulation, in order to obtain the 3D points as the least-squares solution of the linear triangulation equations formed from the matched keypoints.
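As a generic textbook illustration (not Karam's or the applicant's implementation), linear triangulation based on SVD is commonly formulated as the direct linear transform (DLT): each view with 3x4 projection matrix P and matched pixel (u, v) contributes the rows u·p3 - p1 and v·p3 - p2 to a matrix A, and the homogeneous 3D point is the right singular vector of A for its smallest singular value. To keep the sketch dependency-free, it obtains that vector as the eigenvector of AᵀA with the smallest eigenvalue (the two coincide), computed by the Jacobi eigenvalue method; with NumPy, the same vector is the last row of `Vh` returned by `numpy.linalg.svd(A)`. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def jacobi_eigen(m: Matrix, sweeps: int = 50, tol: float = 1e-24) -> Tuple[List[float], Matrix]:
    """Eigen-decompose a small symmetric matrix with cyclic Jacobi rotations.

    Returns (eigenvalues, V), where column k of V is the eigenvector for eigenvalue k.
    """
    n = len(m)
    a = [row[:] for row in m]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        total = sum(a[i][j] ** 2 for i in range(n) for j in range(n))
        if off <= tol * total:  # off-diagonal mass is negligible: converged
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that entry (p, q) becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J^T A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def triangulate_dlt(
    P1: Matrix, P2: Matrix, x1: Tuple[float, float], x2: Tuple[float, float]
) -> Tuple[float, float, float]:
    """Linear (DLT) triangulation of one matched keypoint observed in two views."""
    rows: Matrix = []
    for P, (u, v) in ((P1, x1), (P2, x2)):
        rows.append([u * P[2][j] - P[0][j] for j in range(4)])
        rows.append([v * P[2][j] - P[1][j] for j in range(4)])
    # The eigenvectors of A^T A are the right singular vectors of A.
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(4)] for i in range(4)]
    eigvals, eigvecs = jacobi_eigen(ata)
    k = min(range(4), key=lambda i: eigvals[i])  # smallest singular value of A
    X = [eigvecs[i][k] for i in range(4)]
    # De-homogenize; a point at infinity (X[3] near 0) is not handled in this sketch.
    return (X[0] / X[3], X[1] / X[3], X[2] / X[3])
```

With P1 = [I | 0], P2 = [I | (-1, 0, 0)] and normalized image points (0.125, 0.05) and (-0.125, 0.05), the sketch recovers the 3D point (0.5, 0.2, 4.0).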

Regarding claims 2 and 10, Gruber discloses the cross reality system of claim 9, wherein the shared RGB encoder and descriptor decoder comprises two encoders including an RGB image encoder ([0048] feature extraction from images; [0051] detect salient feature patches in one or more captured image frames) and a sparse depth image encoder ([0075] a depth image 505 may be received from a depth camera, depth sensor coupled to a color camera, a stereo camera and/or obtained from a depth estimation algorithm), and three decoders including an interest point detection decoder ([0051] detect salient feature patches in one or more captured image frames), a descriptor decoder ([0048] feature correspondence between images), and a dense depth prediction decoder ([0068] a live color+depth (e.g. RGB-D) image stream (shown as color image (e.g. RGB) 550 and depth image (D) 505 in FIG. 5) may be obtained and used as input for geometry processing module 510, geometry processing module 510 may estimate the camera pose and perform 3D reconstruction; [0087] the filter may be applied to depth image D 505 to fill-in holes and smooth the depth image while respecting boundaries indicated by the intensity gradients in color image 550; [0094] The end result image 780 (FIG. 7E) is a smoothed and filled depth image which better matches the contours in the color image 550).

Regarding claims 4 and 12, Gruber discloses the cross reality system of claim 9, wherein the process for estimating depths of features in a scene from a plurality of multi-view images captured by the camera sensors further comprises: 
feeding the feature maps from the RGB encoder into a first task-specific decoder head to determine weights for the detecting of interest points in the anchor image and outputting interest point descriptions ([0088] the result D.sub.P1(p) of a first pass P1 of the filter may be computed using one or more of the following four cases, where w>0 is a weight; the weight w may be varied to favor selection of pixels; [0089] By appropriately selecting the weight w, the filtering process may be used to favor selection of pixels).

Regarding claims 6 and 14, Official Notice is taken that constraining the search space to a respective epipolar line in the reference images plus a fixed offset on either side of the epipolar line, and within a feasible depth sensing range along the epipolar line, was well known in the art before the effective filing date of the claimed invention. It would have been obvious to one of ordinary skill in the art to so constrain the search space in order to efficiently detect the features in the images.
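The well-known constraint noticed above can be illustrated with a minimal pinhole-camera sketch, offered only as a generic example and not as the applicant's implementation. It assumes intrinsics K = (fx, fy, cx, cy) shared by both cameras, a relative pose (R, t) mapping anchor-camera coordinates to reference-camera coordinates, and a feasible depth range [d_min, d_max]. Projecting the anchor pixel's viewing ray at d_min and d_max gives the ends of the epipolar segment, and a reference pixel is kept only if it lies within `offset_px` pixels of that segment. All names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Intrinsics = Tuple[float, float, float, float]  # (fx, fy, cx, cy), an assumed layout
Point2 = Tuple[float, float]
Point3 = Tuple[float, float, float]


def backproject(K: Intrinsics, uv: Point2, depth: float) -> Point3:
    """Lift an anchor pixel to a 3D point at the given depth in the anchor camera frame."""
    fx, fy, cx, cy = K
    u, v = uv
    return ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)


def project(K: Intrinsics, R: Sequence[Sequence[float]], t: Sequence[float], X: Point3) -> Point2:
    """Transform X into the reference camera frame (R X + t) and project it to pixels."""
    fx, fy, cx, cy = K
    Xr = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    return (fx * Xr[0] / Xr[2] + cx, fy * Xr[1] / Xr[2] + cy)


def point_to_segment_distance(p: Point2, a: Point2, b: Point2) -> float:
    """Euclidean distance from p to the closed segment ab."""
    dx, dy = b[0] - a[0], b[1] - a[1]
    length_sq = dx * dx + dy * dy
    if length_sq == 0.0:
        return math.hypot(p[0] - a[0], p[1] - a[1])
    s = ((p[0] - a[0]) * dx + (p[1] - a[1]) * dy) / length_sq
    s = max(0.0, min(1.0, s))  # clamping enforces the depth range along the line
    return math.hypot(p[0] - (a[0] + s * dx), p[1] - (a[1] + s * dy))


def in_search_space(K, R, t, anchor_uv, candidate_uv, d_min, d_max, offset_px) -> bool:
    """True if candidate_uv lies within offset_px of the epipolar segment for anchor_uv.

    Assumes the ray segment between d_min and d_max stays in front of the reference
    camera, so its image is the segment between the two projected endpoints.
    """
    near = project(K, R, t, backproject(K, anchor_uv, d_min))
    far = project(K, R, t, backproject(K, anchor_uv, d_max))
    return point_to_segment_distance(candidate_uv, near, far) <= offset_px
```

For a reference camera shifted 0.1 units along x, a 1 to 10 unit depth range maps the anchor's principal point to a horizontal segment from about (40, 50) to (49, 50) in the reference image, so only pixels in a narrow band around that segment are searched.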

Claims 3, 5, 11 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Gruber et al. (US 20150262412 A1) in view of Karam et al. (US 20150269723 A1) as applied to claim 9 above, and further in view of Wang et al. (US 10304193 B1).
Regarding claims 3 and 11, Wang discloses a fully-convolutional neural network configured to operate on a full resolution of images (column 1 paragraph 1, computer segmentation and object detection in digital images; column 2 paragraph 1, establishing a fully convolutional neural network; iteratively training the full convolution neural network using the set of training images).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the inventions of Gruber and Karam with the invention of Wang, to use a fully-convolutional neural network for feature/interest-point extraction and feature correspondence between images, in order to extract auto-context features for more accurate and more efficient segmentation and object detection in digital images (Wang column 1 lines 39-45).

Regarding claims 5 and 13, Wang discloses wherein the descriptor decoder comprises a U-Net-like architecture to fuse fine and coarse level image information for matching the identified descriptors with descriptors for the interest points (column 9 line 33; abstract, The various convolutional and deconvolution layers of the neural networks are architected to include a coarse-to-fine residual learning module and learning paths, as well as a dense convolution module to extract auto context features and to facilitate fast, efficient, and accurate training of the neural networks capable of producing prediction masks of regions of interest).
The motivation to combine is the same as that stated above for claims 3 and 11.

Claims 7 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Gruber et al. (US 20150262412 A1) in view of Karam et al. (US 20150269723 A1) as applied to claim 9 above, and further in view of CAPENS (US 20150199825 A1).
Regarding claims 7 and 15, CAPENS discloses bilinear sampling (figure 3; [0036]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the inventions of Gruber and Karam with the invention of CAPENS, to use bilinear sampling for feature/interest-point extraction, in order to better extract the features.
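In its generic form, bilinear sampling reads a value at a non-integer location as the area-weighted average of the four surrounding grid values, which is how descriptors can be sampled at sub-pixel positions of a descriptor field. The sketch below is a generic illustration and is not drawn from CAPENS [0036]; the border clamping, the row-major `field[y][x]` layout of per-pixel channel vectors, and the function name are assumptions.

```python
import math
from typing import List, Sequence


def bilinear_sample(field: Sequence[Sequence[Sequence[float]]], x: float, y: float) -> List[float]:
    """Sample a multi-channel grid field[y][x] at the real-valued location (x, y)."""
    h, w = len(field), len(field[0])
    # Clamp the query to the grid so that border samples stay defined (an assumption).
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0  # fractional offsets inside the grid cell
    top_left, top_right = field[y0][x0], field[y0][x1]
    bottom_left, bottom_right = field[y1][x0], field[y1][x1]
    # Each corner is weighted by the area of the opposite sub-rectangle.
    return [
        (1 - ax) * (1 - ay) * tl + ax * (1 - ay) * tr + (1 - ax) * ay * bl + ax * ay * br
        for tl, tr, bl, br in zip(top_left, top_right, bottom_left, bottom_right)
    ]
```

On a 2x2 single-channel field with values 0, 1 (top row) and 2, 3 (bottom row), sampling the cell center (0.5, 0.5) returns the average, 1.5.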

Claims 8 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Gruber et al. (US 20150262412 A1) in view of Karam et al. (US 20150269723 A1) as applied to claim 9 above, and further in view of Iqbal et al. (US 20190278983 A1).
Regarding claims 8 and 16, Gruber discloses the cross reality system of claim 9, wherein the step of triangulating the matched keypoints comprises:
Iqbal discloses
estimating respective two-dimensional (2D) positions of the interest points by computing a softmax across spatial axes to output cross-correlation maps (figure 2C unit 205; [0056]-[0057]);
performing a soft-argmax operation to calculate the 2D position of joints as a center of mass of corresponding cross-correlation maps (figure 2C unit 215; [0056]-[0057]);
Gruber discloses
performing a linear algebraic triangulation from the 2D estimates ([0046] The pixel coordinates of points common to cameras in a captured scene may be used along with camera pose information and/or triangulation techniques to obtain per-pixel depth information); and
Karam discloses
using a singular value decomposition (SVD) to output 3D points ([0070] the coordinates of the 3D point corresponding to the matched 2D points in the images are computed using linear triangulation based on singular value decomposition (SVD)).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the inventions of Gruber and Karam with the invention of Iqbal, to use a neural network architecture to extract features/interest points and generate feature correspondences between images (Iqbal [0025]), including performing the softmax and soft-argmax operations (Iqbal [0057]), in order to better extract the features.
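The softmax and soft-argmax operations mapped to Iqbal above can be sketched generically: a softmax over both spatial axes turns a score map into a probability map, and the soft-argmax returns that map's center of mass, which, unlike a hard argmax, is differentiable. This is a generic illustration, not Iqbal's implementation; the sharpness multiplier `beta` and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def spatial_softmax(scores: Sequence[Sequence[float]], beta: float = 1.0) -> List[List[float]]:
    """Softmax over all (row, column) positions of a 2D score map."""
    peak = max(max(row) for row in scores)  # subtract the maximum for numerical stability
    exps = [[math.exp(beta * (s - peak)) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def soft_argmax(scores: Sequence[Sequence[float]], beta: float = 1.0) -> Tuple[float, float]:
    """Return the (x, y) center of mass of the softmax-normalized score map."""
    probs = spatial_softmax(scores, beta)
    x = sum(p * col for row in probs for col, p in enumerate(row))
    y = sum(p * r for r, row in enumerate(probs) for p in row)
    return x, y
```

A larger `beta` concentrates the probability mass, so the soft-argmax approaches the hard argmax; a single-row map with equal peaks at columns 0 and 2 yields the midpoint (1.0, 0.0).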

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to XIAOLAN XU whose telephone number is (571)270-7580. The examiner can normally be reached Mon. 8:30-4:30; Thurs. 8:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's supervisor, SATH V. PERUNGAVOOR, can be reached on (571) 272-7455. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/XIAOLAN XU/               Examiner, Art Unit 2488