Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.

DETAILED ACTION
Allowable Subject Matter
Claims 2 and 8 and dependent claims 3-6 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The prior art of record fails to teach or suggest configuring the computing device to execute a multipoint algorithm to obtain a first fundamental matrix according to the plurality of sets of corresponding feature point coordinates, wherein the first fundamental matrix is used to define an epipolar geometry relationship between the captured image and the most similar image;
configuring the computing device to calculate a first essential matrix between the captured image and the most similar image according to the first fundamental matrix, and extract a rotation matrix and a movement vector matrix in the first fundamental matrix by using the first essential matrix;
configuring the computing device to calculate a second essential matrix between the most similar image and the nearest image, and inversely calculate a scale ratio from the second essential matrix, the most similar image, and the plurality of camera positions and the plurality of camera pose parameters corresponding to the nearest image; and
configuring the computing device to multiply the scale ratio by the movement vector matrix, so as to obtain the capturing position and the capturing pose parameter of the image capturing device upon obtaining the captured image in the context of claim 2.
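For reference, the relationships recited in the limitations above can be summarized in standard two-view geometry notation. This is an illustrative sketch only: the symbols used below (x, x', K, K', R, t, s) are assumptions introduced here for readability and are not taken from the application or from the prior art of record.

```latex
% x, x' : homogeneous pixel coordinates of one corresponding feature point pair
%         (x in the captured image, x' in the most similar image)
% F     : first fundamental matrix;  E : first essential matrix
% K, K' : camera intrinsic matrices of the two views (assumed to be known)
\begin{align}
  \mathbf{x}'^{\top} F \,\mathbf{x} &= 0
      && \text{(epipolar constraint defined by } F\text{)} \\
  E &= K'^{\top} F \, K
      && \text{(essential matrix from the fundamental matrix)} \\
  E &\simeq [\mathbf{t}]_{\times} R
      && \text{(rotation } R \text{ and movement vector } \mathbf{t}\text{, up to scale)} \\
  \mathbf{t}_{\mathrm{scaled}} &= s \, \mathbf{t}
      && \text{(scale ratio } s \text{ applied to the movement vector)}
\end{align}
```

In this notation the movement vector t obtained from E has no absolute length, which is why a separately determined scale ratio s is needed before a capturing position can be stated in the units of the 3D model.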

The prior art of record fails to teach or suggest wherein a number of iterations in the VGG deep learning network is 5, and 
a fourth pooling layer of the plurality of pooling layers is used as the main feature extraction layer in the context of claim 8.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Mukherjee (US PGPUB 20210232858) in view of Laskar et al. (NPL “Camera relocalization by computing pairwise relative poses using convolutional neural network”) and in further view of Shree et al. (US PGPUB 20220284609).
As per claim 1, Mukherjee discloses an indoor positioning method (Mukherjee, abstract), comprising:
configuring a computing device to obtain a 3D model of a target area (Mukherjee, para. 5 and 71, where a 3D model corresponding to an object is obtained);
configuring the computing device to generate at least one virtual camera, control the at least one virtual camera to obtain a plurality of virtual images in the 3D model, and a plurality of camera positions and a plurality of camera pose parameters corresponding to the plurality of virtual images, and store the plurality of virtual images, the plurality of camera positions and the plurality of camera pose parameters in an image database (Mukherjee, Fig. 5A, where images of a 3D model are acquired by rendering the 3D model at various poses, and where each generated image of the 3D model is stored with the respective pose in memory);
configuring the computing device to input the plurality of virtual images into a trained deep learning network, so as to perform image feature extractions on the plurality of virtual images and to obtain a plurality of virtual image features corresponding to the plurality of virtual images (Mukherjee, para. 95, where the synthetic image data is generated in order to train a deep neural network which will be used to detect the object in the real world);
configuring an image capturing device to obtain a captured image at a current position in the target area (Mukherjee, para. 121-122, where an image of an object is captured);
configuring the computing device to input the captured image into the trained deep learning network, so as to perform the image feature extraction on the captured image and to obtain a captured image feature corresponding to the captured image (Mukherjee, para. 121-122, where the captured image is fed into a deep neural network in order to determine the pose of the object); and
configuring the computing device to execute a similarity matching algorithm on the captured image feature and the plurality of virtual image features, so as to obtain a plurality of matching virtual images with relatively high similarity to the captured image from the plurality of virtual images (Mukherjee, para. 121-122, where the captured image is compared with images inside the neural network in order to determine similarity with images in the neural network; this similarity is determined using edge features).
Mukherjee discloses determining a 6DOF pose of a virtual object using a neural network trained with synthetic images rendered from a corresponding 3D model taken at various viewpoints, where features of the input image are matched with other images in the neural network.  Mukherjee doesn’t disclose retrieving multiple similar images from the neural network, and determining capturing position and pose parameters according to a geometric relationship between the captured image and those other images and sets of corresponding feature point coordinates.  However, Laskar discloses:
obtain, from the plurality of virtual images, a nearest image having the virtual image feature with the highest similarity to the virtual image feature of the most similar image (Laskar, Section 3.2, where database images are ranked according to similarity to the query image, and the top N similar images are retrieved as nearest neighbors);
calculate a capturing position and a capturing pose parameter of the image capturing device upon obtaining the captured image according to a geometric relationship between the captured image and the most similar image, a geometric relationship between the most similar image and the nearest image, and the plurality of sets of corresponding feature point coordinates (Laskar, Figure 1 and Section 3.2, where the relative pose of the query image is determined based on the geometric relationship between its features and those of the N nearest query images; the use of “db descriptors” maps to image features; if N = 2 (the number of similar images returned by the neural network is 2) then the one ranked second would map to the nearest image and the first-ranked one would be the most similar image); and
take the capturing position as a positioning result representing the current position (Laskar, Figure 1, abstract and Section 1, where the camera location is determined from the algorithm).
Mukherjee and Laskar are analogous since both deal with the determination of camera or object pose information using neural networks. Mukherjee provides a way of determining a 6DOF pose of a virtual object using a neural network trained with synthetic images rendered from a corresponding 3D model taken at various viewpoints, where features of the input image are matched with other images in the neural network. Laskar provides a way of using a neural network to return N images similar to an input image, then using geometric relationships between their features to determine a camera pose. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the determination of multiple similar images to an input image for determining a camera pose taught by Laskar into the modified invention of Mukherjee such that the system does not require scene-specific training and can be applied to scenes that are not available during the training of the network (Laskar, abstract).
Mukherjee in view of Laskar doesn’t disclose presenting a pair of images to a user for manual selection of corresponding feature points.  However, Shree discloses configuring the computing device to display the captured image and the most similar image on the user interface for the user to select a plurality of sets of corresponding feature points from the captured image and the most similar image (Shree, Figs. 7A-7B and para. 92-94, where matching feature points between the two images can be selected by the user; and where multiple matching feature points can be selected); and obtain a plurality of sets of corresponding feature point coordinates of the plurality of sets of feature points (Shree, Figs. 6A-7B).
Mukherjee in view of Laskar and Shree are analogous since both deal with the analysis of images using corresponding feature points in images. Mukherjee in view of Laskar provides a way of using neural networks to determine a camera pose or object pose from an image. Shree provides a way of presenting two images to a user for the purpose of manually selecting corresponding feature points. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate allowing a user to set corresponding feature points between images taught by Shree into the modified invention of Mukherjee in view of Laskar such that the system will be able to provide a way to edit a photographic or photo-realistic image that isn’t dependent on a fixed trained input (Shree, [0019]-[0020]).

As per claim 10, the limitations of this claim substantially correspond to the limitations of claim 1 (except for the system, which is disclosed by Mukherjee in the abstract), thus they are rejected on similar grounds.

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Mukherjee (US PGPUB 20210232858) in view of Laskar et al. (NPL “Camera relocalization by computing pairwise relative poses using convolutional neural network”) and in further view of Shree et al. (US PGPUB 20220284609) as applied to claim 1 above, and in further view of Shavit (NPL “Do We Really Need Scene-specific Pose Encoders?”).
As per claim 7, claim 1 is incorporated and Mukherjee doesn’t disclose but Laskar discloses wherein the step of performing the image feature extractions on the plurality of virtual images to obtain the plurality of virtual image features corresponding to the plurality of virtual images further includes using one of the plurality of pooling layers as a main feature extraction layer to perform the image feature extractions on the plurality of virtual images (Laskar, Figure 1, where the “Trained branch” is used to compute the feature representations of the database and the query images).
Mukherjee in view of Laskar and Shree doesn’t disclose but Shavit discloses wherein the trained deep learning network is a VGG deep learning network pre-trained by an ImageNet data set (Shavit, abstract, where the pose encoder is trained using a “generic image retrieval model” which maps to ImageNet; and Section III. A., “we apply a NetVLAD model with a VGG16 backbone, pretrained on the Pittsburgh 250K dataset”), and 
the VGG deep learning network includes a plurality of convolutional layers and a plurality of pooling layers that are sequentially iterated for multiple times, a fully connected layer and a normalization function (Shavit, Figure 3, where the deep learning network contains multiple convolutional layers and pooling layers; and Section III. A., where the images are normalized using a mean and standard deviation).
Mukherjee in view of Laskar and Shree and Shavit are analogous since both deal with the use of neural networks in order to determine camera pose of an input image. Mukherjee in view of Laskar and Shree provides a way of using neural networks to determine a camera pose or object pose from an image. Shavit provides a way of determining a camera pose of an image using a VGG deep-learning network. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the use of a VGG network taught by Shavit into the modified invention of Mukherjee in view of Laskar and Shree such that the system will be able to perform pose regression without the need for scene-specific pose encoders (Shavit, abstract and Introduction).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Mukherjee (US PGPUB 20210232858) in view of Laskar et al. (NPL “Camera relocalization by computing pairwise relative poses using convolutional neural network”) and in further view of Shree et al. (US PGPUB 20220284609) as applied to claim 1 above, and in further view of Lipkowitz et al. (US PGPUB 20180330393).
As per claim 9, claim 1 is incorporated and Mukherjee in view of Laskar and Shree doesn’t disclose but Lipkowitz discloses wherein the similarity matching algorithm further includes using a cosine similarity matching algorithm to calculate a plurality of similarities of the plurality of virtual images that correspond to the captured image (Lipkowitz, para. 56, where the feature vector comparison step is done using cosine similarity).  
Mukherjee in view of Laskar and Shree and Lipkowitz are analogous since both deal with the determination of similarities between images. Mukherjee in view of Laskar and Shree provides a way of using neural networks to determine a camera pose or object pose from an image. Lipkowitz provides a way of determining a degree of similarity between images using a “cosine similarity” algorithm.  Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the use of “cosine similarity” taught by Lipkowitz into the modified invention of Mukherjee in view of Laskar and Shree such that the system will be able to speed up the classification process (Lipkowitz, para. 56).
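For illustration only, the kind of comparison at issue in claim 9 (a cosine similarity between a captured image feature and each stored virtual image feature, followed by ranking) might be sketched as below. This is a minimal sketch under stated assumptions, not the method of the application or of any cited reference: the function names, the dictionary-based image database, and the two-element feature vectors are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two feature vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # Assumption: an all-zero feature vector is treated as dissimilar.
        return 0.0
    return dot / (norm_a * norm_b)


def rank_by_similarity(
    query: Sequence[float],
    database: Dict[str, Sequence[float]],
    top_n: int,
) -> List[Tuple[str, float]]:
    """Return the top_n (image name, similarity) pairs, most similar first."""
    scored = [(name, cosine_similarity(query, feat)) for name, feat in database.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical feature vectors standing in for virtual image features.
virtual_features = {
    "virtual_a": [1.0, 0.0],
    "virtual_b": [0.0, 1.0],
    "virtual_c": [1.0, 1.0],
}
captured_feature = [2.0, 0.1]
matches = rank_by_similarity(captured_feature, virtual_features, top_n=2)
```

With the hypothetical values above, the first entry of `matches` plays the role of the most similar image. Because cosine similarity depends only on the angle between vectors, the length of the captured feature vector does not affect the ranking.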


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Cier et al. (US PGPUB 20220224833) discloses using a neural network to estimate room shape geometry from within a room of a building.  It generates image-to-image similarities between a target image and existing images.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DIANE M WILLS whose telephone number is (571)272-5583. The examiner can normally be reached on Mondays through Fridays from 9am to 6pm Eastern time.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kent Chang, can be reached at telephone number 571-272-7667. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://portal.uspto.gov/external/portal. Should you have questions about access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
/DIANE M WILLS/
Primary Examiner, Art Unit 2619