DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
Applicant's submission filed 4/18/2022 has been entered. Claims 1, 3, 6-10 and 17 have been amended. Claims 1-20 are pending in the current application. 

Response to Arguments
Applicant's arguments filed 4/18/2022 have been fully considered but they are not persuasive. 
On pages 11-12 of the Remarks, applicant highlights the claimed computing and selecting steps and alleges that Schonberger does not teach the limitations of amended claim 17. The examiner cannot concur. 
Schonberger teaches the claim limitation of computing quality metrics for the plurality of pairs of matched features, wherein the quality metrics for the plurality of pairs of matched features are computed after the plurality of pairs of matched features are identified (Schonberger teaches at Paragraph 0049-0056 that the remote device matches image features detected in the plurality of images of the real-world environment to one or more map features in a digital environment map representing the real-world environment…the digital environment map may include any suitable number of points, a 3D point cloud may include 20,000 or more 3D points. Schonberger teaches at Paragraph 0055-0056 that determining correspondences between image features detected at 2D pixel positions of images of a real-world environment and map features having 3D map positions in a digital environment map….not all of the determined correspondences are necessarily correct…some number of incorrect correspondences may be determined. 
Schonberger teaches at Paragraph 0070-0073 that the preliminary estimated pose may have an associated confidence value proportional to a quantity of image features transmitted to the remote device). 
Schonberger teaches the claim limitation of 
selecting subsets of matched features based on the quality metrics such that matched features having higher quality metrics are more likely to be selected in the subset than those with lower quality metrics (Schonberger teaches at Paragraph 0049-0056 that the remote device matches image features detected in the plurality of images of the real-world environment to one or more map features in a digital environment map representing the real-world environment…the digital environment map may include any suitable number of points, a 3D point cloud may include 20,000 or more 3D points. Schonberger teaches at Paragraph 0055-0056 that determining correspondences between image features detected at 2D pixel positions of images of a real-world environment and map features having 3D map positions in a digital environment map….not all of the determined correspondences are necessarily correct…some number of incorrect correspondences may be determined. 
Schonberger teaches at Paragraph 0061-0066 the remote device may identify a subset of correspondences from the overall set of determined 2D image feature to 3D map feature correspondences….a plurality of subsets of 2D data point pairs may be selected…..a RANSAC solver may output a finite number of solutions or potential pose candidates from a subset of identified correspondences….the remote device may identify a best overall camera device pose from the set of every candidate camera device pose. Schonberger teaches at Paragraph 0062 that the solution is compared to other 2D data points in the dataset to determine which points are consistent with the proposed solution which points are not consistent with the proposed solution. Any solution that has the highest inlier to outlier ratio may be accepted as the actual solution. Thus, the subset of 2D data points corresponding to the actual solution has the highest inlier to outlier ratio and corresponds to the highest quality metrics. 
Schonberger teaches at Paragraph 0055 that this may be done by searching the 3D map for the feature descriptors associated with each image feature and identifying the 3D map features having the most similar feature descriptors in the 3D point cloud….determining the correspondences may include identifying a set of image features having feature descriptors that match feature descriptors of 3D map features in the 3D map. As a result, 2D points detected in images of the real-world environment correspond to 3D points associated with 3D map features, giving a set of 2D point to 3D point correspondences….this feature descriptor matching step can be implemented using one of many nearest neighbor matching techniques. The L2-distance between the descriptor vectors may be used to calculate the pairwise similarity). 
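For illustration only, the nearest-neighbor descriptor matching with L2-distance referred to above can be sketched as follows. This is a minimal brute-force sketch and not Schonberger's implementation; the function names l2_distance and match_descriptors and the 3-dimensional sample descriptors are assumptions introduced here (SIFT descriptors are 128-dimensional).

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length descriptor vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def match_descriptors(image_descriptors, map_descriptors):
    """For each image descriptor, find the map descriptor with the smallest
    L2 distance. Returns a list of (image_index, map_index, distance)."""
    matches = []
    for i, d_img in enumerate(image_descriptors):
        best_j, best_dist = min(
            ((j, l2_distance(d_img, d_map)) for j, d_map in enumerate(map_descriptors)),
            key=lambda pair: pair[1],
        )
        matches.append((i, best_j, best_dist))
    return matches

# Hypothetical toy descriptors, used only to exercise the functions.
image_descriptors = [[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]]
map_descriptors = [[1.0, 0.1, 0.0], [0.0, 0.9, 0.0], [0.5, 0.5, 0.5]]
matches = match_descriptors(image_descriptors, map_descriptors)
# Image feature 0 pairs with map feature 1, and image feature 1 with map feature 0.
```

The returned distance is one possible per-pair similarity score. A smaller distance indicates a more similar pair of descriptors.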
On pages 9-11 of the Remarks, applicant argues, in essence, that the cited art does not teach the new limitations of amended claim 10 and similar claims, as highlighted in the argument. The examiner cannot concur.  
However, Schonberger teaches the claim limitation that each pair of matched features comprising: a received feature comprising a first vector different from the descriptor of the received feature and indicating a position of the received feature in a device coordinate frame of the portable electronic device (
Schonberger teaches at Paragraph 0032 that the image features may be SIFT features…a SIFT feature includes SIFT key-point, which stores geometric information relevant to the feature, including the 2D position….an image feature will include some description of the underlying image data with which the image feature is associated and an indication of a 2D pixel position at which the image feature was identified. Schonberger teaches at Paragraph 0055 that 2D points detected in images of the real-world environment correspond to 3D points associated with 3D map features, giving a set of 2D point to 3D point correspondences and at Paragraph 0089 that the identified correspondences are between 2D pixel positions of image features transmitted to the remote device and 3D map positions of map features in the digital environment map.  At least a row of 2D pixel positions of image features includes a first vector of 2D pixel positions for the first row of pixels in the image), and 
a persistent feature comprising a second vector different from the descriptor of the persisted feature and indicating a position of the persisted feature in a canonical coordinate frame of the persisted maps (Schonberger teaches at Paragraph 0055 that 2D points detected in images of the real-world environment correspond to 3D points associated with 3D map features, giving a set of 2D point to 3D point correspondences and at Paragraph 0089 that the identified correspondences are between 2D pixel positions of image features transmitted to the remote device and 3D map positions of map features in the digital environment map. It is noted that at least a collection of 3D map positions having the identified correspondences with the first row of the pixels in the image forms a second vector of 3D map positions). 
In Pages 7-9 of Remarks, applicant argues in essence with respect to the amended claim 1 and similar claims in light of the new claim limitations set forth in the amended claim 1. Applicant alleged that Schonberger does not teach a local map. However, Schonberger teaches at FIG. 5 and Paragraph 0038-0043 that information 500 (local map) includes an image 502 of a real-world environment and includes a set of image features 504 corresponding to image 502…each image feature includes a descriptor 506 as well as a 2D pixel position 508 at which the image feature was detected and information 500 also includes camera info 510…the information may include some information indicating a spatial relationship of each of the different respective cameras of the camera device relative to one another…..the camera device may track its own pose relative to some internal frame of reference and transmit such information to the remote device. Accordingly, information 500 defines a local map of the camera device. 
Secondly, Schonberger teaches the claim limitation: 
for each feature in at least a subset of the set of extracted features, sending information representing the feature over a network to a localization service (
Schonberger teaches at Paragraph 0060 that a minimal solver may be used to calculate a plurality of candidate camera device poses from subsets of determined correspondences. Schonberger teaches at Paragraph 0061 that the remote device may identify a subset of correspondences from the overall set of determined 2D image feature to 3D map feature correspondences. 

Schonberger teaches at Paragraph 0024 communication between the camera device and remote device may be achieved such that the image-based localization can be performed on a remote device and at Paragraph 0033 transmitting the first set of image features to a remote device and at Paragraph 0034 that the remote device may be configured to estimate a pose of the camera device based on image features detected in the plurality of images and at Paragraph 0049 that the remote device matches image features detected in the plurality of images of the real-world environment to one or more map features in a digital environment map representing the real-world environment); the information representing the feature comprising: 
A descriptor for the feature (Schonberger teaches at Paragraph 0052 that each of the map features is associated with a feature descriptor extracted from the source images and at Paragraph 0055 that this may be done by searching the 3D map for the feature descriptors associated with each image feature and identifying the 3D map features having the most similar feature descriptors in the 3D point cloud), 
2D information indicating the position of the feature in the device coordinate frame, wherein the 2D information is configured for computing a quality metric indicating a likelihood of the feature matching a 3D feature having a location with respect to a second coordinate frame (Schonberger teaches at Paragraph 0032 that the image features may be SIFT features…a SIFT feature includes SIFT key-point, which stores geometric information relevant to the feature, including the 2D position….an image feature will include some description of the underlying image data with which the image feature is associated and an indication of a 2D pixel position at which the image feature was identified. 
Schonberger teaches at Paragraph 0049-0056 that the remote device matches image features detected in the plurality of images of the real-world environment to one or more map features in a digital environment map representing the real-world environment…the digital environment map may include any suitable number of points, a 3D point cloud may include 20,000 or more 3D points. Schonberger teaches at Paragraph 0055-0056 that determining correspondences between image features detected at 2D pixel positions of images of a real-world environment and map features having 3D map positions in a digital environment map….not all of the determined correspondences are necessarily correct…some number of incorrect correspondences may be determined). 
Applicant’s arguments with respect to newly amended claim 1 have been considered but are moot because the new ground of rejection is based on the newly cited Liu reference.
Applicant also argues in essence with respect to the new claim limitation that the descriptor comprises a numeric value configured to enable matching the feature to a similar feature. However, Liu teaches the new claim limitation. 
Schonberger teaches at Paragraph 0032 that the image features may be SIFT features…which is implemented as a 128-dimensional vector. It is known that the 128-dimensional vector includes a numeric index value as shown in Liu et al. US-PGPUB No. 2014/0254942 (hereinafter Liu) at Paragraph 0049 that a 128-dimensional SIFT feature vector can be quantified as a feature ID (e.g., a number from 1-1,000,000). That is, each SIFT feature vector representing a feature of the training image can have a corresponding numeric feature ID. It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have assigned a feature ID to each of the SIFT features. One of ordinary skill in the art would have identified each SIFT feature by the feature ID to look up the corresponding item from the database to determine the correspondences.  
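For illustration only, one way a descriptor vector could be reduced to a single numeric feature ID is nearest-entry lookup in a codebook, sketched below. The codebook, the function name quantize_to_feature_id, and the 2-dimensional sample vectors are assumptions introduced here; the passage of Liu cited above is relied upon only for the existence of a numeric feature ID, not for this particular scheme.

```python
def quantize_to_feature_id(descriptor, codebook):
    """Return the 1-based index of the codebook entry closest to the
    descriptor (squared L2 distance). The codebook is a hypothetical
    stand-in for whatever quantizer produces the feature ID."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    best_index = min(range(len(codebook)), key=lambda k: sq_dist(descriptor, codebook[k]))
    return best_index + 1  # IDs start at 1, mirroring the "1-1,000,000" range

# Hypothetical 3-entry codebook of 2-dimensional vectors.
codebook = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
feature_id = quantize_to_feature_id([0.9, 0.2], codebook)  # closest to entry 2
```

Two descriptors that fall nearest the same codebook entry receive the same ID, so the ID can serve as a lookup key for candidate matches.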


Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 10-16 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention. 
Claim 10 recites “the first vectors and second vectors” at line 19. This recitation lacks antecedent basis because claim 10 previously introduces only “a first vector” and “a second vector”. Claims 11-16 are rejected due to their dependency on claim 10. 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 2, 4-7 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Schonberger et al. US-PGPUB No. 2020/0372672 (hereinafter Schonberger) in view of 
Liu et al. US-PGPUB No. 2014/0254942 (hereinafter Liu); 
Xu et al. US-PGPUB No. 2020/0134366 (hereinafter Xu); Dine et al. US Patent No. 10,748,302 (hereinafter Dine); Beith et al. US-PGPUB No. 2021/0065455 (hereinafter Beith based on the provisional application 62/895,970’s filing date) and Wright JR et al. US-PGPUB No. 2020/0394012 (hereinafter Wright). 
Re Claim 1: 
Schonberger teaches an electronic device configured to operate within a cross reality system, the electronic device having a device coordinate frame, the electronic device comprising: 
A plurality of sensors configured to capture information about a three-dimensional (3D) environment, the captured information comprising a plurality of 2D images (
Schonberger teaches at Paragraph 0019 that user 100 has a camera device 106 equipped with one or more cameras 108 and at Paragraph 0025-0027 that the camera device 200 configured to use one or more cameras 302 to capture a plurality of images 304A and 304B…..the plurality of images captured by the camera device may be captured substantially at once…in a device with four on-board cameras.  
Schonberger teaches at Paragraph 0003 and Paragraph 0012 that a camera device capturing several different images); and 
at least one processor configured to execute computer executable instructions, wherein the computer executable instructions comprise instructions for (Schonberger Paragraph 0081 “the logic subsystem may include one or more hardware processors configured to execute software instructions”): 
extracting a plurality of features from one or more of the plurality of images of the 3D environment (Schonberger teaches at Paragraph 0030-0032 that method 200 includes detecting a first set of image features in a first image of the plurality of images…..the image features may be SIFT features. 
Schonberger teaches at Paragraph 0021 that such image features may be detected in the image and at Paragraph 0030 detecting a first set of image features in a first image of the plurality of images); 
generating a local map in the device coordinate frame based at least in part on a set of features of the extracted plurality of features (Schonberger teaches at FIG. 5 and Paragraph 0038-0043 that information 500 (local map) includes an image 502 of a real-world environment and includes a set of image features 504 corresponding to image 502…each image feature includes a descriptor 506 as well as a 2D pixel position 508 at which the image feature was detected and information 500 also includes camera info 510…the information may include some information indicating a spatial relationship of each of the different respective cameras of the camera device relative to one another…..the camera device may track its own pose relative to some internal frame of reference and transmit such information to the remote device. Accordingly, information 500 defines a local map of the camera device. 
Schonberger teaches at FIG. 5 a local map constructed from the image features 504 including descriptor 506 and pixel position 508. Schonberger further teaches at FIG. 1B that the local map in the camera coordinate system includes the images 110A-110C of the real world environment 102 including a set of image features such as the black circles 112 and structure 104); 
for each feature in at least a subset of the set of extracted features, sending information representing the feature over a network to a localization service (
Schonberger teaches at Paragraph 0060 that a minimal solver may be used to calculate a plurality of candidate camera device poses from subsets of determined correspondences. Schonberger teaches at Paragraph 0061 that the remote device may identify a subset of correspondences from the overall set of determined 2D image feature to 3D map feature correspondences. 

Schonberger teaches at Paragraph 0024 communication between the camera device and remote device may be achieved such that the image-based localization can be performed on a remote device and at Paragraph 0033 transmitting the first set of image features to a remote device and at Paragraph 0034 that the remote device may be configured to estimate a pose of the camera device based on image features detected in the plurality of images and at Paragraph 0049 that the remote device matches image features detected in the plurality of images of the real-world environment to one or more map features in a digital environment map representing the real-world environment); the information representing the feature comprising: 
A descriptor for the feature (Schonberger teaches at Paragraph 0052 that each of the map features is associated with a feature descriptor extracted from the source images and at Paragraph 0055 that this may be done by searching the 3D map for the feature descriptors associated with each image feature and identifying the 3D map features having the most similar feature descriptors in the 3D point cloud), [the descriptor comprises a numeric value configured to enable matching the feature to a similar feature] (Schonberger at least suggests the claim limitation. Schonberger teaches at Paragraph 0032 that the image features may be SIFT features…which is implemented as a 128-dimensional vector. It is known that the 128-dimensional vector includes a numeric index value as shown in Liu et al. US-PGPUB No. 2014/0254942 (hereinafter Liu) at Paragraph 0049 that a 128-dimensional SIFT feature vector can be quantified as a feature ID (e.g., a number from 1-1,000,000). That is, each SIFT feature vector representing a feature of the training image can have a corresponding numeric feature ID. It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have assigned a feature ID to each of the SIFT features. One of ordinary skill in the art would have identified each SIFT feature by the feature ID to look up the corresponding item from the database to determine the correspondences).  
2D information indicating the position of the feature in the device coordinate frame, wherein the 2D information is configured for computing a quality metric indicating a likelihood of the feature matching a 3D feature having a location with respect to a second coordinate frame (Schonberger teaches at Paragraph 0032 that the image features may be SIFT features…a SIFT feature includes SIFT key-point, which stores geometric information relevant to the feature, including the 2D position….an image feature will include some description of the underlying image data with which the image feature is associated and an indication of a 2D pixel position at which the image feature was identified. 
Schonberger teaches at Paragraph 0049-0056 that the remote device matches image features detected in the plurality of images of the real-world environment to one or more map features in a digital environment map representing the real-world environment…the digital environment map may include any suitable number of points, a 3D point cloud may include 20,000 or more 3D points. Schonberger teaches at Paragraph 0055-0056 that determining correspondences between image features detected at 2D pixel positions of images of a real-world environment and map features having 3D map positions in a digital environment map….not all of the determined correspondences are necessarily correct…some number of incorrect correspondences may be determined). 
Schonberger at least suggests the claim limitation: 
receiving from the localization service at least one transformation relating the device coordinate frame to the second coordinate frame (
Schonberger teaches at Paragraph 0061 that the remote device may identify a subset of correspondences from the overall set of determined 2D image feature to 3D map feature correspondences. 
Schonberger teaches at Paragraph 0055-0056 that determining correspondences between image features detected at 2D pixel positions of images of a real-world environment and map features having 3D map positions in a digital environment map….not all of the determined correspondences are necessarily correct…some number of incorrect correspondences may be determined. 
It is understood that the localization parameter such as the pose of the camera specifies the geometric relationship (mapping transformation) of the device relative to the environment as taught in Wright Paragraph 0179. Applying a pose of the camera includes providing a transformation between the environment map coordinate system and the camera coordinate system. Therefore, receiving the pose of the camera allows for the geometric transformation of the device coordinate frame to the 3D environment map coordinate frame. 
Schonberger teaches receiving a localization parameter such as a camera pose specifying a geometric relationship (the coordinate transformation matrix) between the camera coordinate frame and the environment map coordinate frame. Schonberger teaches at Paragraph 0049-0055 identifying correspondence (transformation) between detected image features and map features of the digital environment map…estimating the pose of a camera device may involve determining correspondences (the coordinate transformation matrix) between image features detected at 2D pixel positions of images of a real-world environment and map features having 3D map positions in a digital environment map and at Paragraph 0012 that matching 2D point features in the captured images to 3D map features stored in the 3D map as 3D points and the 6DOF pose of the camera may be computed using the 2D point to 3D point matches and their underlying coordinates).   
Xu teaches at Paragraph 0150 that the coordinate transformation matrix is obtained from the localization parameters. 
Xu’s localization parameters defining the coordinate transformation matrix when applied to Wright allows for Wright’s localization parameters to be distributed to the client device from the server as a localization service to have defined the coordinate transformation matrix. Wright teaches at Paragraph 0116 that localization information for the annotations may be stored in a very specific form as help sessions for distribution to those having similar problems in similar environments. The localization information includes the geometrical/spatial transformation between the features point positions of the annotations in the camera reference coordinate system and the matched feature point positions of the 3D environment map. 
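For illustration only, the relationship between a pose and a coordinate transformation matrix can be sketched as a 4x4 homogeneous matrix applied to a point. The matrix layout (a rotation about the z-axis followed by a translation), the function names make_pose and transform_point, and the numeric values are assumptions introduced here and are not taken from Xu, Wright or Schonberger.

```python
import math

def make_pose(yaw_radians, translation):
    """Build a 4x4 homogeneous transform: rotate about the z-axis by
    yaw_radians, then translate by (tx, ty, tz)."""
    c, s = math.cos(yaw_radians), math.sin(yaw_radians)
    tx, ty, tz = translation
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def transform_point(pose, point):
    """Apply a 4x4 homogeneous transform to a 3D point."""
    x, y, z = point
    h = (x, y, z, 1.0)
    return tuple(sum(pose[r][k] * h[k] for k in range(4)) for r in range(3))

# Hypothetical pose: device frame rotated 90 degrees and offset from the map frame.
device_to_map = make_pose(math.pi / 2, (1.0, 2.0, 0.0))
point_in_map = transform_point(device_to_map, (1.0, 0.0, 0.0))
# point_in_map is approximately (1.0, 3.0, 0.0)
```

Under these assumptions, a single matrix carries any position expressed in the device coordinate frame into the map coordinate frame.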

In view of Xu, Wright/Dine/Beith teaches the claim limitation: receiving from the localization service at least one transformation relating the device coordinate frame to a second coordinate frame (
Wright teaches at Paragraph 0129-0130 that each annotation may be saved with reference to a spatial relationship to the object and a temporal relationship to a time within the session…the expert may initiate a second AR session…superimposing a new session and/or objects on the saved annotations wherein each annotation is superimposed in spatial relationship to the second object based on the saved spatial relationship and in temporal relationship to a time within the second AR session based on the saved temporal relationship and at Paragraph 0223 that the relationships between the annotations and feature points are determined and at Paragraph 0222 that the system would then basically project the annotations correctly onto the screen. Thus, you may only need the positional relationship (mapping-transformation) between the annotations and the feature point positions and at Paragraph 0179 find the geometric relationship (pose) of the device relative to the environment. 
Wright teaches at Paragraph 0116 that localization information for the annotations may be stored in a very specific form as help sessions for distribution to those having similar problems in similar environments. The localization information includes the geometrical/spatial transformation between the features point positions of the annotations in the camera reference coordinate system and the matched feature point positions of the 3D environment map. 
Wright teaches at Paragraph 0167 that the device may project the annotation onto the 3D reconstruction of the environment and at Paragraph 0213 that we can project this annotation that the helper is drawing on the 2D screen onto the 3D representation of the environment. 
Wright teaches at Paragraph 0179 that when there is a match of objects in the image to information in the environment map, find the geometric relationship (pose) of the device relative to the environment. The user device then displays information from the environment map for objects that are visible within the view of the current camera image and at Paragraph 0222 that the camera may be localized relative to the environment and then all the annotations are localized relative to the environment (applying the transformation of the geometric data associated with the feature point positions of the annotations of the estimated pose of the device having matching feature point positions in the 3D environment map) …as soon as we have a camera position identified the annotations are also defined in 3D space. Thus, you may only need the positional relationship between the annotations and the feature point positions. 
Dine teaches at column 16, lines 50-67 that the plurality of matched common features in the 3D external map 556 and the local 3D map 55 generate a re-localization (e.g., 3D spatial transformation) between the estimated pose of the camera of the electronic device 500B in the 3D map 550 to the estimated pose of the camera of the electronic device 500A in the 3D map 559. Accordingly, localization includes a spatial transformation.  
Beith teaches at Paragraph 0045 of the provisional that the virtual keyboard 228 is displayed as a projection at a position on the display 210 relative to the hands 226…by registering the virtual keyboard 228 to one or more landmark points detected on the hands 226 and at Paragraph 0049 that as a result of the detection of the one or more landmark points on the hands 226, the pose of the landmarks in relative physical position with respect to the image sensors is established…the one or more landmark points on the hands 226 can be used as real-world registration points for positioning the virtual keyboard 228 on the display). 
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have received the camera’s pose in order to transform the 2D feature points of the virtual object to the matched 3D feature points in the environment map so as to register the virtual object with respect to the real object in the 3D environment to have transformed the virtual object in an extended reality environment. One of ordinary skill in the art would have been motivated to have transformed the virtual object in the extended reality environment. 

Re Claim 2: 
Claim 2 encompasses the same scope of invention as that of claim 1 except for the additional claim limitation that the electronic device comprises a display; and the computer-executable instructions comprise instructions for rendering virtual content having a location specified in the second coordinate frame on the display in a position computed based, at least in part, on a transformation of the at least one transformation.
Wright/Beith teaches the claim limitation that the electronic device comprises a display; and the computer-executable instructions comprise instructions for rendering virtual content having a location specified in the second coordinate frame on the display in a position computed based, at least in part, on a transformation of the at least one transformation (Wright teaches at Paragraph 0129-0130 that each annotation may be saved with reference to a spatial relationship to the object and a temporal relationship to a time within the session…the expert may initiate a second AR session…superimposing a new session and/or objects on the saved annotations wherein each annotation is superimposed in spatial relationship to the second object based on the saved spatial relationship and in temporal relationship to a time within the second AR session based on the saved temporal relationship and at Paragraph 0223 that the relationships between the annotations and feature points are determined and at Paragraph 0222 that the system would then basically project the annotations correctly onto the screen. Thus, you may only need the positional relationship (mapping-transformation) between the annotations and the feature point positions and at Paragraph 0179 find the geometric relationship (pose) of the device relative to the environment. 
Wright teaches at Paragraph 0116 that localization information for the annotations may be stored in a very specific form as help sessions for distribution to those having similar problems in similar environments. The localization information includes the geometrical/spatial transformation between the features point positions of the annotations in the camera reference coordinate system and the matched feature point positions of the 3D environment map. 
Wright teaches at Paragraph 0167 that the device may project the annotation onto the 3D reconstruction of the environment and at Paragraph 0213 that we can project this annotation that the helper is drawing on the 2D screen onto the 3D representation of the environment. 
Wright teaches at Paragraph 0179 that when there is a match of objects in the image to information in the environment map, find the geometric relationship (pose) of the device relative to the environment. The user device then displays information from the environment map for objects that are visible within the view of the current camera image and at Paragraph 0222 that the camera may be localized relative to the environment and then all the annotations are localized relative to the environment (applying the transformation of the geometric data associated with the feature point positions of the annotations of the estimated pose of the device having matching feature point positions in the 3D environment map) …as soon as we have a camera position identified the annotations are also defined in 3D space. Thus, you may only need the positional relationship between the annotations and the feature point positions. 
Beith teaches at Paragraph 0034-0037 that the pose of the AR device 102 can be determined and/or tracked by the processor based on images captured by the camera 108  and the 6DOF SLAM can associate features observed from certain input images from the camera 108 to the SLAM map….The pose of the camera 108 and/or the AR device 102 can be determined by projecting features from the 3D SLAM map into an image or video frame and updating the camera pose from verified 2D-3D correspondences…AR objects can be registered to the detected feature points in a scene. 
Beith teaches at Paragraph 0045 of the provisional that the virtual keyboard 228 is displayed as a projection at a position on the display 210 relative to the hands 226…by registering the virtual keyboard 228 to one or more landmark points detected on the hands 226 and at Paragraph 0049 that as a result of the detection of the one or more landmark points on the hands 226, the pose of the landmarks in relative physical position with respect to the image sensors is established…the one or more landmark points on the hands 226 can be used as real-world registration points for positioning the virtual keyboard 228 on the display and at Paragraph 0053 that the landmark points on the palms of the hands 226 can be detected in an image and the locations of the landmark points can be determined with respect to the camera of the AR device 202 and at Paragraph 0055 that the hose pose can be used as another reference point from which to register the virtual keyboard 228). 
Re Claim 4: 
Claim 4 encompasses the same scope of invention as claim 1, except for the additional claim limitation that the plurality of features are extracted from a plurality of images captured by at least two sensors of the electronic device.
However, Schonberger further teaches the claim limitation that the plurality of features are extracted from a plurality of images captured by at least two sensors of the electronic device (Schonberger teaches at Paragraph 0019 that user 100 has a camera device 106 equipped with one or more cameras 108 and at Paragraph 0028 that when the camera device includes multiple cameras, the multiple cameras may have any suitable spatial relationship with respect to each other and with respect to other hardware components of the camera device…the camera device may include manufacturer calibration data that indicates the relative 3D positions of each of multiple cameras of the camera device and at Paragraph 0030 that method 200 includes detecting a first set of image features in a first image of the plurality of images…each individual image captured by a camera device may have any number of such features, e.g., tens, hundreds, thousands, or more and at Paragraph 0034 that the remote device may be configured to estimate a pose of the camera device based on image features detected in the plurality of images…even if such image features are insufficient to estimate a pose for the camera device having sufficiently high confidence).
Re Claim 5: 
Claim 5 encompasses the same scope of invention as claim 4, except for the additional claim limitation that each of the at least two sensors is associated with a respective sensor coordinate frame; and the computer executable instructions comprise further instructions for translating the features extracted from the plurality of images from a respective sensor coordinate frame to the device coordinate frame. 
However, Schonberger further teaches the claim limitation that each of the at least two sensors is associated with a respective sensor coordinate frame; and the computer executable instructions comprise further instructions for translating the features extracted from the plurality of images from a respective sensor coordinate frame to the device coordinate frame (Schonberger teaches at Paragraph 0043 that the camera device may track its own pose relative to some internal frame of reference and transmit such information to the remote device. 
Schonberger teaches at Paragraph 0019 that user 100 has a camera device 106 equipped with one or more cameras 108 and at Paragraph 0028 that when the camera device includes multiple cameras, the multiple cameras may have any suitable spatial relationship with respect to each other and with respect to other hardware components of the camera device…the camera device may include manufacturer calibration data that indicates the relative 3D positions of each of multiple cameras of the camera device and at Paragraph 0030 that method 200 includes detecting a first set of image features in a first image of the plurality of images…each individual image captured by a camera device may have any number of such features, e.g., tens, hundreds, thousands, or more and at Paragraph 0034 that the remote device may be configured to estimate a pose of the camera device based on image features detected in the plurality of images…even if such image features are insufficient to estimate a pose for the camera device having sufficiently high confidence).
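For illustration only, the translation of a feature position from a sensor coordinate frame to the device coordinate frame recited in claim 5 can be sketched as a rigid transform derived from calibration data of the kind noted at Schonberger Paragraph 0028. The Python sketch below is not taken from Schonberger or from the instant application; the function name `sensor_to_device` and the example rotation and offset values are hypothetical assumptions.

```python
def sensor_to_device(point_sensor, rotation, translation):
    """Map a 3D point from a sensor frame into the device frame.

    Applies p_device = R * p_sensor + t, where R (3x3) and t (3-vector)
    are assumed to come from calibration data for that sensor.
    """
    return tuple(
        sum(rotation[i][j] * point_sensor[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# Hypothetical extrinsics: sensor rotated 90 degrees about the z-axis and
# mounted 0.05 m along the device x-axis.
ROTATION_Z90 = (
    (0.0, -1.0, 0.0),
    (1.0, 0.0, 0.0),
    (0.0, 0.0, 1.0),
)
OFFSET = (0.05, 0.0, 0.0)

point_in_device = sensor_to_device((1.0, 0.0, 2.0), ROTATION_Z90, OFFSET)
```

Each sensor would carry its own rotation and offset, so features from different sensors end up expressed in the one shared device frame.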
Re Claim 6: 
Claim 6 encompasses the same scope of invention as claim 1, except for the additional claim limitation that the plurality of sensors have respective sensor coordinate frames, and the computer-executable instructions comprise instructions for computing the sensor coordinate frames based on locations of the plurality of sensors on the electronic device. 
However, Schonberger and Wright further teach the claim limitation that the plurality of sensors have respective sensor coordinate frames, and the computer-executable instructions comprise instructions for computing the sensor coordinate frames based on locations of the plurality of sensors on the electronic device ( 
Wright teaches at Paragraph 0220-0222 that every next camera frame is easy to compute because now you can find already known feature points in the next frame where you know 3D positions in space so you can figure out by triangulation where the new camera frame is and from this new camera frame position you see new feature points…but you now are able to triangulate to have new ones that have just the 2D position at the moment. 
Schonberger teaches at Paragraph 0043 that the camera device may track its own pose relative to some internal frame of reference and transmit such information to the remote device. 
Schonberger teaches at Paragraph 0019 that user 100 has a camera device 106 equipped with one or more cameras 108 and at Paragraph 0028 that when the camera device includes multiple cameras, the multiple cameras may have any suitable spatial relationship with respect to each other and with respect to other hardware components of the camera device…the camera device may include manufacturer calibration data that indicates the relative 3D positions of each of multiple cameras of the camera device and at Paragraph 0030 that method 200 includes detecting a first set of image features in a first image of the plurality of images…each individual image captured by a camera device may have any number of such features, e.g., tens, hundreds, thousands, or more and at Paragraph 0034 that the remote device may be configured to estimate a pose of the camera device based on image features detected in the plurality of images…even if such image features are insufficient to estimate a pose for the camera device having sufficiently high confidence. 
Schonberger teaches at Paragraph 0032 that the image features may be SIFT features…a SIFT feature includes SIFT key-point, which stores geometric information relevant to the feature, including the 2D position….an image feature will include some description of the underlying image data with which the image feature is associated and an indication of a 2D pixel position at which the image feature was identified. Schonberger teaches at Paragraph 0055 that 2D points detected in images of the real-world environment correspond to 3D points associated with 3D map features, giving a set of 2D point to 3D point correspondences and at Paragraph 0089 that the identified correspondences are between 2D pixel positions of image features transmitted to the remote device and 3D map positions of map features in the digital environment map.  At least a row of 2D pixel positions of image features includes a first vector of 2D pixel positions for the first row of pixels in the image. 
).

Re Claim 7: 
Claim 7 encompasses the same scope of invention as claim 1, except for the additional claim limitation that the electronic device comprises a display; and the computer-executable instructions comprise instructions for computing the sensor coordinate frames based on locations of the plurality of sensors with respect to the display.
However, Schonberger further teaches the claim limitation that the electronic device comprises a display; and the computer-executable instructions comprise instructions for computing the sensor coordinate frames based on locations of the plurality of sensors with respect to the display (Schonberger teaches at Paragraph 0043 that the camera device may track its own pose relative to some internal frame of reference and transmit such information to the remote device. 
Schonberger teaches at Paragraph 0019 that user 100 has a camera device 106 equipped with one or more cameras 108 and at Paragraph 0028 that when the camera device includes multiple cameras, the multiple cameras may have any suitable spatial relationship with respect to each other and with respect to other hardware components of the camera device…the camera device may include manufacturer calibration data that indicates the relative 3D positions of each of multiple cameras of the camera device and at Paragraph 0030 that method 200 includes detecting a first set of image features in a first image of the plurality of images…each individual image captured by a camera device may have any number of such features, e.g., tens, hundreds, thousands, or more and at Paragraph 0034 that the remote device may be configured to estimate a pose of the camera device based on image features detected in the plurality of images…even if such image features are insufficient to estimate a pose for the camera device having sufficiently high confidence).
Re Claim 9: 
Claim 9 encompasses the same scope of invention as claim 1, except for the additional claim limitation that the 2D information for each of at least two of the plurality of sensors comprises a first vector indicating the position of the feature in a sensor coordinate frame of the sensor that captured the image comprising the feature, and a second vector indicating the position in the device coordinate frame of the sensor that captured the image comprising the feature. 
Wright and Schonberger further teach the claim limitation that the 2D information for each of at least two of the plurality of sensors comprises a first vector indicating the position of the feature in a sensor coordinate frame of the sensor that captured the image comprising the feature, and a second vector indicating the position in the device coordinate frame of the sensor that captured the image comprising the feature (
Wright teaches at Paragraph 0220-0222 that every next camera frame is easy to compute because now you can find already known feature points in the next frame where you know 3D positions in space so you can figure out by triangulation where the new camera frame is and from this new camera frame position you see new feature points…but you now are able to triangulate to have new ones that have just the 2D position at the moment. 
Schonberger teaches at Paragraph 0043 that the camera device may track its own pose relative to some internal frame of reference and transmit such information to the remote device. 
Schonberger teaches at Paragraph 0019 that user 100 has a camera device 106 equipped with one or more cameras 108 and at Paragraph 0028 that when the camera device includes multiple cameras, the multiple cameras may have any suitable spatial relationship with respect to each other and with respect to other hardware components of the camera device…the camera device may include manufacturer calibration data that indicates the relative 3D positions of each of multiple cameras of the camera device and at Paragraph 0030 that method 200 includes detecting a first set of image features in a first image of the plurality of images…each individual image captured by a camera device may have any number of such features, e.g., tens, hundreds, thousands, or more and at Paragraph 0034 that the remote device may be configured to estimate a pose of the camera device based on image features detected in the plurality of images…even if such image features are insufficient to estimate a pose for the camera device having sufficiently high confidence. 
Schonberger teaches at Paragraph 0052 that a feature descriptor may comprise a multi-dimensional vector and at Paragraph 0028 that when the camera device includes multiple cameras, the multiple cameras may have any suitable spatial relationship with respect to each other and with respect to other hardware components of the camera device…the camera device may include manufacturer calibration data that indicates the relative 3D positions of each of multiple cameras of the camera device and at Paragraph 0033 additional sets of image features are detected in other images of the plurality of images captured by the multiple cameras of the camera device and at Paragraph 0052 that each of the map features are associated with feature descriptors extracted from the source images. 
Schonberger teaches at Paragraph 0055 that this may be done by searching the 3D map for the feature descriptors associated with each image feature and identifying the 3D map features having the most similar feature descriptors in the 3D point cloud….determining the correspondences may include identifying a set of image features having feature descriptors that match feature descriptors of 3D map features in the 3D map. As a result, 2D points detected in images of the real-world environment correspond to 3D points associated with 3D map features, giving a set of 2D point to 3D point correspondences….this feature descriptor matching step can be implemented using one of many nearest neighbor matching techniques. The L2-distance between the descriptor vectors may be used to calculate the pairwise similarity. 
Schonberger teaches at Paragraph 0032 that the image features may be SIFT features…a SIFT feature includes SIFT key-point, which stores geometric information relevant to the feature, including the 2D position….an image feature will include some description of the underlying image data with which the image feature is associated and an indication of a 2D pixel position at which the image feature was identified. Schonberger teaches at Paragraph 0055 that 2D points detected in images of the real-world environment correspond to 3D points associated with 3D map features, giving a set of 2D point to 3D point correspondences and at Paragraph 0089 that the identified correspondences are between 2D pixel positions of image features transmitted to the remote device and 3D map positions of map features in the digital environment map.  At least a row of 2D pixel positions of image features includes a first vector of 2D pixel positions for the first row of pixels in the image. 
). 

Claims 10-14 and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Schonberger et al. US-PGPUB No. 2020/0372672 (hereinafter Schonberger) in view of 
Liu et al. US-PGPUB No. 2014/0254942 (hereinafter Liu); 
Xu et al. US-PGPUB No. 2020/0134366 (hereinafter Xu); Dine et al. US Patent No. 10,748,302 (hereinafter Dine); Beith et al. US-PGPUB No. 2021/0065455 (hereinafter Beith, based on the filing date of provisional application 62/895,970) and Wright JR et al. US-PGPUB No. 2020/0394012 (hereinafter Wright). 

Re Claim 10: 
Schonberger teaches an XR system that supports specification of a position of virtual content relative to persisted maps in a database of persisted maps, the system comprising: 
a communication component configured to receive from a portable electronic device information about a set of features in a three-dimensional (3D) environment of the portable electronic device (Schonberger teaches at Paragraph 0024 communication between the camera device and remote device may be achieved such that the image-based localization can be performed on a remote device and at Paragraph 0033 transmitting the first set of image features to a remote device and at Paragraph 0034 that the remote device may be configured to estimate a pose of the camera device based on image features detected in the plurality of images and at Paragraph 0049 that the remote device matches image features detected in the plurality of images of the real-world environment to one or more map features in a digital environment map representing the real-world environment); and 
a localization component, connected to the communication component, the localization component configured to: 
match the set of received features against persisted features in the database of persisted maps to provide pairs of matched features based on similarity of descriptors of the received features and descriptors of the persisted features, each pair of matched features comprising 
(Schonberger teaches at Paragraph 0049 that the remote device matches image features detected in the plurality of images of the real-world environment to one or more map features in a digital environment map representing the real-world environment. 
Schonberger teaches at Paragraph 0055 that this may be done by searching the 3D map for the feature descriptors associated with each image feature and identifying the 3D map features having the most similar feature descriptors in the 3D point cloud….determining the correspondences may include identifying a set of image features having feature descriptors that match feature descriptors of 3D map features in the 3D map. As a result, 2D points detected in images of the real-world environment correspond to 3D points associated with 3D map features, giving a set of 2D point to 3D point correspondences….this feature descriptor matching step can be implemented using one of many nearest neighbor matching techniques. The L2-distance between the descriptor vectors may be used to calculate the pairwise similarity), 
a received feature comprising a first vector different from the descriptor of the received feature and indicating a position of the received feature in a device coordinate frame of the portable electronic device (Schonberger teaches at Paragraph 0032 that the image features may be SIFT features…a SIFT feature includes SIFT key-point, which stores geometric information relevant to the feature, including the 2D position….an image feature will include some description of the underlying image data with which the image feature is associated and an indication of a 2D pixel position at which the image feature was identified. Schonberger teaches at Paragraph 0055 that 2D points detected in images of the real-world environment correspond to 3D points associated with 3D map features, giving a set of 2D point to 3D point correspondences and at Paragraph 0089 that the identified correspondences are between 2D pixel positions of image features transmitted to the remote device and 3D map positions of map features in the digital environment map.  At least a row of 2D pixel positions of image features includes a first vector of 2D pixel positions for the first row of pixels in the image), and 
a persisted feature comprising a second vector different from the descriptor of the persisted feature and indicating a position of the persisted feature in a canonical coordinate frame of the persisted maps (Schonberger teaches at Paragraph 0055 that 2D points detected in images of the real-world environment correspond to 3D points associated with 3D map features, giving a set of 2D point to 3D point correspondences and at Paragraph 0089 that the identified correspondences are between 2D pixel positions of image features transmitted to the remote device and 3D map positions of map features in the digital environment map. It is noted that at least a collection of 3D map positions having the identified correspondences with the first row of the pixels in the image forms a second vector of 3D map positions), 
compute quality metrics for the pairs of matched features based, at least in part, on the first vectors and second vectors, the quality metric for each pair of matched features indicating the likelihood that the matched features represent the same feature in the 3D environment (Schonberger teaches at Paragraph 0055 that this may be done by searching the 3D map for the feature descriptors associated with each image feature and identifying the 3D map features having the most similar feature descriptors in the 3D point cloud….determining the correspondences may include identifying a set of image features having feature descriptors that match feature descriptors of 3D map features in the 3D map. As a result, 2D points detected in images of the real-world environment correspond to 3D points associated with 3D map features, giving a set of 2D point to 3D point correspondences….this feature descriptor matching step can be implemented using one of many nearest neighbor matching techniques. The L2-distance between the descriptor vectors may be used to calculate the pairwise similarity. 
Schonberger teaches at Paragraph 0070-0073 that the preliminary estimated pose may have an associated confidence value proportional to a quantity of image features transmitted to the remote device). 
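For illustration only, the nearest neighbor descriptor matching with an L2-distance described at Schonberger Paragraph 0055 can be sketched as follows. The brute-force search and the names `l2_distance` and `match_features` are assumptions made for this sketch; Schonberger does not specify which nearest neighbor technique is used, and the sample descriptors are invented.

```python
import math


def l2_distance(a, b):
    """Euclidean (L2) distance between two descriptor vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def match_features(image_descriptors, map_descriptors):
    """Pair each image descriptor with its nearest map descriptor.

    Returns a list of (image_index, map_index, distance) tuples; a smaller
    distance indicates a more similar pair of descriptors.
    """
    matches = []
    for i, d_img in enumerate(image_descriptors):
        j, dist = min(
            ((k, l2_distance(d_img, d_map)) for k, d_map in enumerate(map_descriptors)),
            key=lambda pair: pair[1],
        )
        matches.append((i, j, dist))
    return matches


# Invented 2-element descriptors; real descriptors such as SIFT are much longer.
image_descriptors = [(0.0, 1.0), (1.0, 0.0)]
map_descriptors = [(1.0, 0.1), (0.0, 0.9), (5.0, 5.0)]
matches = match_features(image_descriptors, map_descriptors)
```

Because every image descriptor is paired with some map descriptor regardless of how large the distance is, a search of this kind can return incorrect correspondences, consistent with the observation at Paragraph 0056 that not all determined correspondences are necessarily correct.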
Schonberger at least suggests the claim limitation: 
generate a transformation between the device coordinate frame of the portable electronic device and the canonical coordinate frame of the persisted maps based on the matched correspondences and the computed quality metrics for the matched correspondences (
Schonberger teaches at Paragraph 0055-0056 that determining (mapping) correspondences between image features detected at 2D pixel positions of images of a real-world environment and map features having 3D map positions in a digital environment map….not all of the determined correspondences are necessarily correct…some number of incorrect correspondences may be determined and at Paragraph 0089 that the identified correspondences are between 2D pixel positions of image features transmitted to the remote device and 3D map positions of map features in the digital environment map. 
It is noted that mapping correspondences between the 2D pixel positions of images and the 3D map positions requires a projection transformation of the 3D map positions to the 2D pixel positions. 
It is understood that the pose of the camera specifies the geometric relationship (mapping transformation) of the device relative to the environment as taught in Wright Paragraph 0179. Applying a pose of the camera includes providing a transformation between the environment map coordinate system and the camera coordinate system. Therefore, receiving the pose of the camera is the same as receiving the geometric transformation of the device coordinate frame to the 3D environment map coordinate frame. 
Schonberger teaches receiving a camera pose specifying a geometric relationship between the camera coordinate frame and the environment map coordinate frame. Schonberger teaches at Paragraph 0049-0055 identifying correspondences (transformation) between detected image features and map features of the digital environment map…estimating the pose of a camera device may involve determining correspondences between image features detected at 2D pixel positions of images of a real-world environment and map features having 3D map positions in a digital environment map and at Paragraph 0012 matching 2D point features in the captured images to 3D map features stored in the 3D map as 3D points, and that the 6DOF pose of the camera may be computed using the 2D point to 3D point matches and their underlying coordinates). 
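For illustration only, the relationship between a camera pose and the projection of a 3D map position to a 2D pixel position can be sketched with a pinhole camera model. The pinhole model, the function name `project_map_point`, and the intrinsic parameters `fx`, `fy`, `cx`, `cy` are assumptions introduced for this sketch and are not drawn from Schonberger, Wright, or the instant application.

```python
def project_map_point(point_map, rotation, translation, fx, fy, cx, cy):
    """Project a 3D map position to a 2D pixel position.

    The pose (rotation, translation) maps map-frame coordinates into the
    camera frame; the assumed pinhole intrinsics then map camera-frame
    coordinates to pixel coordinates.
    """
    # Map frame -> camera frame: p_cam = R * p_map + t
    x, y, z = (
        sum(rotation[i][j] * point_map[j] for j in range(3)) + translation[i]
        for i in range(3)
    )
    if z <= 0:
        raise ValueError("point is not in front of the camera")
    # Camera frame -> pixel coordinates (perspective division)
    return (fx * x / z + cx, fy * y / z + cy)


# Hypothetical pose (camera frame coincides with the map frame) and intrinsics.
IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
pixel = project_map_point(
    (0.5, -0.25, 2.0), IDENTITY, (0.0, 0.0, 0.0),
    fx=500.0, fy=500.0, cx=320.0, cy=240.0,
)
```

Under these assumptions, the same rotation and translation that express the camera pose also serve as the transformation from the map coordinate frame to the camera coordinate frame.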
Xu teaches at Paragraph 0150 that the coordinate transformation matrix is obtained from the localization parameters. 
Xu’s localization parameters, which define the coordinate transformation matrix, when applied to Wright allow Wright’s localization parameters to be distributed from the server to the client device as a localization service so as to define the coordinate transformation matrix. Wright teaches at Paragraph 0116 that localization information for the annotations may be stored in a very specific form as help sessions for distribution to those having similar problems in similar environments. The localization information includes the geometrical/spatial transformation between the feature point positions of the annotations in the camera reference coordinate system and the matched feature point positions of the 3D environment map. 
In view of Xu, Wright/Dine/Beith teaches the claim limitation: generate a transformation between the device coordinate frame of the portable electronic device and the canonical coordinate frame of the persisted maps based on the matched correspondences and the computed quality metrics for the matched correspondences (
Wright teaches at Paragraph 0129-0130 that each annotation may be saved with reference to a spatial relationship to the object and a temporal relationship to a time within the session…the expert may initiate a second AR session…superimposing a new session and/or objects on the saved annotations wherein each annotation is superimposed in spatial relationship to the second object based on the saved spatial relationship and in temporal relationship to a time within the second AR session based on the saved temporal relationship and at Paragraph 0223 that the relationships between the annotations and feature points are determined and at Paragraph 0222 that the system would then basically project the annotations correctly onto the screen. Thus, you may only need the positional relationship (mapping-transformation) between the annotations and the feature point positions and at Paragraph 0179 find the geometric relationship (pose) of the device relative to the environment. 
Wright teaches at Paragraph 0116 that localization information for the annotations may be stored in a very specific form as help sessions for distribution to those having similar problems in similar environments. The localization information includes the geometrical/spatial transformation between the feature point positions of the annotations in the camera reference coordinate system and the matched feature point positions of the 3D environment map. 
Wright teaches at Paragraph 0167 that the device may project the annotation onto the 3D reconstruction of the environment and at Paragraph 0213 that we can project this annotation that the helper is drawing on the 2D screen onto the 3D representation of the environment. 
Wright teaches at Paragraph 0179 that when there is a match of objects in the image to information in the environment map, find the geometric relationship (pose) of the device relative to the environment. The user device then displays information from the environment map for objects that are visible within the view of the current camera image and at Paragraph 0222 that the camera may be localized relative to the environment and then all the annotations are localized relative to the environment (applying the transformation of the geometric data associated with the feature point positions of the annotations of the estimated pose of the device having matching feature point positions in the 3D environment map) …as soon as we have a camera position identified the annotations are also defined in 3D space. Thus, you may only need the positional relationship between the annotations and the feature point positions. 
Dine teaches at column 16, lines 50-67 that the plurality of matched common features in the 3D external map 556 and the local 3D map 55 generate a re-localization (e.g., 3D spatial transformation) between the estimated pose of the camera of the electronic device 500B in the 3D map 550 to the estimated pose of the camera of the electronic device 500A in the 3D map 559. Accordingly, localization includes a spatial transformation.  
Beith teaches at Paragraph 0045 of the provisional that the virtual keyboard 228 is displayed as a projection at a position on the display 210 relative to the hands 226…by registering the virtual keyboard 228 to one or more landmark points detected on the hands 226 and at Paragraph 0049 that as a result of the detection of the one or more landmark points on the hands 226, the pose of the landmarks in relative physical position with respect to the image sensors is established…the one or more landmark points on the hands 226 can be used as real-world registration points for positioning the virtual keyboard 228 on the display). 
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have received the camera’s pose in order to transform the 2D feature points of the virtual object to the matched 3D feature points in the environment map, so as to register the virtual object with respect to the real object in the 3D environment and thereby transform the virtual object in an extended reality environment. One of ordinary skill in the art would have been motivated to transform the virtual object in the extended reality environment. 

Re Claim 11: 
Claim 11 encompasses the same scope of invention as claim 10, except for the additional claim limitation that the localization component is further configured to: send the transformation to the portable electronic device. 
Schonberger at least suggests the claim limitation that the localization component is further configured to: send the transformation to the portable electronic device (Schonberger teaches at Paragraph 0073 that the final estimated pose may then be sent to the camera device at 434 and received at the camera device at 436. 
It is understood that the pose of the camera specifies the geometric relationship (mapping transformation) of the device relative to the environment as taught in Wright Paragraph 0179. Applying a pose of the camera includes providing a transformation between the environment map coordinate system and the camera coordinate system. Therefore, receiving the pose of the camera is the same as receiving the geometric transformation of the device coordinate frame to the 3D environment map coordinate frame. 
Schonberger teaches receiving a camera pose specifying a geometric relationship between the camera coordinate frame and the environment map coordinate frame. Schonberger teaches at Paragraph 0049-0055 identifying correspondences (transformation) between detected image features and map features of the digital environment map…estimating the pose of a camera device may involve determining correspondences between image features detected at 2D pixel positions of images of a real-world environment and map features having 3D map positions in a digital environment map and at Paragraph 0012 matching 2D point features in the captured images to 3D map features stored in the 3D map as 3D points, and that the 6DOF pose of the camera may be computed using the 2D point to 3D point matches and their underlying coordinates).  
Xu teaches at Paragraph 0150 that the coordinate transformation matrix is obtained from the localization parameters. 
Xu’s localization parameters defining the coordinate transformation matrix when applied to Wright allows for Wright’s localization parameters to be distributed to the client device from the server as a localization service to have defined the coordinate transformation matrix. Wright teaches at Paragraph 0116 that localization information for the annotations may be stored in a very specific form as help sessions for distribution to those having similar problems in similar environments. The localization information includes the geometrical/spatial transformation between the features point positions of the annotations in the camera reference coordinate system and the matched feature point positions of the 3D environment map. 
In view of Xu, Wright/Dine/Beith teaches the claim limitation: that the localization component is further configured to: send the transformation to the portable electronic device (
Wright teaches at Paragraph 0129-0130 that each annotation may be saved with reference to a spatial relationship to the object and a temporal relationship to a time within the session…the expert may initiate a second AR session…superimposing a new session and/or objects on the saved annotations, wherein each annotation is superimposed in spatial relationship to the second object based on the saved spatial relationship and in temporal relationship to a time within the second AR session based on the saved temporal relationship, and at Paragraph 0223 that the relationships between the annotations and feature points are determined, and at Paragraph 0222 that the system would then basically project the annotations correctly onto the screen. Thus, you may only need the positional relationship (mapping transformation) between the annotations and the feature point positions, and at Paragraph 0179 finding the geometric relationship (pose) of the device relative to the environment. 
Wright teaches at Paragraph 0116 that localization information for the annotations may be stored in a very specific form as help sessions for distribution to those having similar problems in similar environments. The localization information includes the geometrical/spatial transformation between the feature point positions of the annotations in the camera reference coordinate system and the matched feature point positions of the 3D environment map. 
Wright teaches at Paragraph 0167 that the device may project the annotation onto the 3D reconstruction of the environment and at Paragraph 0213 that we can project this annotation that the helper is drawing on the 2D screen onto the 3D representation of the environment. 
Wright teaches at Paragraph 0179 that when there is a match of objects in the image to information in the environment map, find the geometric relationship (pose) of the device relative to the environment. The user device then displays information from the environment map for objects that are visible within the view of the current camera image and at Paragraph 0222 that the camera may be localized relative to the environment and then all the annotations are localized relative to the environment (applying the transformation of the geometric data associated with the feature point positions of the annotations of the estimated pose of the device having matching feature point positions in the 3D environment map) …as soon as we have a camera position identified the annotations are also defined in 3D space. Thus, you may only need the positional relationship between the annotations and the feature point positions. 
Dine teaches at column 16, lines 50-67 that the plurality of matched common features in the 3D external map 556 and the local 3D map 55 generate a re-localization (e.g., 3D spatial transformation) between the estimated pose of the camera of the electronic device 500B in the 3D map 550 and the estimated pose of the camera of the electronic device 500A in the 3D map 559. Accordingly, localization includes a spatial transformation.  
Beith teaches at Paragraph 0045 of the provisional that the virtual keyboard 228 is displayed as a projection at a position on the display 210 relative to the hands 226…by registering the virtual keyboard 228 to one or more landmark points detected on the hands 226, and at Paragraph 0049 that, as a result of the detection of the one or more landmark points on the hands 226, the pose of the landmarks in relative physical position with respect to the image sensors is established…the one or more landmark points on the hands 226 can be used as real-world registration points for positioning the virtual keyboard 228 on the display). 
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have received the camera’s pose in order to transform the 2D feature points of the virtual object to the matched 3D feature points in the environment map, so as to register the virtual object with respect to the real object in the 3D environment and thereby transform the virtual object in an extended reality environment. One of ordinary skill in the art would have been motivated to do so in order to transform the virtual object in the extended reality environment. 

Re Claim 12: 
Claim 12 encompasses the same scope of invention as claim 10, except for the additional claim limitation that the communication component is further configured to receive from the portable electronic device positioning information for the features of the set of features expressed in respective sensor coordinate frames of the sensors that captured the images comprising the set of features. 
Schonberger further teaches the claim limitation that the communication component is further configured to receive from the portable electronic device positioning information for the features of the set of features expressed in respective sensor coordinate frames of the sensors that captured the images comprising the set of features (Schonberger teaches at Paragraph 0024 that communication between the camera device and the remote device may be achieved such that the image-based localization can be performed on a remote device, at Paragraph 0033 transmitting the first set of image features to a remote device, at Paragraph 0034 that the remote device may be configured to estimate a pose of the camera device based on image features detected in the plurality of images, and at Paragraph 0049 that the remote device matches image features detected in the plurality of images of the real-world environment to one or more map features in a digital environment map representing the real-world environment). 
Re Claim 13: 
Claim 13 encompasses the same scope of invention as claim 10, except for the additional claim limitation that the localization component is configured to compute positioning information for the features of the set of features expressed in respective sensor coordinate frames of the sensors that captured the images comprising the set of features.
Wright/Dine teaches the claim limitation: that the localization component is configured to compute positioning information for the features of the set of features expressed in respective sensor coordinate frames of the sensors that captured the images comprising the set of features (
Wright teaches at Paragraph 0129-0130 that each annotation may be saved with reference to a spatial relationship to the object and a temporal relationship to a time within the session…the expert may initiate a second AR session…superimposing a new session and/or objects on the saved annotations, wherein each annotation is superimposed in spatial relationship to the second object based on the saved spatial relationship and in temporal relationship to a time within the second AR session based on the saved temporal relationship, and at Paragraph 0223 that the relationships between the annotations and feature points are determined, and at Paragraph 0222 that the system would then basically project the annotations correctly onto the screen. Thus, you may only need the positional relationship (mapping transformation) between the annotations and the feature point positions, and at Paragraph 0179 finding the geometric relationship (pose) of the device relative to the environment. 
Wright teaches at Paragraph 0116 that localization information for the annotations may be stored in a very specific form as help sessions for distribution to those having similar problems in similar environments. The localization information includes the geometrical/spatial transformation between the feature point positions of the annotations in the camera reference coordinate system and the matched feature point positions of the 3D environment map. 
Wright teaches at Paragraph 0167 that the device may project the annotation onto the 3D reconstruction of the environment and at Paragraph 0213 that we can project this annotation that the helper is drawing on the 2D screen onto the 3D representation of the environment. 
Wright teaches at Paragraph 0179 that when there is a match of objects in the image to information in the environment map, find the geometric relationship (pose) of the device relative to the environment. The user device then displays information from the environment map for objects that are visible within the view of the current camera image and at Paragraph 0222 that the camera may be localized relative to the environment and then all the annotations are localized relative to the environment (applying the transformation of the geometric data associated with the feature point positions of the annotations of the estimated pose of the device having matching feature point positions in the 3D environment map) …as soon as we have a camera position identified the annotations are also defined in 3D space. Thus, you may only need the positional relationship between the annotations and the feature point positions. 
Dine teaches at column 16, lines 50-67 that the plurality of matched common features in the 3D external map 556 and the local 3D map 55 generate a re-localization (e.g., 3D spatial transformation) between the estimated pose of the camera of the electronic device 500B in the 3D map 550 and the estimated pose of the camera of the electronic device 500A in the 3D map 559. Accordingly, localization includes a spatial transformation). 
Re Claim 14: 
Claim 14 encompasses the same scope of invention as claim 10, except for the additional claim limitation that the localization component comprises a pose estimation component configured to generate the transformation between the device coordinate frame of the portable electronic device and the canonical coordinate frame of the persisted maps.
Schonberger at least suggests the claim limitation: 
that the localization component comprises a pose estimation component configured to generate the transformation between the device coordinate frame of the portable electronic device and the canonical coordinate frame of the persisted maps (It is understood that the pose of the camera specifies the geometric relationship (mapping transformation) of the device relative to the environment as taught in Wright Paragraph 0179. Applying a pose of the camera includes providing a transformation between the environment map coordinate system and the camera coordinate system. Therefore, receiving the pose of the camera is the same as receiving the geometric transformation of the device coordinate frame to the 3D environment map coordinate frame. 
Schonberger teaches receiving a camera pose specifying a geometric relationship between the camera coordinate frame and the environment map coordinate frame. Schonberger teaches at Paragraph 0049-0055 identifying correspondence (transformation) between detected image features and map features of the digital environment map…estimating the pose of a camera device may involve determining correspondences between image features detected at 2D pixel positions of images of a real-world environment and map features having 3D map positions in a digital environment map, and at Paragraph 0012 matching 2D point features in the captured images to 3D map features stored in the 3D map as 3D points, such that the 6DOF pose of the camera may be computed using the 2D point to 3D point matches and their underlying coordinates). 
Wright/Dine/Beith teaches the claim limitation: that the localization component comprises a pose estimation component configured to generate the transformation between the device coordinate frame of the portable electronic device and the canonical coordinate frame of the persisted maps (
Wright teaches at Paragraph 0129-0130 that each annotation may be saved with reference to a spatial relationship to the object and a temporal relationship to a time within the session…the expert may initiate a second AR session…superimposing a new session and/or objects on the saved annotations, wherein each annotation is superimposed in spatial relationship to the second object based on the saved spatial relationship and in temporal relationship to a time within the second AR session based on the saved temporal relationship, and at Paragraph 0223 that the relationships between the annotations and feature points are determined, and at Paragraph 0222 that the system would then basically project the annotations correctly onto the screen. Thus, you may only need the positional relationship (mapping transformation) between the annotations and the feature point positions, and at Paragraph 0179 finding the geometric relationship (pose) of the device relative to the environment. 
Wright teaches at Paragraph 0116 that localization information for the annotations may be stored in a very specific form as help sessions for distribution to those having similar problems in similar environments. The localization information includes the geometrical/spatial transformation between the feature point positions of the annotations in the camera reference coordinate system and the matched feature point positions of the 3D environment map. 
Wright teaches at Paragraph 0167 that the device may project the annotation onto the 3D reconstruction of the environment and at Paragraph 0213 that we can project this annotation that the helper is drawing on the 2D screen onto the 3D representation of the environment. 
Wright teaches at Paragraph 0179 that when there is a match of objects in the image to information in the environment map, find the geometric relationship (pose) of the device relative to the environment. The user device then displays information from the environment map for objects that are visible within the view of the current camera image and at Paragraph 0222 that the camera may be localized relative to the environment and then all the annotations are localized relative to the environment (applying the transformation of the geometric data associated with the feature point positions of the annotations of the estimated pose of the device having matching feature point positions in the 3D environment map) …as soon as we have a camera position identified the annotations are also defined in 3D space. Thus, you may only need the positional relationship between the annotations and the feature point positions. 
Dine teaches at column 16, lines 50-67 that the plurality of matched common features in the 3D external map 556 and the local 3D map 55 generate a re-localization (e.g., 3D spatial transformation) between the estimated pose of the camera of the electronic device 500B in the 3D map 550 and the estimated pose of the camera of the electronic device 500A in the 3D map 559. Accordingly, localization includes a spatial transformation.  
Beith teaches at Paragraph 0045 of the provisional that the virtual keyboard 228 is displayed as a projection at a position on the display 210 relative to the hands 226…by registering the virtual keyboard 228 to one or more landmark points detected on the hands 226, and at Paragraph 0049 that, as a result of the detection of the one or more landmark points on the hands 226, the pose of the landmarks in relative physical position with respect to the image sensors is established…the one or more landmark points on the hands 226 can be used as real-world registration points for positioning the virtual keyboard 228 on the display). 
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have received the camera’s pose in order to transform the 2D feature points of the virtual object to the matched 3D feature points in the environment map, so as to register the virtual object with respect to the real object in the 3D environment and thereby transform the virtual object in an extended reality environment. One of ordinary skill in the art would have been motivated to do so in order to transform the virtual object in the extended reality environment. 
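For illustration only, the understanding stated above (that a camera pose is equivalent to a transformation between the device coordinate frame and the map coordinate frame) can be sketched as follows. The sketch is not taken from Schonberger, Wright, Dine, Beith or Xu; the function names make_pose, apply_pose and invert_pose are hypothetical, and the rotation is restricted to a single axis as a simplifying assumption.

```python
import math

def make_pose(yaw_rad, translation):
    """Build a 4x4 homogeneous transform: rotation about Z, then translation.
    Assumption: a full 6DOF pose would use a general 3x3 rotation instead."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    tx, ty, tz = translation
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def apply_pose(pose, point):
    """Map a 3D point expressed in the device frame into the map frame."""
    h = (point[0], point[1], point[2], 1.0)
    return tuple(sum(pose[r][k] * h[k] for k in range(4)) for r in range(3))

def invert_pose(pose):
    """Invert a rigid transform: rotation becomes R^T, translation -R^T t."""
    rot_t = [[pose[c][r] for c in range(3)] for r in range(3)]
    t = [pose[r][3] for r in range(3)]
    inv_t = [-sum(rot_t[r][k] * t[k] for k in range(3)) for r in range(3)]
    return [rot_t[r] + [inv_t[r]] for r in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

# Usage: a point seen by the device is expressed in the map frame, and the
# inverse pose maps it back to the device frame.
device_to_map = make_pose(math.pi / 2, (1.0, 2.0, 3.0))
in_map = apply_pose(device_to_map, (1.0, 0.0, 0.0))
in_device = apply_pose(invert_pose(device_to_map), in_map)
```

Under these assumptions, knowing the pose and knowing the frame-to-frame transformation amount to the same information, since either direction of the mapping is recoverable from the other.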

Re Claim 16: 
Claim 16 encompasses the same scope of invention as claim 10, except for the additional claim limitation that the communication component is further configured to receive positioning information in a device coordinate frame for sensors that captured images comprising the set of features.
Wright/Dine teaches the claim limitation: that the communication component is further configured to receive positioning information in a device coordinate frame for sensors that captured images comprising the set of features (
Wright teaches at Paragraph 0129-0130 that each annotation may be saved with reference to a spatial relationship to the object and a temporal relationship to a time within the session…the expert may initiate a second AR session…superimposing a new session and/or objects on the saved annotations, wherein each annotation is superimposed in spatial relationship to the second object based on the saved spatial relationship and in temporal relationship to a time within the second AR session based on the saved temporal relationship, and at Paragraph 0223 that the relationships between the annotations and feature points are determined, and at Paragraph 0222 that the system would then basically project the annotations correctly onto the screen. Thus, you may only need the positional relationship (mapping transformation) between the annotations and the feature point positions, and at Paragraph 0179 finding the geometric relationship (pose) of the device relative to the environment. 
Wright teaches at Paragraph 0116 that localization information for the annotations may be stored in a very specific form as help sessions for distribution to those having similar problems in similar environments. The localization information includes the geometrical/spatial transformation between the feature point positions of the annotations in the camera reference coordinate system and the matched feature point positions of the 3D environment map. 
Wright teaches at Paragraph 0167 that the device may project the annotation onto the 3D reconstruction of the environment and at Paragraph 0213 that we can project this annotation that the helper is drawing on the 2D screen onto the 3D representation of the environment. 
Wright teaches at Paragraph 0179 that when there is a match of objects in the image to information in the environment map, find the geometric relationship (pose) of the device relative to the environment. The user device then displays information from the environment map for objects that are visible within the view of the current camera image and at Paragraph 0222 that the camera may be localized relative to the environment and then all the annotations are localized relative to the environment (applying the transformation of the geometric data associated with the feature point positions of the annotations of the estimated pose of the device having matching feature point positions in the 3D environment map) …as soon as we have a camera position identified the annotations are also defined in 3D space. Thus, you may only need the positional relationship between the annotations and the feature point positions. 
Dine teaches at column 16, lines 50-67 that the plurality of matched common features in the 3D external map 556 and the local 3D map 55 generate a re-localization (e.g., 3D spatial transformation) between the estimated pose of the camera of the electronic device 500B in the 3D map 550 and the estimated pose of the camera of the electronic device 500A in the 3D map 559. Accordingly, localization includes a spatial transformation). 


Re Claim 17: 
Schonberger teaches a method of computing a pose between a first set of features, derived from at least one image collected on a portable electronic device and a second set of features in a stored map (Schonberger teaches at Paragraph 0055 that this may be done by searching the 3D map for the feature descriptors associated with each image feature and identifying the 3D map features having the most similar feature descriptors in the 3D point cloud….determining the correspondences may include identifying a set of image features having feature descriptors that match feature descriptors of 3D map features in the 3D map. As a result, 2D points detected in images of the real-world environment correspond to 3D points associated with 3D map features, giving a set of 2D point to 3D point correspondences….this feature descriptor matching step can be implemented using one of many nearest neighbor matching techniques. The L2-distance between the descriptor vectors may be used to calculate the pairwise similarity), the method comprising: 
computing descriptors for the features of the first set (
Schonberger teaches at Paragraph 0055 that this may be done by searching the 3D map for the feature descriptors associated with each image feature and identifying the 3D map features having the most similar feature descriptors in the 3D point cloud….determining the correspondences may include identifying a set of image features having feature descriptors that match feature descriptors of 3D map features in the 3D map); 
identifying a plurality of pairs of matched features based on similarity of the computed descriptors for the first set and descriptors for the features of the second set (Schonberger teaches at Paragraph 0055 that this may be done by searching the 3D map for the feature descriptors associated with each image feature and identifying the 3D map features having the most similar feature descriptors in the 3D point cloud….determining the correspondences may include identifying a set of image features having feature descriptors that match feature descriptors of 3D map features in the 3D map. As a result, 2D points detected in images of the real-world environment correspond to 3D points associated with 3D map features, giving a set of 2D point to 3D point correspondences….this feature descriptor matching step can be implemented using one of many nearest neighbor matching techniques. The L2-distance between the descriptor vectors may be used to calculate the pairwise similarity); 
computing quality metrics for the plurality of pairs of matched features, wherein the quality metrics for the plurality of pairs of matched features are computed after the plurality of pairs of matched features are identified (Schonberger teaches at Paragraph 0049-0056 that the remote device matches image features detected in the plurality of images of the real-world environment to one or more map features in a digital environment map representing the real-world environment…the digital environment map may include any suitable number of points, a 3D point cloud may include 20,000 or more 3D points. Schonberger teaches at Paragraph 0055-0056 that determining correspondences between image features detected at 2D pixel positions of images of a real-world environment and map features having 3D map positions in a digital environment map….not all of the determined correspondences are necessarily correct…some number of incorrect correspondences may be determined. 
Schonberger teaches at Paragraph 0070-0073 that the preliminary estimated pose may have an associated confidence value proportional to a quantity of image features transmitted to the remote device); 
selecting subsets of matched features based on the quality metrics such that matched features having higher quality metrics are more likely to be selected in the subset than those with lower quality metrics (Schonberger teaches at Paragraph 0049-0056 that the remote device matches image features detected in the plurality of images of the real-world environment to one or more map features in a digital environment map representing the real-world environment…the digital environment map may include any suitable number of points, a 3D point cloud may include 20,000 or more 3D points. Schonberger teaches at Paragraph 0055-0056 that determining correspondences between image features detected at 2D pixel positions of images of a real-world environment and map features having 3D map positions in a digital environment map….not all of the determined correspondences are necessarily correct…some number of incorrect correspondences may be determined. 
Schonberger teaches at Paragraph 0061-0066 that the remote device may identify a subset of correspondences from the overall set of determined 2D image feature to 3D map feature correspondences….a plurality of subsets of 2D data point pairs may be selected…..a RANSAC solver may output a finite number of solutions or potential pose candidates from a subset of identified correspondences….the remote device may identify a best overall camera device pose from the set of every candidate camera device pose. Schonberger teaches at Paragraph 0062 that the solution is compared to other 2D data points in the dataset to determine which points are consistent with the proposed solution and which points are not consistent with the proposed solution. The solution that has the highest inlier to outlier ratio may be accepted as the actual solution. Thus, the subset of 2D data points corresponding to the actual solution has the highest inlier to outlier ratio and corresponds to the highest quality metrics. 
Schonberger teaches at Paragraph 0055 that this may be done by searching the 3D map for the feature descriptors associated with each image feature and identifying the 3D map features having the most similar feature descriptors in the 3D point cloud….determining the correspondences may include identifying a set of image features having feature descriptors that match feature descriptors of 3D map features in the 3D map. As a result, 2D points detected in images of the real-world environment correspond to 3D points associated with 3D map features, giving a set of 2D point to 3D point correspondences….this feature descriptor matching step can be implemented using one of many nearest neighbor matching techniques. The L2-distance between the descriptor vectors may be used to calculate the pairwise similarity); 
determining a relative pose of the features of the first set included in the subset and features of the second set included in the subset (
Schonberger teaches at Paragraph 0061-0066 the remote device may identify a subset of correspondences from the overall set of determined 2D image feature to 3D map feature correspondences….a plurality of subsets of 2D data point pairs may be selected…..a RANSAC solver may output a finite number of solutions or potential pose candidates from a subset of identified correspondences….the remote device may identify a best overall camera device pose from the set of every candidate camera device pose); 
transforming at least a portion of the features of the first set of features that match features of the second set based on the determined pose (
Schonberger teaches at Paragraph 0049-0055 identifying correspondence between detected image features and map features of the digital environment map…estimating the pose of a camera device may involve determining correspondences between image features detected at 2D pixel positions of images of a real-world environment and map features having 3D map positions in a digital environment map, and at Paragraph 0012 matching 2D point features in the captured images to 3D map features stored in the 3D map as 3D points, such that the 6DOF pose of the camera may be computed using the 2D point to 3D point matches and their underlying coordinates). 
Schonberger at least suggests the claim limitation: 
determining the accuracy of the determined pose based on alignment of the transformed features of the first set and matching features in the second set (Schonberger teaches at Paragraph 0070-0073 that the preliminary estimated pose may have an associated confidence value proportional to a quantity of image features transmitted to the remote device).
Xu teaches at Paragraph 0150 that the coordinate transformation matrix is obtained from the localization parameters. 
Xu’s localization parameters defining the coordinate transformation matrix, when applied to Wright, allow Wright’s localization parameters to be distributed to the client device from the server as a localization service so as to define the coordinate transformation matrix. Wright teaches at Paragraph 0116 that localization information for the annotations may be stored in a very specific form as help sessions for distribution to those having similar problems in similar environments. The localization information includes the geometrical/spatial transformation between the feature point positions of the annotations in the camera reference coordinate system and the matched feature point positions of the 3D environment map. 
In view of Xu, Wright/Dine/Beith teaches the claim limitation: determining the accuracy of the determined pose based on alignment of the transformed features of the first set and matching features in the second set (
Wright teaches at Paragraph 0129-0130 that each annotation may be saved with reference to a spatial relationship to the object and a temporal relationship to a time within the session…the expert may initiate a second AR session…superimposing a new session and/or objects on the saved annotations, wherein each annotation is superimposed in spatial relationship to the second object based on the saved spatial relationship and in temporal relationship to a time within the second AR session based on the saved temporal relationship, and at Paragraph 0223 that the relationships between the annotations and feature points are determined, and at Paragraph 0222 that the system would then basically project the annotations correctly onto the screen. Thus, you may only need the positional relationship (mapping transformation) between the annotations and the feature point positions, and at Paragraph 0179 finding the geometric relationship (pose) of the device relative to the environment. 
Wright teaches at Paragraph 0116 that localization information for the annotations may be stored in a very specific form as help sessions for distribution to those having similar problems in similar environments. The localization information includes the geometrical/spatial transformation between the feature point positions of the annotations in the camera reference coordinate system and the matched feature point positions of the 3D environment map. 
Wright teaches at Paragraph 0167 that the device may project the annotation onto the 3D reconstruction of the environment and at Paragraph 0213 that we can project this annotation that the helper is drawing on the 2D screen onto the 3D representation of the environment. 
Wright teaches at Paragraph 0179 that when there is a match of objects in the image to information in the environment map, find the geometric relationship (pose) of the device relative to the environment. The user device then displays information from the environment map for objects that are visible within the view of the current camera image and at Paragraph 0222 that the camera may be localized relative to the environment and then all the annotations are localized relative to the environment (applying the transformation of the geometric data associated with the feature point positions of the annotations of the estimated pose of the device having matching feature point positions in the 3D environment map) …as soon as we have a camera position identified the annotations are also defined in 3D space. Thus, you may only need the positional relationship between the annotations and the feature point positions. 
Dine teaches at column 16, lines 50-67 that the plurality of matched common features in the 3D external map 556 and the local 3D map 55 generate a re-localization (e.g., 3D spatial transformation) between the estimated pose of the camera of the electronic device 500B in the 3D map 550 to the estimated pose of the camera of the electronic device 500A in the 3D map 559. Accordingly, localization includes a spatial transformation.  
Beith teaches at Paragraph 0045 of the provisional that the virtual keyboard 228 is displayed as a projection at a position on the display 210 relative to the hands 226…by registering the virtual keyboard 228 to one or more landmark points detected on the hands 226 and at Paragraph 0049 that as a result of the detection of the one or more landmark points on the hands 226, the pose of the landmarks in relative physical position with respect to the image sensors is established…the one or more landmark points on the hands 226 can be used as real-world registration points for positioning the virtual keyboard 228 on the display). 
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have received the camera’s pose in order to transform the 2D feature points of the virtual object to the matched 3D feature points in the environment map, so as to register the virtual object with respect to the real object in the 3D environment and thereby transform the virtual object in an extended reality environment. One of ordinary skill in the art would have been motivated to do so in order to transform the virtual object in the extended reality environment. 
Re Claim 18: 
Claim 18 encompasses the same scope of invention as claim 17, except for the additional claim limitations of iteratively forming subsets of matched features based on the computed quality metrics and determining poses for the iteratively formed subsets; and selecting a determined pose based on a determined accuracy of the determined pose.
However, Schonberger further teaches the claim limitation of iteratively forming subsets of matched features based on the computed quality metrics (Schonberger teaches at Paragraph 0061-0066 that the remote device may identify a subset of correspondences from the overall set of determined 2D image feature to 3D map feature correspondences…a plurality of subsets of 2D data point pairs may be selected…a RANSAC solver may output a finite number of solutions or potential pose candidates from a subset of identified correspondences…the remote device may identify a best overall camera device pose from the set of every candidate camera device pose. Schonberger teaches at Paragraph 0073 that the remote device may continue receiving image features from the camera device and continue generating new pose estimates) and determining poses for the iteratively formed subsets; and selecting a determined pose based on a determined accuracy of the determined pose (Schonberger teaches at Paragraph 0061-0066 that the remote device may identify a subset of correspondences from the overall set of determined 2D image feature to 3D map feature correspondences…a plurality of subsets of 2D data point pairs may be selected…a RANSAC solver may output a finite number of solutions or potential pose candidates from a subset of identified correspondences…the remote device may identify a best overall camera device pose from the set of every candidate camera device pose).
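By way of illustration only, the iterative subset selection and pose selection discussed above can be sketched as follows. This sketch is not taken from Schonberger or from the claims. It assumes a simplified model in which the "pose" is a 2D translation rather than a full camera pose, each match is a hypothetical tuple of (source point, target point, quality), and every quality value is positive. All function names are hypothetical.

```python
import random

def weighted_subset(matches, k, rng):
    """Draw k distinct matches; higher-quality matches are more likely to be drawn.

    Assumes every quality value (third tuple element) is positive.
    """
    pool = list(matches)
    chosen = []
    for _ in range(min(k, len(pool))):
        weights = [m[2] for m in pool]
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(idx))
    return chosen

def estimate_translation(subset):
    """Stand-in for a pose solver: average displacement over the subset."""
    n = len(subset)
    dx = sum(q[0] - p[0] for p, q, _ in subset) / n
    dy = sum(q[1] - p[1] for p, q, _ in subset) / n
    return (dx, dy)

def count_inliers(matches, t, tol):
    """Accuracy measure: number of matches the candidate explains within tol."""
    return sum(
        1
        for p, q, _ in matches
        if abs(p[0] + t[0] - q[0]) <= tol and abs(p[1] + t[1] - q[1]) <= tol
    )

def select_pose(matches, iterations=50, k=2, tol=0.5, seed=0):
    """Iteratively form subsets, estimate a candidate per subset, keep the best."""
    rng = random.Random(seed)
    best, best_score = None, -1
    for _ in range(iterations):
        subset = weighted_subset(matches, k, rng)
        candidate = estimate_translation(subset)
        score = count_inliers(matches, candidate, tol)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score
```

In this sketch the inlier count plays the role of the determined accuracy, and the candidate with the highest count is the selected pose.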
Re Claim 20: 
Claim 20 encompasses the same scope of invention as claim 17, except for the additional claim limitation that the first set of features includes no more than one hundred features.
However, Schonberger at least suggests the claim limitation that the first set of features includes no more than one hundred features (Schonberger teaches at Paragraph 0019 that user 100 has a camera device 106 equipped with one or more cameras 108 and at Paragraph 0028 that when the camera device includes multiple cameras, the multiple cameras may have any suitable spatial relationship with respect to each other and with respect to other hardware components of the camera device…the camera device may include manufacturer calibration data that indicates the relative 3D positions of each of multiple cameras of the camera device and at Paragraph 0030 that method 200 includes detecting a first set of image features in a first image of the plurality of images…each individual image captured by a camera device may have any number of such features, e.g., tens, hundreds, thousands, or more and at Paragraph 0034 that the remote device may be configured to estimate a pose of the camera device based on image features detected in the plurality of images…even if such image features are insufficient to estimate a pose for the camera device having sufficiently high confidence).

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Schonberger et al. US-PGPUB No. 2020/0372672 (hereinafter Schonberger) in view of
Liu et al. US-PGPUB No. 2014/0254942 (hereinafter Liu); 
Xu et al. US-PGPUB No. 2020/0134366 (hereinafter Xu); 
Dine et al. US Patent No. 10,748,302 (hereinafter Dine); Beith et al. US-PGPUB No. 2021/0065455 (hereinafter Beith based on the provisional application 62/895,970’s filing date); Wright JR et al. US-PGPUB No. 2020/0394012 (hereinafter Wright) and Sun et al. US-PGPUB No. 2014/0254936 (hereinafter Sun). 
Re Claim 8: 
Claim 8 encompasses the same scope of invention as claim 9, except for the additional claim limitation that the first vector is a unit normal vector. 
Schonberger at least suggests the claim limitation that the first vector is a unit normal vector (Schonberger teaches at Paragraph 0032 that SIFT features may be implemented as a 128-dimensional vector or SURF features, and at Paragraph 0055 that this may be done by searching the 3D map for the feature descriptors associated with each image feature and identifying the 3D map features having the most similar feature descriptors in the 3D point cloud…determining the correspondences may include identifying a set of image features having feature descriptors that match feature descriptors of 3D map features in the 3D map. As a result, 2D points detected in images of the real-world environment correspond to 3D points associated with 3D map features, giving a set of 2D point to 3D point correspondences…this feature descriptor matching step can be implemented using one of many nearest neighbor matching techniques. The L2-distance between the descriptor vectors may be used to calculate the pairwise similarity). 
Sun teaches the claim limitation that the first vector is a unit normal vector (Sun teaches at Paragraph 0037 that the feature vector is further normalized into the feature vector of a SIFT descriptor). 
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have normalized the SIFT feature vector into a unit feature vector. One of ordinary skill in the art would have been motivated to do so in order that the feature vector can be encoded. 
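By way of illustration only, the descriptor normalization and L2-distance nearest neighbor comparison discussed above can be sketched as follows. This sketch is not taken from Schonberger or Sun. The function names are hypothetical, and descriptors are represented as short lists of floats rather than 128-dimensional SIFT vectors.

```python
import math

def unit_normalize(vec):
    """Scale a descriptor so that its L2 norm is 1.

    A zero vector has no direction, so it is returned unchanged.
    """
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]

def l2_distance(a, b):
    """Euclidean (L2) distance between two descriptors of equal length."""
    if len(a) != len(b):
        raise ValueError("descriptors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbor(query, candidates):
    """Index of the candidate descriptor with the smallest L2 distance to query."""
    return min(range(len(candidates)), key=lambda i: l2_distance(query, candidates[i]))
```

When both descriptors are unit vectors, the L2 distance depends only on the angle between them, which is one reason normalization is commonly applied before matching.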
Claim 15 is rejected under 35 U.S.C. 103 as being unpatentable over Schonberger et al. US-PGPUB No. 2020/0372672 (hereinafter Schonberger) in view of 
Xu et al. US-PGPUB No. 2020/0134366 (hereinafter Xu); 
Dine et al. US Patent No. 10,748,302 (hereinafter Dine); Beith et al. US-PGPUB No. 2021/0065455 (hereinafter Beith based on the provisional application 62/895,970’s filing date); Wright JR et al. US-PGPUB No. 2020/0394012 (hereinafter Wright) and LIN et al. US-PGPUB No. 2021/0358150 (hereinafter LIN). 
Re Claim 15: 
Claim 15 encompasses the same scope of invention as claim 10, except for the additional claim limitation that the localization component comprises an artificial neural network configured to compute the quality metrics for the matched features. 
Schonberger does not explicitly teach that the localization component comprises an artificial neural network configured to compute the quality metrics for the matched features. 
LIN teaches the claim limitation that the localization component comprises an artificial neural network configured to compute the quality metrics for the matched features (LIN teaches at Paragraph 0105-0106 that the set of features of the scene is matched by determining a distance metric to compare the set of features of the scene to features of a target object or an object represented in a map data structure and then comparing the distance metric to a threshold. A neural network for matching the set of features may use a ranking loss function). 
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have incorporated LIN’s neural network for determining the distance metric of the matched features, so as to have modified Schonberger’s localization module, which uses a distance metric for the matched features. One of ordinary skill in the art would have been motivated to do so in order to provide a neural network that computes the distance metric for the matched features. 
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Schonberger et al. US-PGPUB No. 2020/0372672 (hereinafter Schonberger) in view of Xu et al. US-PGPUB No. 2020/0134366 (hereinafter Xu); 
Dine et al. US Patent No. 10,748,302 (hereinafter Dine); Beith et al. US-PGPUB No. 2021/0065455 (hereinafter Beith based on the provisional application 62/895,970’s filing date); Wright JR et al. US-PGPUB No. 2020/0394012 (hereinafter Wright) and Nerurkar et al. US-PGPUB No. 2017/0336511 (hereinafter Nerurkar). 

Re Claim 19: 
Claim 19 encompasses the same scope of invention as claim 18, except for the additional claim limitation that the portable electronic device receives the determined pose in no more than ten milliseconds. 
However, Nerurkar teaches the claim limitation that the portable electronic device receives the determined pose in no more than ten milliseconds (Nerurkar teaches at Paragraph 0027-0035 generating estimated poses of the electronic device 100 at a relatively high rate based on the image sensor data. It is understood that a relatively high rate of computer system execution is inclusive of a rate of no more than ten milliseconds). 
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have transmitted the pose of the camera at a high rate based on the localization module. One of ordinary skill in the art would have been motivated to do so in order to provide efficient localization based on a continuous or high-frequency receipt of sensor data.  

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Schonberger et al. US-PGPUB No. 2020/0372672 (hereinafter Schonberger) in view of Xu et al. US-PGPUB No. 2020/0134366 (hereinafter Xu); 
Dine et al. US Patent No. 10,748,302 (hereinafter Dine); Beith et al. US-PGPUB No. 2021/0065455 (hereinafter Beith based on the provisional application 62/895,970’s filing date); Wright JR et al. US-PGPUB No. 2020/0394012 (hereinafter Wright) and Nerurkar et al. US-PGPUB No. 2017/0336511 (hereinafter Nerurkar). 
Re Claim 20: 
Claim 20 encompasses the same scope of invention as claim 17, except for the additional claim limitation that the first set of features includes no more than one hundred features.
However, Schonberger at least suggests the claim limitation that the first set of features includes no more than one hundred features (Schonberger teaches at Paragraph 0019 that user 100 has a camera device 106 equipped with one or more cameras 108 and at Paragraph 0028 that when the camera device includes multiple cameras, the multiple cameras may have any suitable spatial relationship with respect to each other and with respect to other hardware components of the camera device…the camera device may include manufacturer calibration data that indicates the relative 3D positions of each of multiple cameras of the camera device and at Paragraph 0030 that method 200 includes detecting a first set of image features in a first image of the plurality of images…each individual image captured by a camera device may have any number of such features, e.g., tens, hundreds, thousands, or more and at Paragraph 0034 that the remote device may be configured to estimate a pose of the camera device based on image features detected in the plurality of images…even if such image features are insufficient to estimate a pose for the camera device having sufficiently high confidence).
However, Nerurkar teaches the claim limitation that the first set of features includes no more than one hundred features (Nerurkar teaches at Paragraph 0033 that the localization module 230 transforms geometric data associated with the generated feature descriptors of the estimated pose 214 having matching descriptors to be aligned with geometric data associated with a stored map having a corresponding matching descriptor…when the localization module 230 finds a sufficient number of matching feature descriptors from the generated feature descriptors 215 and a stored map to confirm that the generated feature descriptors 215 and the stored map contain descriptions of common visual landmarks, the localization module 230 performs a transformation between the generated feature descriptors 215 and the matching known feature descriptors). 
It would have been obvious to one of ordinary skill in the art before the filing date of the instant application to have transmitted the pose of the camera at a high rate based on the localization module. One of ordinary skill in the art would have been motivated to do so in order to provide efficient localization based on a continuous or high-frequency receipt of sensor data.  
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Schonberger et al. US-PGPUB No. 2020/0372672 (hereinafter Schonberger) in view of 
Liu et al. US-PGPUB No. 2014/0254942 (hereinafter Liu); 
Spiegel et al. US-PGPUB No. 2020/0401617 (hereinafter Spiegel); 
Xu et al. US-PGPUB No. 2020/0134366 (hereinafter Xu); Dine et al. US Patent No. 10,748,302 (hereinafter Dine); Beith et al. US-PGPUB No. 2021/0065455 (hereinafter Beith based on the provisional application 62/895,970’s filing date) and Wright JR et al. US-PGPUB No. 2020/0394012 (hereinafter Wright). 
Re Claim 3: 
Claim 3 encompasses the same scope of invention as claim 1, except for the additional claim limitation that the descriptors for individual features are assigned by a trained neural network. 
However, Schonberger does not teach the claim limitation that the descriptors for individual features are assigned by a trained neural network. 
Spiegel teaches the claim limitation that the descriptors for individual features are assigned by a trained neural network (Spiegel teaches at Paragraph 0139 that a CNN may be used which aggregates features extracted from an entire image into a compact feature vector representation that can be efficiently indexed…by plugging into the CNN architecture a generalized Vector of Locally Aggregated Features layer which will output an aggregated representation that can then be compressed to obtain a compact descriptor of the image). 
It would have been obvious to have incorporated Spiegel’s teaching of plugging into the CNN architecture a generalized Vector of Locally Aggregated Features layer to output a compact descriptor of the image based on the extracted features of the image. One of ordinary skill in the art would have been motivated to do so in order to provide a vector representation of the image. 

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JIN CHENG WANG whose telephone number is (571)272-7665. The examiner can normally be reached Mon-Fri 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu, can be reached on 571-272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/JIN CHENG WANG/Primary Examiner, Art Unit 2613