DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 08/17/2022 have been fully considered and are persuasive. Accordingly, the restriction requirement of 08/04/2022 is withdrawn.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 5-9, 11, 14-16, 20-25, 36-38 is/are rejected under 35 U.S.C. 103 as being unpatentable over Speasl et al. (US 2019/0236732 A1) in view of Mccormac et al. (US 2019/0147220 A1) and further in view of Davidson et al. (US 2020/0094405 A1).
Regarding claim 1, Speasl teaches:
A system configured to generate a virtual representation of a location ([0002], “The present invention generally relates to gathering information about property. More specifically, the present invention relates to collecting and transforming data regarding physical space into a virtual layout for property-level intelligence, inspection, and reports.”) with spatially localized information of elements within the location being embedded in the virtual representation, (FIG. 1, generated representation includes different elements of a house.) the system comprising one or more hardware processors configured by machine-readable instructions ([0130], “Memory 1320 stores, in part, instructions and data for execution by processor 1310. Memory 1320 can store the executable code when in operation.”) to: 
receive description data of a location, the description data being generated via at least one of a camera, a user interface, an environment sensor, and an external location information database, the description data comprising a plurality of images, and pose matrices; (“[0028] The unmanned vehicles 105 and 180 illustrated in FIG. 1A collect digital media data through various sensors of the unmanned vehicles 105 and 180 about different locations along respective paths 115 and 185 about a property 110 that includes at least one structure 120.”)
receive metadata associated elements within the location; ([0035] gives an example of the metadata associated with roof element in the house location: “For example, a first reference 160 is a reference image 160 identifying damage to the roof 140. The UAV 105 or UGV 180, or a server or other computer system 1300 that the UAV 105 or UGV 180 sends its media data to upon capture, may automatically identify irregularities in the property such as damage, and automatically mark those areas with reference images such as the reference image 160. Capture data associated with the reference image 160 shows it was captured at latitude/longitude coordinates (37.79, −122.39), that the capture device was facing north-east at the time of capture (more precise heading angle data may be used instead), that the capture device was at an altitude of 20 meters when this image 160 was captured, and that the inclination of the capture device was −16 degrees at capture.” [0046], “Data collected may also be from navigation satellites incorporating L3, L4 signals, virtual sensors; drones, aircraft, satellites, mobile digital devices, telematics, holographic, connected home data supported the cloud repository and by the enhanced 3rd party data will form an automated system to generate a completed, secure, property level intelligence appraisal system describing property values, certified property geo location, visualization media, market trends, property conformity information, property risks, usage history for heating systems, usage history for cooling systems, usage history for predictive sales price predictions, and appraised value on a specific date.”)
generate, in real-time, … a 3-dimensional (3D) representation of the location and elements therein ([0029]: “Digital media data gathered by the sensors of the UAV 105, the sensors of the UGV 180, and optionally other sensors may be combined, for example using a space mapping algorithm, to generate a two-dimensional or three-dimensional layout or model 190 of the property 110 and the structure 120 within it as illustrated in and discussed further with respect to FIG. 2B.”) and 
generate, based on the 3D model of the location, a virtual representation of the location by annotating the 3D model with spatially localized metadata associated with the elements within the location, (FIG. 1B and [0035] teach annotating the house elements with metadata associated with the roof element of the house: “For example, a first reference 160 is a reference image 160 identifying damage to the roof 140. The UAV 105 or UGV 180, or a server or other computer system 1300 that the UAV 105 or UGV 180 sends its media data to upon capture, may automatically identify irregularities in the property such as damage, and automatically mark those areas with reference images such as the reference image 160. Capture data associated with the reference image 160 shows it was captured at latitude/longitude coordinates (37.79, −122.39), that the capture device was facing north-east at the time of capture (more precise heading angle data may be used instead), that the capture device was at an altitude of 20 meters when this image 160 was captured, and that the inclination of the capture device was −16 degrees at capture.”)
the virtual representation being editable by a user to allow modifications to the spatially localized metadata. ([0034], “The generated layout or model 190 may include various “references” or “links” or “hyperlinks” or “pointers” at specific locations within the layout 190 that allow a user viewing the layout 190 to view the original media data captured at the corresponding location within the actual property. Thus, a user can click, touch, or otherwise interact with a specific location the layout 190 to bring up a photograph or a video captured by the UAV 105, UGV 180, or another sensor from which media data was captured and used to generate the layout 190 or to supplement the layout 190 with localized data, such as data regarding water quality or soil sample analysis at a particular location within the property 110.”)
	However, Speasl does not teach, but Mccormac teaches:
	annotating the 3D model with … and semantic information of the elements within the location, ([0139] and element 1020 of FIG. 10A teach using semantic information to annotate elements in a house: “As this occurs, the robotic device may be arranged to generate a semantically labelled surfel model as described herein and store this in the data storage device…. a domestic robot may be configured to apply one set of functions to portions of the space with a ‘carpet floor’ label and another set of functions to portions of the space with a ‘linoleum floor’ label.” [0078] teaches using machine learning to generate semantic information for objects in an image.)
	Speasl teaches generating a virtual representation of a location by annotating elements within the location with metadata. Mccormac further teaches generating a virtual representation of a location by annotating elements with semantic information, where the semantic information is generated by machine learning for objects in an image.
	It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have combined the teachings of Speasl with the specific teachings of Mccormac to annotate elements with both metadata and semantic information, helping users easily identify elements in a location. Furthermore, generating the semantic information with machine learning provides more accurate semantic information.
	However, Speasl in view of Mccormac does not teach, but Davidson teaches:
generate, in real-time, via a machine learning model and/or a geometric model, a 3-dimensional (3D) representation of the location and elements therein, the machine learning model being configured to receive the plurality of image and pose matrices as inputs and predict geometry of the location and the elements therein to form the 3D model; ([0044]: “In FIG. 3, the static image 291A and optionally the static vision sensor pose 292A are applied as input to a trained CNN encoder 122 to generate a global geometry representation 223. The global geometry representation 223 is an encoding that is a high-dimensional geometry representation, and is generated based on processing of the static image 291A, and optionally the static vision sensor pose 292A, using the trained CNN encoder 122. In other words, the global geometry representation 223 is an encoding of the static image 291A and optionally the static vision sensor pose 292A, as generated based on the trained CNN encoder 122. As described herein (e.g., description related to FIGS. 4 and 6), the CNN encoder 122 can be trained so that the global geometry representation 223 generated using the CNN encoder 122 represents 3D features (e.g., 3D shape) of object(s) captured by the static image 291A. In some of those implementations, the global geometry representation 223 is an encoding and is viewpoint invariant (e.g., identity units).”)
Speasl in view of Mccormac teaches generating a 3D representation of a location. Davidson teaches a 3D generation method that uses a machine learning model to generate a geometry representation from images and pose data as input.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have replaced the 3D representation generation method of Speasl in view of Mccormac with the specific method of Davidson, because Davidson provides more detailed information about the shape, location, and orientation of an object, which would help the system more accurately recognize elements in a location.

Regarding claim 5, Speasl in view of Mccormac and Davidson teaches:
The system of claim 1, wherein generating the virtual representation with spatially localized metadata comprises: spatially localizing the metadata using a geometric estimation model, or manual entry of the metadata via a graphical user interface configured to allow a user to hover over or select a particular element and edit the metadata. (Speasl [0034], “Thus, a user can click, touch, or otherwise interact with a specific location the layout 190 to bring up a photograph or a video captured by the UAV 105, UGV 180, or another sensor from which media data was captured and used to generate the layout 190 or to supplement the layout 190 with localized data, such as data regarding water quality or soil sample analysis at a particular location within the property 110.”)

Regarding claim 6, Speasl in view of Mccormac and Davidson teaches:
The system of claim 5, wherein spatially localizing of the metadata comprises: receiving additional images of the location and associating the additional images to the 3D model of the location; computing camera poses associated with the additional images with respect to the existing plurality of images and the 3D model; and relocalizing, via the geometric estimation model and the camera poses, the additional images and associating the metadata. (This is a feature not selected from claim 4.)

Regarding claim 7, Speasl in view of Mccormac and Davidson teaches:
The system of claim 1, further comprising instructions to: display metadata about an element when a user hovers over or selects the element within virtual representation of the location. (Speasl [0034], “Thus, a user can click, touch, or otherwise interact with a specific location the layout 190 to bring up a photograph or a video captured by the UAV 105, UGV 180, or another sensor from which media data was captured and used to generate the layout 190 or to supplement the layout 190 with localized data, such as data regarding water quality or soil sample analysis at a particular location within the property 110.”)

Regarding claim 8, Speasl in view of Mccormac and Davidson teaches:
The system of claim 1, wherein the metadata associated with the element comprises at least one of: geometric properties of the element; material specifications of the element; a condition of the element; receipts related to the element; invoices related to the element; spatial measurements captured through the virtual representation or physically at the location; details about insurance coverage; audio, visual, or natural language notes; or 3D shapes and objects including geometric primitives and CAD models. (Speasl [0046], “Data collected may also be from navigation satellites incorporating L3, L4 signals, virtual sensors; drones, aircraft, satellites, mobile digital devices, telematics, holographic, connected home data supported the cloud repository and by the enhanced 3rd party data will form an automated system to generate a completed, secure, property level intelligence appraisal system describing property values, certified property geo location, visualization media, market trends, property conformity information, property risks, usage history for heating systems, usage history for cooling systems, usage history for predictive sales price predictions, and appraised value on a specific date.” FIG. 1B shows the condition of elements.)

Regarding claim 9, Speasl in view of Mccormac and Davidson teaches:
The system of claim 1, wherein generating the virtual representation with the semantic information comprises: identifying elements from the plurality of image or the 3D model by a semantically trained machine learning model, the semantically trained machine learning model configured to perform semantic or instance segmentation and 3D object detection and localization of each object in an input image. (Mccormac, [0075] “The image classifier 455 may implement at least one of a variety of machine learning methods. It may use, amongst others, support vector machines (SVMs), Bayesian networks, Random Forests, nearest neighbour clustering and/or neural networks. In certain examples, the image classifier 455 may comprise a convolutional neural network (CNN). The CNN may have multiple convolution layers (e.g. 16 in one example), sometimes informally referred to as a “deep learning” approach. In one case, the CNN is configured to output the object-label probability values as a set of pixel maps (e.g. images) for each frame of video data. This may be achieved by communicatively coupling a deconvolutional neural network to the output of the CNN. Further details of an example CNN featuring deconvolution layers may be found in the paper by H. Noh, S. Hong, B. Han on Learning deconvolution network for semantic segmentation (see arXiv preprint arXiv:1505.04366-2015). The image classifier 455 may thus be configured to output a dense pixel-wise semantic probability map following suitable training. Example test operating parameters for a CNN image classifier 455 comprise a learning rate of 0.01, momentum of 0.9 and weight decay of 5×10⁻⁴. In this case after 10,000 iterations the learning rate was reduced to 1×10⁻³, wherein training took 20,000 iterations. In this test example, original CNN weights were first pre-trained on a dataset of images associated with a general image classification task.
The weights were then fine-tuned for a scene-segmentation task associated with the present 3D semantic-labelling. One or more graphics processing units may be used to train and/or implement the image classifier 455.” The combination of claim 1 is incorporated here.)

Regarding claim 11, Speasl in view of Mccormac and Davidson teaches:
The system of claim 1, further comprising instructions to: generate feedback information configured to guide the user to collect additional description data at a particular portion of the location; and update the virtual representation based on the additional description data collected in response to the feedback information to cause improvements in accuracy and confidence level of the virtual representation. (Speasl [0034], “Thus, a user can click, touch, or otherwise interact with a specific location the layout 190 to bring up a photograph or a video captured by the UAV 105, UGV 180, or another sensor from which media data was captured and used to generate the layout 190 or to supplement the layout 190 with localized data, such as data regarding water quality or soil sample analysis at a particular location within the property 110.” For example, users can click the play button in FIG. 1B to bring up more information on the display. FIG. 2B and [0049] give another example. As more information is provided, the accuracy and confidence level of the information regarding the elements are improved.)

Regarding claim 14, Speasl in view of Mccormac and Davidson teaches:
The system of claim 11, wherein updating the virtual representation based on the additional description data comprises: inputting the additional data to the machine learning model to update the corresponding portion of the virtual representation. (Mccormac teaches using the classifier (a machine learning model) to update semantic probability values: “[0022] According to a second aspect of the present invention there is provided an apparatus for detecting objects in video data comprising: an image-classifier interface to receive two-dimensional object-label probability distributions for individual frames of video data; a correspondence interface to receive data indicating, for a given frame of video data, a correspondence between spatial elements within the given frame and surface elements in a three-dimensional surface element representation, said correspondence being determined based on a projection of the surface element representation using an estimated pose for the given frame; and a semantic augmenter to iteratively update object-label probability values assigned to individual surface elements in the three-dimensional surface element representation, wherein the semantic augmenter is configured to use, for a given frame of video data, the data received by the correspondence interface to apply the two-dimensional object-label probability distributions received by the image classifier interface to object-label probability values assigned to corresponding surface elements.” The combination of claim 1 is incorporated here.)
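For illustration only (not relied upon in this rejection), the iterative update described in Mccormac [0022] can be pictured as a per-surfel recursive fusion: the 2D label distribution at a pixel is multiplied into the distribution of the surface element that pixel corresponds to, and the result is renormalized. The quoted paragraph does not itself state this multiply-and-normalize rule; it is an assumption here, and the function names and data layout below are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Pixel = Tuple[int, int]


def fuse(prior: Sequence[float], observation: Sequence[float]) -> List[float]:
    """Multiply a surfel's label distribution by a frame's 2D distribution, then renormalize."""
    product = [p * o for p, o in zip(prior, observation)]
    total = sum(product)
    if total == 0.0:
        # Degenerate observation: keep the prior instead of dividing by zero.
        return list(prior)
    return [x / total for x in product]


def update_surfels(
    surfel_probs: List[List[float]],
    pixel_probs: Dict[Pixel, List[float]],
    correspondence: Dict[Pixel, int],
) -> None:
    """Apply one frame's per-pixel label distributions to the corresponding surfels in place.

    `correspondence` maps a pixel to the index of the surfel that projects onto it
    (assumed to come from projecting the surfel model with the frame's estimated pose).
    """
    for pixel, surfel_index in correspondence.items():
        surfel_probs[surfel_index] = fuse(surfel_probs[surfel_index], pixel_probs[pixel])


# Two hypothetical labels ("carpet floor", "linoleum floor"), uniform prior.
surfels = [[0.5, 0.5]]
frame = {(0, 0): [0.8, 0.2]}
update_surfels(surfels, frame, {(0, 0): 0})
update_surfels(surfels, frame, {(0, 0): 0})
print(surfels[0])  # about [0.94, 0.06] after two consistent observations
```

Repeated consistent observations sharpen the distribution, which is one way to read "iteratively update object-label probability values."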

Regarding claim 15, Speasl in view of Mccormac and Davidson teaches:
The system of claim 11, wherein the feedback information is generated in real-time, and the additional data is collected in real-time to update the virtual representation in real-time. (Speasl [0049], “In some cases, a user might walk through the structure 220 wearing an augmented reality headset or otherwise viewing an augmented-reality viewing device after having generated the layout 290. Alternately, a user wearing a virtual reality headset or otherwise viewing a virtual reality or telepresence viewing device may virtually traverse the layout 290. As the user traverses the structure 220 or layout 290, the reference images identified in FIG. 2B may appear, superimposed, over the structure 220 (in augmented reality) or layout 290 (in virtual reality) where appropriate. In some cases, the user can also bring up other media, such as other images, captured of areas that were not automatically flagged as important reference data like those flagged in FIG. 2B, in the same way automatically or upon request (e.g., by pressing a button or otherwise inputting a particular command).”)

Regarding claim 16, Speasl in view of Mccormac and Davidson teaches:
The system of claim 1, further comprising instructions to: specify points of interest within the virtual representation displayed on a graphical user interface; generate, based on the points of interest, a floor plan; and spatially localize the floor plan on to the virtual representation. (Speasl FIG. 3B shows that a user can “walk around” the ventilation system of FIG. 3A and view its details, and the point of interest brings up generated layout 390: “[0051] The interior 335 of the structure 320 of FIG. 3A is a complex ventilation system that a human being could not fit into inside. Thus, small UAV 105 that is autonomously guided to carefully traverse the area without bumping into anything is a perfect way to navigate such an environment without causing any damage to the structure 320, as might occur using any other method of traversal. The UAV 105 enters the ventilation system (the interior 335 of the structure 320) via an entry point 305, travels along a path 315 indicated by a dashed line, and exits the ventilation system (the interior 335 of the structure 320) via an exit point 310. The UAV 105 captures media data through its sensors at multiple locations along the path 315. [0052] FIG. 3B illustrates a generated layout of the ventilation system of the property of FIG. 3A that identifies a feature of the ventilation system based on media captured by the unmanned aerial vehicle (UAV) of FIG. 3A. [0053] The generated layout 390 of FIG. 3B is generated based on the media data captured by the sensors of the UAV 105 while it travels along the path 315 in FIG. 3A. In the case of FIG. 3B, the sensors of the UAV 105 include at least one camera, as a reference image 340 is identified showing a location at which a tear in the ventilation was automatically detected within the media.
The direction of the capture device (UAV 105) is identified as east at the time of capture, and the air quality or dust level as identified using an air quality sensor of the capture device (UAV 105) is identified as low, likely due to the tear in the ventilation. [0054] In the example of FIG. 3B, the reference image 340 is displayed via a controller and viewing device 350 along with an interface 345. The viewing device 350 is a computing device 1300 such as a smartphone, tablet, laptop, or other mobile device. The interface 345 includes an arrow forward, an arrow backward, and arrows turning left and right, respectively. The arrow forward in this interface 345 can “progress” or “move” the view output by the viewing device 350 “forward”—that is, further through the ventilation in the direction that the image is facing (east). In contrast, the arrow backward in this interface 345 can “progress” or “move” the view output by the viewing device 350 “backward”—that is, further through the ventilation west, the direction opposite the direction the image is facing (east). The arrow left can rotate the view left (north) and the arrow right can rotate the view right (south) relative to the direction that the image is facing (east). While the viewing device 350 is illustrated as a smartphone in FIG. 3B, it may be a virtual reality or augmented reality head-mounted display 900 such as the one in FIG. 9, or any other display system 1370 discussed with respect to FIG. 13.”)

Regarding claim 20, Speasl in view of Mccormac and Davidson teaches:
The system of claim 1, wherein the description data comprises one or more media types, the media types comprising at least one or more of video data, image data, audio data, text data, user interface/display data, and/or sensor data. (Speasl FIG. 1B shows different types of description data, including video data 165 and image 160.)

Regarding claim 21, Speasl in view of Mccormac and Davidson teaches:
The system of claim 1, wherein receiving description data comprises receiving sensor data from the one or more environment sensors, the one or more environment sensors comprising at least one of a GPS, an accelerometer, a gyroscope, a barometer, or a microphone. (Speasl [0086]: “Position sensor 735 can include an inertial measurement unit (IMU) or inertial navigation system (INS) for determining the acceleration and/or the angular rate of UAV 105 using one or more accelerometers and/or gyroscopes, a GPS receiver for determining the geolocation and altitude of UAV 105, a magnetometer for determining the surrounding magnetic fields of UAV 105 (for informing the heading and orientation of UAV 105), a barometer for determining the altitude of UAV 105, etc. Position sensor 735 can include a land-speed sensor, an air-speed sensor, a celestial navigation sensor, etc.”)

Regarding claim 22, Speasl in view of Mccormac and Davidson teaches:
The system of claim 1, wherein description data is captured by a mobile computing device associated with a user and transmitted to the one or more processors with or without a first user and/or other user interaction. (Speasl FIG. 3B teaches that users can acquire description data from sensors by controlling the viewing device.)

Regarding claim 23, Speasl in view of Mccormac and Davidson teaches:
The system of claim 1, wherein the description data of the location comprises receiving a real-time video stream of the location. (Speasl FIG. 3B shows the data that is received in real-time from the drone as illustrated in FIG. 3A.)

Regarding claim 24, Speasl in view of Mccormac and Davidson teaches:
The system of claim 23, wherein generating the virtual representation comprises: generating or updating the 3D model based on the real-time video stream of the location. (Speasl FIG. 3B shows a 3D model that is generated based on the data that is received in real-time from the drone as illustrated in FIG. 3A.)

	Regarding claim 25, Speasl in view of Mccormac teaches:
The system of claim 1, wherein the generating, in real-time, the virtual representation comprises: receiving, at a user device, the description data of the location, transmitting the description data to a server … to generate the 3D model of the location, generating, at the server based on… and the description data, the virtual representation of the location, and transmitting the virtual representation to the user device. (Speasl FIGS. 3A and 3B teach that a client device receives description data from sensors. Speasl [0029] teaches generating a 3D model, and FIG. 1B shows the virtual representation: “Digital media data gathered by the sensors of the UAV 105, the sensors of the UGV 180, and optionally other sensors may be combined, for example using a space mapping algorithm, to generate a two-dimensional or three-dimensional layout or model 190 of the property 110 and the structure 120 within it as illustrated in and discussed further with respect to FIG. 2B.” FIG. 5 also teaches a client-server architecture, and [0130] teaches that a computing device can be a server. It is well known in the art that, in a client-server architecture, the client can transmit data to a server, let the server perform the computation, and then receive the results from the server. Examiner takes official notice of this fact. It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have modified the 3D representation generation method of Speasl in view of Mccormac and Davidson with this well-known client-server arrangement so that the server generates the 3D model of the location and the virtual representation of the location. The benefit would be a lightweight client device. In addition, results could easily be shared with the different client devices connected to the server.)
However, Speasl in view of Mccormac does not teach, but Davidson teaches:
server configured to execute the machine learning model to generate the 3D model of the location, generating, at the server based on the machine learning model and the description data, the virtual representation of the location, ([0044]: “In FIG. 3, the static image 291A and optionally the static vision sensor pose 292A are applied as input to a trained CNN encoder 122 to generate a global geometry representation 223. The global geometry representation 223 is an encoding that is a high-dimensional geometry representation, and is generated based on processing of the static image 291A, and optionally the static vision sensor pose 292A, using the trained CNN encoder 122. In other words, the global geometry representation 223 is an encoding of the static image 291A and optionally the static vision sensor pose 292A, as generated based on the trained CNN encoder 122. As described herein (e.g., description related to FIGS. 4 and 6), the CNN encoder 122 can be trained so that the global geometry representation 223 generated using the CNN encoder 122 represents 3D features (e.g., 3D shape) of object(s) captured by the static image 291A. In some of those implementations, the global geometry representation 223 is an encoding and is viewpoint invariant (e.g., identity units).”)
Speasl in view of Mccormac teaches generating a 3D representation of a location. Davidson teaches a 3D generation method that uses a machine learning model to generate a geometry representation from images and pose data as input.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have replaced the 3D representation generation method of Speasl in view of Mccormac with the specific method of Davidson, because Davidson provides more detailed information about the shape, location, and orientation of an object, which would help the system more accurately recognize elements in a location.

Claims 36 and 38 recite limitations similar to those of claims 1 and 25, respectively, and are therefore rejected using the same rationale as claims 1 and 25, respectively.

Regarding claim 37, Speasl in view of Mccormac and Davidson teaches:
The system of claim 36, wherein generating the virtual representation comprises: generating or updating the 3D model based on the real-time video stream of the location. (Speasl FIG. 3B shows that the live data input is used to generate layout 390.)

Claim(s) 40 is/are rejected under 35 U.S.C. 103 as being unpatentable over Speasl in view of Mccormac and Davidson and further in view of Wang et al. (US 2019/0012804 A1).
Regarding claim 40, Speasl in view of Mccormac and Davidson teaches:
The system of claim 36, further comprising 
However, Speasl in view of Mccormac and Davidson does not teach, but Wang teaches:
estimating intrinsics and extrinsics camera parameters using structure from motion (SFM) algorithm. ([0136], “…where K, R and t are the camera intrinsic (K) and extrinsic (R, t) parameters, respectively, of each virtual camera estimated by SfM.”)
Speasl in view of Mccormac and Davidson teaches obtaining pose information but does not detail how it is obtained. Wang teaches a specific method of acquiring pose information by estimating camera parameters.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have replaced the 3D representation generation method of Speasl in view of Mccormac and Davidson with the specific teachings of Wang to generate more accurate pose information.
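For illustration only (not relied upon in this rejection), the intrinsic matrix K and extrinsic parameters (R, t) referred to in Wang [0136] enter the standard pinhole projection x ~ K(RX + t), which is what allows SfM-estimated parameters to serve as pose information. The Python sketch below assumes an ideal pinhole camera without lens distortion and world-to-camera extrinsics; the function names and numeric values are illustrative and are not taken from Wang or the application.

```python
from typing import List, Sequence, Tuple

Mat3 = Sequence[Sequence[float]]


def mat_vec(m: Mat3, v: Sequence[float]) -> List[float]:
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def project_point(K: Mat3, R: Mat3, t: Sequence[float], X: Sequence[float]) -> Tuple[float, float]:
    """Project world point X to pixel coordinates (u, v) via x ~ K (R X + t)."""
    # Extrinsics (R, t): move the world point into the camera frame.
    Xc = [a + b for a, b in zip(mat_vec(R, X), t)]
    if Xc[2] <= 0:
        raise ValueError("point lies behind the camera")
    # Intrinsics K: map camera coordinates to homogeneous pixel coordinates.
    u, v, w = mat_vec(K, Xc)
    # Perspective division gives the pixel location.
    return u / w, v / w


# Illustrative values: focal length 500 px, principal point (320, 240).
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 0.0]
print(project_point(K, R, t, [0.2, -0.1, 2.0]))  # (370.0, 215.0)
```

SfM estimates K, R, and t jointly so that projections of this form agree with the observed image features across views.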

Allowable Subject Matter
Claims 2-4, 10, 12-13, 17-19, 39, 41-44 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
The following is a statement of reasons for the indication of allowable subject matter:  
None of the references, alone or in combination, teaches the limitations of “wherein generating the 3D model comprises: encoding each image of the plurality of images with the machine learning model; adjusting, based on the encoded images of the plurality of images, an intrinsics matrix associated with the camera; using the intrinsics matrix and pose matrices to back-project the encoded images into a predefined voxel grid volume; providing the voxel grid with the features as input to a neural network to predicts a 3D model of the location for each voxel in the voxel grid; and extracting a 2D surface of the predicted 3D model,” recited in claim 2 and similarly recited in claim 39, in combination with the limitations recited in the parent claims.
None of the references, alone or in combination, teaches the limitations of “wherein generating the virtual representation further comprises: estimating, via a pose estimation method and the plurality of images, pose data based on the plurality of images and intrinsics of the camera, obtaining heuristic information associated with one or more standard elements detected within the location, and estimating, based on the heuristic information and geometric correlation between the plurality of images, a scale factor to determine dimensions of the elements within the location,” recited in claim 10, in combination with the limitations recited in the parent claims.
None of the references, alone or in combination, teaches the limitations of “wherein generating the feedback information comprises: generating visual indicators representing confidence levels associated with the elements within the location, wherein visual indicators having a relatively low confidence level guide the user to collect additional description data at a particular portion of the location,” recited in claim 12, in combination with the limitations recited in the parent claims.
None of the references, alone or in combination, teaches the limitations of “generate a floor plan of the location by a layout estimation algorithm configured to use lines in the 3D model, plane information related to the elements in the 3D model, vanishing points of the location, or geometric indicators related to the 3D model to estimate the floor plan of the location,” recited in claim 17, in combination with the limitations recited in the parent claims.
None of the references, alone or in combination, teaches the limitations of “obtain, via a database or a user, heuristic information associated with one or more elements detected within the location, the heuristic information comprising dimension data associated with an element; estimate, based on the heuristic information and geometric correlation between the plurality of images, a scale factor for adjusting sizes of the elements in the images; estimate, based on the scale factor, dimensions of the elements within the location; and update, based on the estimated scale factor and the estimated dimensions, the virtual representation of the location by adjusting sizes of the elements within the location and annotating the 3D model with estimated dimensions of the elements of the location,” recited in claim 41, in combination with the limitations recited in the parent claims.

Claims 26-35 are allowed.
The following is a statement of reasons for the indication of allowable subject matter:  
Regarding claim 26, Speasl teaches:
A system configured to generate a virtual representation of a location ([0002], “The present invention generally relates to gathering information about property. More specifically, the present invention relates to collecting and transforming data regarding physical space into a virtual layout for property-level intelligence, inspection, and reports.”) with spatially localized information of elements within the location being embedded in the virtual representation, (FIG. 1, generated representation includes different elements of a house.) the system comprising one or more hardware processors configured by machine-readable instructions ([0130], “Memory 1320 stores, in part, instructions and data for execution by processor 1310. Memory 1320 can store the executable code when in operation.”) to: 
receive description data of a location, the description data being generated via at least one of a camera, a user interface, an environment sensor, and an external location information database, the description data comprising a plurality of images; (“[0028] The unmanned vehicles 105 and 180 illustrated in FIG. 1A collect digital media data through various sensors of the unmanned vehicles 105 and 180 about different locations along respective paths 115 and 185 about a property 110 that includes at least one structure 120.”)
Mccormac teaches:
generate, via a machine learning model and/or a geometric model, a 3-dimensional (3D) representation of the location and elements therein, the machine learning model being configured to receive the plurality of image as input and predict geometry of the location and the elements therein to form the 3D model; ([0044]“In FIG. 3, the static image 291A and optionally the static vision sensor pose 292A are applied as input to a trained CNN encoder 122 to generate a global geometry representation 223. The global geometry representation 223 is an encoding that is a high-dimensional geometry representation, and is generated based on processing of the static image 291A, and optionally the static vision sensor pose 292A, using the trained CNN encoder 122. In other words, the global geometry representation 223 is an encoding of the static image 291A and optionally the static vision sensor pose 292A, as generated based on the trained CNN encoder 122. As described herein (e.g., description related to FIGS. 4 and 6), the CNN encoder 122 can be trained so that the global geometry representation 223 generated using the CNN encoder 122 represents 3D features (e.g., 3D shape) of object(s) captured by the static image 291A. In some of those implementations, the global geometry representation 223 is an encoding and is viewpoint invariant (e.g., identity units). ”)
However, none of the references, alone or in combination, teaches the limitations of “obtain, via a database or a user, heuristic information associated with one or more elements detected within the location, the heuristic information comprising dimension data associated with a element; estimate, based on the heuristic information and geometric correlation between the plurality of images, a scale factor for adjusting sizes of the elements in the images; estimate, based on the scale factor, dimensions of the elements within the location; and update, based on the estimated scale factor and the estimated dimensions, the virtual representation of the location by adjusting sizes of the elements within the location and annotating the 3D model with estimated dimensions of the elements of the location.”
Claims 27-35 are allowed because they depend from claim 26.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YANNA WU whose telephone number is (571)270-0725. The examiner can normally be reached Monday-Thursday 8:00-5:30 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YANNA WU/Primary Examiner, Art Unit 2611