Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION

Status of Claims
Claims 1-20 are currently pending in this application.

Response to Amendments
The applicant amended claims 1, 10 and 19 to include features similar to “determining one or more canonical poses of objects in the interior space based on a 3D model of the interior space” and “labeling voxels representing the objects in the interior space based on the trajectory, the one or more poses of the electronic device, and the one or more canonical poses of the objects”.
The applicant amended claim 4 to include the feature “each voxel is associated with a probability distribution indicating probabilities of each class label based in part on the one or more canonical poses of the objects”.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1-2, 10-11 and 13-20 are rejected under 35 U.S.C. 103 as being unpatentable over Witt et al. (2019/0188473) in view of Shapira et al. (2015/0228114) and further in view of Sachs et al. (2017/0213112).

Regarding claim 1, Witt teaches a method (e.g., techniques for performing place recognition and localization, Witt: [0004] L.1-2) comprising: 
performing, with an electronic device, a three-dimensional (3D) scan of an interior space (e.g., FIG. 1A depicts an example property 100 including an autonomous robot 105 that is configured to perform place recognition and localization.  The property 100 may be represented in a mapping, such as a 2D floor plan or 3D space mapping that describes the interior of the property 100. To determine its location within the property 100, the autonomous robot 105 may obtain information using one or more sensors that are included in the autonomous robot 105.  For example, the autonomous robot 105 may include a camera that is configured to obtain images of portions of the property 100 while the autonomous robot is located at a particular position of the property 100. Witt: [0028] and Fig. 1A; reproduced below for reference. 

[Image: Witt, Fig. 1A]

See 1_1 below); 
accessing voxels in a 3D grid that is generated from the 3D scan, wherein the voxels represent portions of a volume of the interior space (e.g., When performing SLAM, an autonomous robot must maintain an awareness of its physical location in the world, for example, its latitude and longitude or its coordinates within a grid representation of a property, while simultaneously obtaining measurements or other information to generate or update a mapping of the property. Witt: [0007] L.1-4.  In some implementations, the mapping of the property may be a 3D representation of the property that represents space within the property, instead of surfaces.  For example, the 3D representation may be comprised of a number of 3D cells that each represent a finite amount of volume within the property.  The resolution of the three-dimensional representation may be determined as necessary for the particular application.  For example, surfaces with more contours may be represented using a higher resolution, e.g., a small polygon size, than surfaces with fewer contours. Witt: [0053] L.10-20.  The 3D cell that represents a finite amount of volume is taken as a voxel.  The space within the property is taken as the interior space of the property, and the voxels are referenced by coordinates within a grid representation.  See 1_1 for 3D scan); 
determining a trajectory and one or more poses of the electronic device concurrently with performing the 3D scan of the interior space (e.g., Autonomous robots utilize property mappings to perform path planning and navigation within a property.  Given a location of the autonomous robot within the property and a location of a destination within the property, the autonomous robot can plan a path to navigate through the property to its destination.  In some instances, a robot's location within a property can be provided to the autonomous robot, for example, by another system or a human user.  In other implementations, an autonomous robot performs localization to determine its location within the property. Witt: [0003] L.1-10.  The path planning and navigation are taken as trajectory planning and navigation – the path is interpreted as trajectory.  The autonomous robot may attempt to resolve its location further by using cameras, LIDAR, sonar, radar, stereo cameras, or other sensors to determine, for example, its position relative to a wall that the autonomous robot determines it is proximate to within a mapping of the property.  Failure to accurately resolve the location of the autonomous robot within the property can limit its ability to perform accurate path planning and navigation within the property. Witt: [0009] L.5-13. Path planning and navigation within the property require accurate location information and use of one or more sensors.  Each measurement may indicate a location from which the measurement was taken by the autonomous robot, such as coordinates, latitude and longitude, or other location information that indicates a position of the autonomous robot within the property.  The information may also indicate an orientation corresponding to the measurement, such as an indication of a direction from which the measurement was taken and an angle from which the measurement was taken.  
The measurements taken by the one or more autonomous robots include a sufficient number of measurements to generate a 2D or 3D mapping of the property, or in some implementations, a portion of the property if a mapping of only a portion of the property is desired. Witt: [0052].  A measurement of a location includes position and orientation information, which provides the pose information.  As a number of measurements are needed to generate a 2D or 3D mapping of a path, a number of positions and orientations are measured.  The path planning and navigation take place within the property (the interior space).  See 1_1 for 3D scan);
determining one or more canonical poses of objects in the interior space based on a 3D model of the interior space (see 1_2 below); and 
labeling voxels representing the objects in the interior space based on the trajectory, the one or more poses of the electronic device, and the one or more canonical poses of the objects (e.g., Autonomous robots utilize property mappings to perform path planning and navigation within a property.  Given a location of the autonomous robot within the property and a location of a destination within the property, the autonomous robot can plan a path to navigate through the property to its destination.  In some instances, a robot's location within a property can be provided to the autonomous robot, for example, by another system or a human user.  In other implementations, an autonomous robot performs localization to determine its location within the property. Witt: [0003] L.1-10.  Property mappings to perform path planning and navigation are taken as trajectory planning and navigation – the path is interpreted as trajectory.  In some implementations, the system 200 is configured to enable the autonomous robot 205 to perform place recognition and localization within a property, such as the property 100 of FIG. 1A.  The system 200 uses mapping sensor data 215 to generate a mapping of the property that can be used to generate the mapping hierarchy 250. Witt: [0049].  The system 200 can also obtain image data 225 or other sensor data 235 from the autonomous robot 205.  The object recognition engine 220 can process the image data 225 or other sensor data 235 to identify one or more objects represented as being located within a portion of the property depicted by the image data 225 or sensor data 235 where the autonomous robot 205 is located within the property.  The place recognition engine 230 can use the objects recognized by the object recognition engine 220 to determine a semantic zone type corresponding to the portion of the property depicted by the image data 225 or sensor data 235.  
Based on the determined semantic zone type and the mapping hierarchy 250, the localization engine 240 can determine a particular semantic zone of the property specified by the mapping hierarchy 250 in which the autonomous robot 205 is located.  The localization engine 240 can send data to the autonomous robot 205 to set a current location of the autonomous robot 205 as a particular location at the property that is associated with the particular semantic zone specified by the mapping hierarchy 250. Witt: [0050].  Therefore, a mapping hierarchy 250 is defined for the path with objects in particular semantic zones.  The object recognition engine 220 may process the mapping and the image data or other sensor data to label objects within the mapping.  For example, the object recognition engine 220 may process the image data or other sensor data to identify one or more objects using the techniques described at stage (C).  Corresponding portions of the mapping may be labeled with the identified objects, such as beds, dressers, stoves, or other objects appearing in the property 100 of FIG. 1A.  To do so, the images or other sensor data may be associated with locations or perspectives from which the images or other sensor data were collected, such that locations of the objects identified based on the images or other sensor data can be determined and labelled in the mapping of the property.  Each object labelled in the mapping may also be associated with a location of the object within the property. Witt: [0058].  Objects associated with locations (positions and orientations) or perspectives are labeled under mapping hierarchy 250.  See 1_1 and 1_3 below). 
While Witt does not explicitly teach the following, Shapira teaches:
(1_1).  a three-dimensional (3D) scan of an interior space (e.g.,  The camera pose is tracked as the camera/sensor is moved (its location and orientation) and because each frame's pose and how it relates to the others is known, these multiple viewpoints of objects or the environment can be fused (averaged) together into a single reconstruction voxel volume.  The voxel volume can be thought of as a large virtual cube in space (the reconstruction volume), located around the scene in the real world, and depth data (i.e. measurements of where the surfaces are) integrated into this as the sensor is moved around. Shapira: [0047] L.5-14. A room can be scanned with a depth sensor and a 3D partial reconstruction volume can be created of the room.  This can then be used to augment the 3D partial reconstruction volume to show objects and items partially occluded. Shapira: [0100] L.3-7.  Therefore, a depth sensor is used to scan the room to determine and show occluded objects and items.  Thus, the position of a location is obtained by scanning the walls to which the autonomous robot determines it is proximate within the mapping of the property.  An object is a single voxel volume and a room (the interior of the property) is a 3D partial reconstruction volume obtained from 3D scanning).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Shapira into the teaching of Witt so that the interior of the property 100 (Witt: [0028] L.5-6) is scanned with a depth sensor such as one or more stereo cameras, LIDAR, radar, sonar, or other forms of imaging or depth detection. (Witt: [0051] L.8-9).
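For illustration only, the fusing (averaging) of multiple depth observations into a voxel volume described in the cited passage of Shapira can be sketched as follows. This is a simplified sketch and not the implementation of Shapira, Witt, or the application; the function name `fuse_into_voxels`, the `voxel_size` parameter, and the sample values are hypothetical.

```python
import math
from collections import defaultdict


def fuse_into_voxels(samples, voxel_size):
    """Average observed values that fall into the same voxel.

    samples: iterable of ((x, y, z), value) pairs in world coordinates
             (hypothetical input format chosen for this sketch).
    voxel_size: edge length of one cubic voxel, in the same units.
    Returns a dict mapping integer voxel indices to the averaged value.
    """
    totals = defaultdict(lambda: [0.0, 0])  # voxel index -> [sum, count]
    for (x, y, z), value in samples:
        # Map the continuous position to an integer grid index.
        key = (
            math.floor(x / voxel_size),
            math.floor(y / voxel_size),
            math.floor(z / voxel_size),
        )
        totals[key][0] += value
        totals[key][1] += 1
    return {key: s / n for key, (s, n) in totals.items()}


# Two observations land in voxel (0, 0, 0); one lands in voxel (2, 0, 0).
samples = [
    ((0.1, 0.1, 0.1), 1.0),
    ((0.2, 0.3, 0.4), 3.0),
    ((1.2, 0.0, 0.0), 5.0),
]
fused = fuse_into_voxels(samples, voxel_size=0.5)
```

Each key of `fused` identifies one cell of the 3D grid, which corresponds to the "3D cells that each represent a finite amount of volume" relied upon in Witt: [0053].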
While the combined teaching of Witt and Shapira does not explicitly teach the following, Sachs teaches:
(1_2). determining one or more canonical poses of objects in the interior space based on a 3D model of the interior space (e.g., As used herein, the term "canonical pose" refers to a digital item reflecting template features of an individual.  In particular, a canonical pose can include a standard coordinate system relative to template features of an individual.  For example, a canonical pose can include a template face, with a standard center, scale, size, and rotation (e.g., template eyes, nose, forehead, hair, chin, cheeks, eyebrows, or center).  A canonical pose can be based on a template digital image (e.g., the center, scale, size, and rotation of a template face can comprise a face portrayed in a template digital image).  Similarly, a canonical pose can consist of an image comprising standard x and y positions (i.e., canonical x and y channels) reflecting template features of an individual. Sachs: [0036]. For example, FIG. 4A illustrates a probe digital image 402 portraying a target individual 416.  The target individual 416 includes a plurality of identified facial feature points 404a-404c.  FIG. 4A illustrates the probe digital image estimating a transform 410 between the canonical pose 406 and the target individual 416. Sachs: [0098] L.1-6 and Fig. 4A; reproduced below for reference. 

[Image: Sachs, Fig. 4A]

Specifically, the canonical pose 406 is positioned such that the template face 418 is centered relative to the coordinate system 420 (e.g., the template center facial feature point 408c is aligned at the center of the template coordinate system 420). Sachs: [0103] L.9-13.  It is obvious that an object in canonical pose is an object at the center of the view (in focus) of the robot (in proper location or perspective; Witt: [0058] L.1 and/or with the camera/sensor in proper pose; Shapira: [0049] L.1-6).  Therefore, as the robot of Witt navigates the property, many objects are sensed along the path of navigation, and the robot has to adjust its (camera/sensor) pose to capture images/sensor data of an object in the target position (canonical pose).  It is obvious that when the robot is at position A in Fig. 1 of Witt, the robot sees a bed, which is the object in focus.  At position B in Fig. 1 of Witt, the robot sees a plant 126 and a dresser 124, and the robot can focus on either one of them (in canonical pose));
(1_3). labeling voxels representing the objects in the interior space based on the trajectory, the one or more poses of the electronic device, and the one or more canonical poses of the objects (e.g., The object recognition engine 220 may process the mapping and the image data or other sensor data to label objects within the mapping.  For example, the object recognition engine 220 may process the image data or other sensor data to identify one or more objects using the techniques described at stage (C).  Corresponding portions of the mapping may be labeled with the identified objects, such as beds, dressers, stoves, or other objects appearing in the property 100 of FIG. 1A.  To do so, the images or other sensor data may be associated with locations or perspectives from which the images or other sensor data were collected, such that locations of the objects identified based on the images or other sensor data can be determined and labelled in the mapping of the property.  Each object labelled in the mapping may also be associated with a location of the object within the property. Witt: [0058].  Therefore, objects that are labeled are seen by the robot on its navigation path and captured in image or other sensor data with the sensor (electronic device).  It can be seen from Figs. 1A and 1B that in a room, for example, there is more than one object (furniture, plant, etc.), and the objects that are labeled are in the central view (in canonical pose; Sachs: Fig. 4A - 418) of the robot).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Sachs into the combined teaching of Witt and Shapira so that target objects are transformed into an aligned canonical pose for focus or attention. 

Regarding claim 2, the combined teaching of Witt, Shapira and Sachs teaches the method of claim 1, wherein performing the 3D scan of the interior space comprises acquiring imaging information, depth measurements relative to the electronic device, and location or movement information for the electronic device (e.g., the autonomous robot 205 can obtain one or more images 225 of a portion of the property using a camera of the autonomous robot 205 while the autonomous robot 205 is located at a particular position within the property.  In some implementations, the autonomous robot 205 may obtain additional or different sensor data 235, and may transmit that additional or different sensor data 235 to the object recognition engine 220.  The other sensor data 235 can include, for example, LIDAR, radar, sonar, stereo camera images, or other imaging or depth sensing measurements.  In some implementations, the mapping sensor data 215 may include all or a portion of the image data 225 or the other sensor data 235 obtained by the autonomous robot 205 while located at the particular position of the property.  In those implementations, the object mapping engine 220 may receive the image data 225 or other sensor data 235 from the map generation engine 210 in place of, or in addition to, receiving image data 225 or additional sensor data 235 from the autonomous robot 205.  Witt: [0065] L.5-23). 
 
Regarding claims 10-11, the claims are device claims corresponding to method claims 1-2, respectively.  The claims are similar in scope to claims 1-2, respectively, and they are rejected under similar rationale as claims 1-2, respectively.
Witt further teaches that “While the foregoing embodiments have been predominantly described with reference to the development or processing of speech inputs for use with applications installed on user devices, the described features may also be used with respect to machines, other devices, robots, or other systems.  For example, the described systems and methods may be used to improve user interactions with machinery, where the machinery has an associated computing system, may be used to develop and implement voice actions for interacting with a robot or system having robotic components, may be used to develop and implement voice actions for interacting with appliances, entertainment systems, or other devices, or may be used to develop and implement voice actions for interacting with a vehicle or other transportation system.” (Witt: [0095]).

Regarding claim 13, the combined teaching of Witt, Shapira and Sachs teaches the electronic device of claim 10, wherein the processor is configured to identify types of objects (e.g., objects (112a, 114a, 116a) of kitchen 110a; objects (122a, 124a, 126a, 142a, 144a, 146a) of bedrooms 120a and 140a; and objects (132a, 134a, 136a) of bathroom 130a; Witt: Fig. 1B) in the interior space by applying a convolutional neural network to the voxels, wherein the convolutional neural network is trained using an annotated dataset (e.g., In some implementations, identification of the objects by the object recognition engine 220, or of the semantic zones of the semantic mapping, may be generated by or determined based on outputs of one or more artificial neural networks, such as one or more deep convolutional neural networks.  For example, a neural network may receive the mapping of the property, the image data 225 or the other sensor data 235 and may generate or output data usable to determine an object mapping of the property.  The object mapping may be provided to another neural network, which may generate or output data usable to determine the semantic mapping of the property. Witt: [0061].  As object recognition is implemented with one or more artificial neural networks, it is well known that the neural network must be trained to learn the objects it is to identify (recognize), and this includes the set of objects (such as stove, sink, fridge, bed, dresser, desk/chair, plant, toilet, sink, oval, tub, etc.) in the property). 
 
Regarding claim 14, the combined teaching of Witt, Shapira and Sachs teaches the electronic device of claim 13, wherein the processor is configured to: 
transfer labels of the voxels to vertices of a mesh that represents a 3D model of the interior space (e.g., The object recognition engine 220 may process the mapping and the image data or other sensor data to label objects within the mapping.  For example, the object recognition engine 220 may process the image data or other sensor data to identify one or more objects using the techniques described at stage (C).  Corresponding portions of the mapping may be labeled with the identified objects, such as beds, dressers, stoves, or other objects appearing in the property 100 of FIG. 1A. Witt: [0058] L.1-9.  Thus, objects (voxels) are labeled.  Objects like beds and dressers include surfaces.  The surfaces may be represented using, for example, polygonal meshes, point clouds, point splatting, or any other form of 3D representation.  Witt: [0053] L.8-10. Therefore, labels applied to objects apply to their surfaces in meshes or point clouds.  According to the teaching of Shapira: “The first stage is a depth map conversion that takes the raw depth data from the Kinect® camera/sensor and converts it into floating point depth data in meters, followed by an optional conversion to an oriented point cloud which consists of 3D points/vertices in the camera coordinate system, and the surface normals (orientation of the surface) at these points.” Shapira: [0048] L.3-9.  Thus, labels applied to objects (with surfaces in meshes/point clouds) apply to vertices);  and 
based on the labels of the vertices, segment the vertices into connected components that represent the objects in the interior space (e.g., Based on the labelled mapping of the property and the object groups identified for the property, semantic zones within the property may be identified and labelled as such to generate a semantic mapping of the property.  For example, the localization engine 240 or another component of the system 200 may receive the labelled mapping of the property and information identifying the object groups and may determine semantic zones within the property.  For example, the localization engine 240 may determine, for each object group, an area within the property that includes all of the objects within that group.  Boundaries of the semantic zones may be defined based on barriers within the property, may be determined such that any semantic area within the property does not overlap any other semantic area within the property, or may be determined such that all areas within the property are labelled as being a part of a semantic zone.  Witt: [0060]). 
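For reference, the claimed step of segmenting labeled vertices into connected components can be sketched as follows. This is an illustrative sketch only; it is not taken from Witt, Shapira, Sachs, or the application, and the function name `segment_by_label`, the edge-list mesh representation, and the sample labels are hypothetical. Two vertices are placed in the same component only if they share a label and are joined by a path of mesh edges whose endpoints all carry that label.

```python
from collections import defaultdict, deque


def segment_by_label(labels, edges):
    """Group mesh vertices into connected components of equal label.

    labels: list where labels[i] is the class label of vertex i.
    edges:  iterable of (a, b) vertex-index pairs (mesh edges).
    Returns a list of (label, sorted vertex indices) tuples.
    """
    # Keep only edges whose endpoints carry the same label.
    adjacency = defaultdict(list)
    for a, b in edges:
        if labels[a] == labels[b]:
            adjacency[a].append(b)
            adjacency[b].append(a)

    seen = set()
    components = []
    for start in range(len(labels)):
        if start in seen:
            continue
        # Breadth-first search over same-label neighbours.
        seen.add(start)
        queue = deque([start])
        members = []
        while queue:
            v = queue.popleft()
            members.append(v)
            for n in adjacency[v]:
                if n not in seen:
                    seen.add(n)
                    queue.append(n)
        components.append((labels[start], sorted(members)))
    return components


# A chain of five vertices: two "bed" vertices, two "dresser", one "bed".
labels = ["bed", "bed", "dresser", "dresser", "bed"]
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
components = segment_by_label(labels, edges)
```

In this example the two groups of "bed" vertices are separated by "dresser" vertices, so they form two distinct components, i.e., two distinct object instances of the same type.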

Regarding claim 15, the combined teaching of Witt, Shapira and Sachs teaches the electronic device of claim 14, wherein the processor is configured to determine poses of the objects based on the connected components (e.g., Based on the labelled mapping of the property and the object groups identified for the property, semantic zones within the property may be identified and labelled as such to generate a semantic mapping of the property.  For example, the localization engine 240 or another component of the system 200 may receive the labelled mapping of the property and information identifying the object groups and may determine semantic zones within the property.  For example, the localization engine 240 may determine, for each object group, an area within the property that includes all of the objects within that group.  Boundaries of the semantic zones may be defined based on barriers within the property, may be determined such that any semantic area within the property does not overlap any other semantic area within the property, or may be determined such that all areas within the property are labelled as being a part of a semantic zone.  Witt: [0060]. The localization engine 240 may identify objects within each of the semantic zones of the property, and may assign nodes in a third level of the mapping hierarchy 250 corresponding to each of the identified objects, which each relate to a node in the second level of the mapping hierarchy 250 that is determined based upon the semantic zone in which the object is located.  Other characteristics of the objects in the property may be identified in fourth and lower levels of the mapping hierarchy 250.  Characteristics of the objects may include, for example, colors, sizes, shapes, orientations, positions relative to other objects, or other characteristics of the objects. Witt: [0062] L.9-21). 
 
Regarding claim 16, the combined teaching of Witt, Shapira and Sachs teaches the electronic device of claim 15, wherein the processor is configured to perform spatial reasoning at an object level of granularity by applying queries to the 3D model represented by the connected components (e.g., The object recognition engine 220 may process the mapping and the image data or other sensor data to label objects within the mapping.  For example, the object recognition engine 220 may process the image data or other sensor data to identify one or more objects using the techniques described at stage (C). Corresponding portions of the mapping may be labeled with the identified objects, such as beds, dressers, stoves, or other objects appearing in the property 100 of FIG. 1A.  To do so, the images or other sensor data may be associated with locations or perspectives from which the images or other sensor data were collected, such that locations of the objects identified based on the images or other sensor data can be determined and labelled in the mapping of the property.  Each object labelled in the mapping may also be associated with a location of the object within the property. Witt: [0058].  Boundaries of the semantic zones may be defined based on barriers within the property, may be determined such that any semantic area within the property does not overlap any other semantic area within the property, or may be determined such that all areas within the property are labelled as being a part of a semantic zone. Witt: [0060] L.11-16). 
 
Regarding claim 17, the combined teaching of Witt, Shapira and Sachs teaches the electronic device of claim 15, wherein the processor is configured to position, overlay, or blend virtual objects into an augmented reality representation of the 3D scene based on the poses and locations of the connected components (e.g., Fusion system reconstructs a single dense surface model with smooth surfaces by integrating the depth data from a depth camera or sensor, such as, for example, Microsoft Corporation's Kinect®, over time from multiple viewpoints.  The camera pose is tracked as the camera/sensor is moved (its location and orientation) and because each frame's pose and how it relates to the others is known, these multiple viewpoints of objects or the environment can be fused (averaged) together into a single reconstruction voxel volume.  The voxel volume can be thought of as a large virtual cube in space (the reconstruction volume), located around the scene in the real world, and depth data (i.e. measurements of where the surfaces are) integrated into this as the sensor is moved around. Shapira: [0047]). 
 
Regarding claim 18, the combined teaching of Witt, Shapira and Sachs teaches the electronic device of claim 15, wherein the processor is configured to modify a position or an orientation of one of the objects by applying a transformation to a corresponding one of the connected components (e.g., In some implementations, the mapping of the property may be a static mapping of the property, i.e., a mapping that is initially determined by the map generation engine 210 and is not further updated.  In other implementations, the mapping of the property may be periodically updated, or may be updated based on the one or more autonomous robots determining that the property has sufficiently changed.  For example, if furniture within the property is moved, the one or more autonomous robots may determine that the property has sufficiently changed to warrant re-mapping either all or a relevant portion of the property.  The one or more autonomous robots may therefore obtain new mapping sensor data 215 and provide that data to the map generation engine 210.  The map generation engine 210 may update the mapping of the property based on the new mapping sensor data 215, or may generate a new mapping of the property based on the new mapping sensor data 215 or a combination of the new mapping sensor data 215 and the previously received mapping sensor data 215.  Witt: [0054]). 

Regarding claim 19, the claim is a method claim of the combination of device claims 10 and 14.  The claim is similar in scope to the combination of claims 10 and 14 and the claim is rejected under similar rationale as the combination of claims 10 and 14.

Regarding claim 20, the claim is a method claim of the combination of device claims 15 and 17.  The claim is similar in scope to the combination of claims 15 and 17 and the claim is rejected under similar rationale as the combination of claims 15 and 17.

Claims 3 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Witt in view of Shapira and Sachs as applied to claims 2 and 11, and further in view of Coddington et al. (2017/0123066).

Regarding claim 3, the combined teaching of Witt, Shapira and Sachs teaches the method of claim 2, wherein acquiring the location or movement information comprises acquiring accelerations that are used to track motion of the electronic device using an inertial measurement unit (IMU) implemented in the electronic device (e.g., In the example shown in FIG. 1A, the autonomous robot 105 determines whether it is located in the first bedroom semantic zone 120 or the second bedroom semantic zone 140 by obtaining additional information from another portion of the property where it is located.  Specifically, as shown in FIG. 1A, the autonomous robot 105 may obtain a second image of a portion of the property 100 from position (B) within the property 100.  In some instances, the position (B) and the position (A) may be very close together, or may even be the same position, to avoid a possibility that the autonomous robot 105 moves to a different semantic zone of the property 100.  In other examples, such as when the first information obtained from the location of the autonomous robot 105 was obscured by an object, the autonomous robot 105 may travel further from position (A) to position (B) to obtain the additional information.  Witt: [0042] L.1-16. See 3_1 below). 
While the combined teaching of Witt, Shapira and Sachs does not explicitly teach the following, Coddington teaches:
(3_1). acquiring accelerations that are used to track motion of the electronic device using an inertial measurement unit (IMU) implemented in the electronic device (e.g., An Inertial Measurement Unit (IMU) refers to a device comprising sensors that measure movement by sensing acceleration and rotation.  Non-limiting examples of IMUS include: a 3 DOF (Degrees of Freedom) sensor that senses linear acceleration or angular acceleration or a gravitational vector; a 6 DOF sensor sensing 3 degrees of linear acceleration and 3 degrees of angular acceleration; and A 9 DOF IMU that additionally includes a magnetometer sensor that measures the gravity vector and references magnetic North.  Coddington: [0034]. Therefore, the acceleration of the autonomous robot 105 can be obtained with an inertial measurement unit).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Coddington into the combined teaching of Witt, Shapira and Sachs so that the acceleration of the autonomous robot 105 can be effectively measured with an inertial measurement unit (IMU).

Regarding claim 12, the claim is a device claim of method claim 3.  The claim is similar in scope to claim 3 and it is rejected under similar rationale as claim 3.

Allowable Subject Matter
Claims 4-9 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

The following is a statement of reasons for the indication of allowable subject matter in claim 4:  The prior art of record, either individually or in combination, fails to teach the following claimed limitations:
labeling the voxels comprises identifying types of objects in the interior space by applying a convolutional neural network to the voxels, wherein the convolutional neural network is trained using an annotated dataset, and
each voxel is associated with a probability distribution indicating probabilities of each class label based in part on the one or more canonical poses of the objects,
as recited in claim 4.

Claims 5-9 depend directly or indirectly from claim 4 and are objected to under the same rationale as claim 4.

Response to Arguments
Applicant’s arguments filed on February 24, 2021 have been fully considered but they are not persuasive.
R1.	The applicant argued on p.8 para. 3 lines 3-5 that “Sach does not remedy the deficiencies of Witt and Shapira. Sach generally relates to using deep learning techniques to automatically select individuals in digital images. See, e.g., Sach, para. [0002].”
The examiner respectfully disagrees.  Sachs defines “canonical pose” in para. [0036].  Sachs further illustrates a canonical pose in Fig. 4A, in which a target image is shown in canonical pose with coordinate axes.  The target image in canonical pose provides a good reference for scaling, rotation, etc. to position the image into a portrait position.  The teaching of Sachs is combined with the teaching of Witt and Shapira so that when an object (the bed, the plant or the dresser in a bedroom) is targeted, it is placed in canonical pose for easy transformation.
R2.	The applicant argued on p.8 para. 1 lines 2-8 that “Sach in view of Witt and Shapira does not provide any suggestion of "labeling voxels representing the objects in the interior space based on the trajectory, the one or more poses of the electronic device, and the one or more canonical poses of the objects," as claimed (emphasis added). Sach in view of Witt and Shapira, at most explains labeling object within the mapping based on locations or perspectives, but not based on (1) "the trajectory", (2) "the one or more poses of the electronic device," and (3) "the one or more canonical poses of the objects," as claimed (emphasis added).”
The examiner respectfully disagrees.  Labeling of an object within the mapping is based on the target object to be labeled, its perspective and its location.  Witt teaches navigation of the robot within the property; the path planning and navigation are taken as a trajectory.  It is obvious that if the dresser in the bedroom 120 is to be labeled, the robot has to navigate to a position B in bedroom 120.  At position B, two objects, plant 126 and dresser 124, are in view of the robot.  To label the dresser, the robot has to target the dresser, and hence the robot (camera/sensor) has to be posed (Witt: [0058], Shapira: [0049]) towards the dresser and focused on the dresser (in canonical pose).
For details, please see the rejection of the claims above.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to SING-WAI WU whose telephone number is (571)270-5850.  The examiner can normally be reached on 9:00am - 5:30pm (Central Time).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at 571-272-7794.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/SING-WAI WU/Primary Examiner, Art Unit 2611