DETAILED ACTION

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 2-3, 10-11, and 16-17 are rejected under 35 U.S.C. 103 as being unpatentable over Kwon (20160342863, as provided in IDS) in view of Choi (US 20170124415 A1, as provided in IDS).
		Re Claim 2, Kwon discloses a system comprising: 
		one or more processors; and
		a non-transitory computer readable medium comprising instructions that, when executed by the one or more processors, cause the system to perform operations (see Kwon: e.g.,  Fig. 1, and in [0023]) comprising: 
		inputting sensor data associated with an environment comprising an object into a machine learning algorithm (see Kwon: e.g., Fig. 1, --system 100 for recognizing an object in an image--, in [0021]; and, --The images of objects received by the recognition server 101 can include an image captured by the client device 115--, in [0023]-[0025]; and, --identify the depicted objects by classifying one or more regions of interest in the query image into product classes using convolutional neural network (CNN).--, in [0029]; and, --The classification module 207 may include software and/or logic to classify a region of an image, e.g., a ROI of the query image.  For example, when a ROI containing a potential object in a query image has been localized by the region detector 205, the ROI (e.g., the image content surrounded by its bounding box) may be fed into the classification module 207 to be assigned to one or more classes.  In some embodiments, the classification module 207 may include one or more convolutional neural networks (CNN) and/or any kind of machine learning classifiers that use learned features, representation learning, deep learning, or any combination thereof to classify the ROI.--, in [0058]) that comprises:
		a coarse output branch configured to output a coarse output (see Kwon: e.g., --the CNN classification module 207 can be trained to extract features and recognize products at coarse-grained level (e.g., raw categorization of products)--, in [0059], and, --the CNN classification module 207 may classify ROIs in the query image into category classes when coarse product categorization of the query image is required.--, in [0061]);
		a fine offset branch configured to output an offset with respect to the coarse output by the coarse output branch (see Kwon: e.g., --and fine-grained level (e.g., refined categorization of products, discrimination of similar products from the same brand or category).--, in [0059], and [0068] {herein “refined… discrimination” is the offset}); 
		Kwon, however, does not explicitly disclose the offset with respect to the coarse output by the coarse output branch;
		Choi teaches a fine offset branch configured to output an offset with respect to the coarse output by the coarse output branch (see Choi: e.g., -- After generating RoIs, the RoI pooling layer is applied to pool conv features for each RoI.  Then the pooled conv features are used for two tasks: subcategory classification and bounding box regression….. which is computed by applying a softmax function over the K+1 outputs of the subcategory conv layer.  The second layer outputs bounding box regression offsets--, in [0042]-[0043]; and, -- After the region proposal process, CNNs are utilized to classify these proposals and refine their locations.  Since region proposal significantly reduces the search space in object detection (e.g., several thousand regions per image), more powerful CNNs can be used in the detection step, which usually contain several fully connected layers with high dimensions… two that output softmax probability estimates over object subcategories, and another layer that refines the RoI location with a bounding box regressor. --, in [0050]-[0051], and also see: -- Then these region proposals are fed into a CNN for classification and location refinement.  The exemplary embodiments of the present invention adopt the two-stage detection framework, where the region proposal process can be considered to be the coarse detection step in a coarse-to-fine detection method--, in [0030]).
		Kwon and Choi are combinable as they are in the same field of endeavor: detecting and classifying target objects using image processing techniques, with both using a convolutional neural network as the detection and classification tool. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Kwon’s system using Choi’s teachings by including, in Kwon’s machine learning algorithm, a fine offset branch configured to output an offset with respect to the coarse output by the coarse output branch, in order to identify the object of interest (e.g., car, pedestrian, etc.) and estimate the location and the orientation of such object (see Choi: e.g., in [0116]-[0017], [0028], [0030], [0050]-[0051], [0061]-[0063], and [0066]);
	Kwon as modified by Choi further discloses receiving, from the machine learning algorithm, output associated with a physical parameter of the object, the output comprising the coarse output, the offset, and a confidence value of a set of confidence values associated with the coarse output (see Choi: e.g., -- After generating RoIs, the RoI pooling layer is applied to pool conv features for each RoI.  Then the pooled conv features are used for two tasks: subcategory classification and bounding box regression….. which is computed by applying a softmax function over the K+1 outputs of the subcategory conv layer.  The second layer outputs bounding box regression offsets--, in [0042]-[0043]; and, -- After the region proposal process, CNNs are utilized to classify these proposals and refine their locations.  Since region proposal significantly reduces the search space in object detection (e.g., several thousand regions per image), more powerful CNNs can be used in the detection step, which usually contain several fully connected layers with high dimensions… two that output softmax probability estimates over object subcategories, and another layer that refines the RoI location with a bounding box regressor. --, in [0050]-[0051], and also see: -- Then these region proposals are fed into a CNN for classification and location refinement.  The exemplary embodiments of the present invention adopt the two-stage detection framework, where the region proposal process can be considered to be the coarse detection step in a coarse-to-fine detection method--, in [0030], --These conv filters operate on the extrapolated conv feature maps, and output heat maps that indicate the confidences of the presence of objects in the input image.--, in [0038]-[0043]).
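
For illustration only, the claimed relationship among a coarse output, an offset, and a set of confidence values can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from Kwon, Choi, or the application: the function names, the use of orientation bins as the coarse output, and the numeric values are all hypothetical.

```python
import math


def softmax(scores):
    """Convert raw per-bin scores into confidence values that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def decode_coarse_plus_offset(bin_scores, bin_offsets, bin_centers):
    """Pick the highest-confidence coarse bin and refine it with its offset.

    All three arguments are hypothetical per-bin lists of equal length:
    raw scores, predicted offsets, and the coarse value each bin represents.
    """
    confidences = softmax(bin_scores)
    best = max(range(len(confidences)), key=confidences.__getitem__)
    coarse = bin_centers[best]
    offset = bin_offsets[best]
    return coarse, offset, confidences[best], coarse + offset


# Hypothetical example: four orientation bins, in degrees.
centers = [0.0, 90.0, 180.0, 270.0]
coarse, offset, confidence, refined = decode_coarse_plus_offset(
    [0.1, 2.0, 0.3, -1.0], [5.0, -12.5, 3.0, 0.0], centers
)
```

In this sketch the coarse output is the center of the selected bin, the offset is a correction relative to that center, and the confidence value is the highest of the softmax outputs.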

		Re Claim 3, Kwon as modified by Choi further discloses that the confidence value of the set of confidence values is a highest confidence value and is associated with a potential physical parameter associated with the object (see Choi: e.g., -- After generating RoIs, the RoI pooling layer is applied to pool conv features for each RoI.  Then the pooled conv features are used for two tasks: subcategory classification and bounding box regression….. which is computed by applying a softmax function over the K+1 outputs of the subcategory conv layer.  The second layer outputs bounding box regression offsets--, in [0042]-[0043]; and, -- After the region proposal process, CNNs are utilized to classify these proposals and refine their locations.  Since region proposal significantly reduces the search space in object detection (e.g., several thousand regions per image), more powerful CNNs can be used in the detection step, which usually contain several fully connected layers with high dimensions… two that output softmax probability estimates over object subcategories, and another layer that refines the RoI location with a bounding box regressor. --, in [0050]-[0051], and also see: -- Then these region proposals are fed into a CNN for classification and location refinement.  The exemplary embodiments of the present invention adopt the two-stage detection framework, where the region proposal process can be considered to be the coarse detection step in a coarse-to-fine detection method--, in [0030], --These conv filters operate on the extrapolated conv feature maps, and output heat maps that indicate the confidences of the presence of objects in the input image.--, in [0038]-[0043]).

Re Claims 10-11, claims 10-11 are the method claims corresponding to claims 2-3, respectively.  Claims 10-11 are thus rejected for reasons similar to those given for claims 2-3. See the above discussions with regard to claims 2-3, respectively. Further, Kwon as modified by Choi discloses a method of outputting a coarse output, offsets, and confidences (see Choi: e.g., -- After generating RoIs, the RoI pooling layer is applied to pool conv features for each RoI.  Then the pooled conv features are used for two tasks: subcategory classification and bounding box regression….. which is computed by applying a softmax function over the K+1 outputs of the subcategory conv layer.  The second layer outputs bounding box regression offsets--, in [0042]-[0043]; and, -- After the region proposal process, CNNs are utilized to classify these proposals and refine their locations.  Since region proposal significantly reduces the search space in object detection (e.g., several thousand regions per image), more powerful CNNs can be used in the detection step, which usually contain several fully connected layers with high dimensions… two that output softmax probability estimates over object subcategories, and another layer that refines the RoI location with a bounding box regressor. --, in [0050]-[0051], and also see: -- Then these region proposals are fed into a CNN for classification and location refinement.  The exemplary embodiments of the present invention adopt the two-stage detection framework, where the region proposal process can be considered to be the coarse detection step in a coarse-to-fine detection method--, in [0030]).

		Re Claims 16-17, claims 16-17 are the medium claims corresponding to claims 2-3, respectively.  Claims 16-17 are thus rejected for reasons similar to those given for claims 2-3. See the above discussions with regard to claims 2-3, respectively. Further, Kwon as modified by Choi discloses a non-transitory computer readable medium comprising instructions that, when executed by one or more processors, cause the one or more processors to perform operations (see Kwon: e.g., Fig. 1, and in [0023]).

3.	Claims 4-9, 12-15, and 18-21 are rejected under 35 U.S.C. 103 as being unpatentable over Kwon as modified by Choi, and further in view of Kutliroff (US 20170228940 A1, as provided in IDS).
		Re Claim 4, Kwon as modified by Choi further discloses determining, based at least in part on the sensor data, a two dimensional bounding box associated with the object, wherein: the sensor data comprises image data, the inputting is based at least in part on the two dimensional bounding box (see Kwon: e.g., --identify the depicted objects by classifying one or more regions of interest in the query image into product classes using convolutional neural network (CNN).--, in [0029]; and, --The classification module 207 may include software and/or logic to classify a region of an image, e.g., a ROI of the query image.  For example, when a ROI containing a potential object in a query image has been localized by the region detector 205, the ROI (e.g., the image content surrounded by its bounding box) may be fed into the classification module 207 to be assigned to one or more classes.  In some embodiments, the classification module 207 may include one or more convolutional neural networks (CNN) and/or any kind of machine learning classifiers that use learned features, representation learning, deep learning, or any combination thereof to classify the ROI.--, in [0058]; similarly also see Kwon: e.g., -- a region of interest (ROI) is a portion of the query image that potentially contains an object of interest, for example, a packaged product presented in the scene.  In some embodiments, a ROI in the query image may be indicated by a bounding box enclosing the image area it covers.--, in [0047]; in [0034], [0069]-[0070], and [0073]-[0074]; also see Choi: e.g., -- First, a canonical bounding box centered on (x,y) with width and height the same as that of the conv filters (e.g., 5×5) in the subcategory conv layer is generated--, in [0040]);
		the output associated with the physical parameter of the object comprises: an orientation of a three dimensional bounding box associated with the object; and a dimension of the three dimensional bounding box (see Kwon: e.g., product package with features of “dimensions (e.g., width, height, depth, etc.)”, and “the indexer 239 may index the product images including the set of features in a k-dimensional tree data structure” in [0034], and “width, height, depth, etc.” in [0069]-[0070]; also see Choi: e.g., --to estimate the orientation of the object by conducting subcategory classification in the framework, where the orientation of the subcategory is transferred to the detected object.  For validation on the KITTI dataset, the method used 173 subcategories (125 3DVPs for car, 24 poses for pedestrian and cyclist each), while for testing on the KITTI dataset, the method used 275 subcategories (227 3DVPs for car, 24 poses for pedestrian and cyclist each).  Correspondingly, the subcategory conv layer in the RPN 200 and the subcategory FC layer in the detection network 300 have an output number that equals to the number of subcategory plus one.--, in [0062]-[0064]; also see “(x, y, h)” in [0042]-[0043]); 
		Kwon as modified by Choi, however, does not explicitly disclose dimensions of the three dimensional bounding box;
		Kutliroff teaches outputting dimensions of the three dimensional bounding box (see Kutliroff: e.g., --detection of an object in 3 dimensions and the subsequent removal of elements of the scene which are not part of the detected object….object boundary set of 3D pixels. However, if no match can be found, a new object boundary set….etc., -- in [0016]-[0017]; and, --3D segmentation is box measurement. A 3D image from a depth camera provides the distance between the camera and objects within the scene, which can be used to obtain measurements of the surface area or the volume of objects, such as boxes or cartons…the sampled points used to calculate the dimensions of the object do not belong to the surrounding environment.--, in [0001], {it is thus clear that Kutliroff’s disclosed 3D boundary set, as the result of 3D segmentation, is a three-dimensional bounding box, and that the received dimensions of the object(s) are the dimensions of the three-dimensional bounding box, which are used to obtain the volume of objects such as boxes or cartons; if the object is a carton box, the calculated dimensions of the object would be the height, width, and length of the box, enabling the volume of the box to be obtained}; see the 3D bounding box as demonstrated in Fig. 3, wherein the bounding box of the coffee table, which is the result of 3D segmentation, is a 3D boundary set serving as the 3D bounding box, and see Fig. 6, and, -- The object boundary set creation circuit 420 may be configured to create a new object boundary set … the associated 3D position of the 2D pixel is computed, by sampling the associated depth map to obtain the associated depth pixel and projecting that depth pixel to a point in 3D space, at operation 412. A ray is then generated which extends from the camera to the location of the projected point in 3D space at operation 414. The point at this 3D position is included in the object boundary set at operation 416.--, in [0032]),
		Kwon (as modified by Choi) and Kutliroff are combinable as they are in the same field of endeavor: detecting target objects using image processing techniques, with both using a convolutional neural network as the detection and classification tool. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify the system of Kwon (as modified by Choi) using Kutliroff’s teachings by including outputting dimensions of the three dimensional bounding box in the calculation of the dimensions of the object by Kwon (as modified by Choi), in order to obtain measurements such as the surface area or the volume of objects (see Kutliroff: e.g., in [0001], [0016]-[0017], and [0032]); and 
		Kwon as modified by Choi and Kutliroff further discloses that the offset represents an orientation offset with respect to the coarse orientation of the three dimensional bounding box (see Choi: e.g., -- After generating RoIs, the RoI pooling layer is applied to pool conv features for each RoI.  Then the pooled conv features are used for two tasks: subcategory classification and bounding box regression….. which is computed by applying a softmax function over the K+1 outputs of the subcategory conv layer.  The second layer outputs bounding box regression offsets--, in [0042]-[0043]; and, -- After the region proposal process, CNNs are utilized to classify these proposals and refine their locations.  Since region proposal significantly reduces the search space in object detection (e.g., several thousand regions per image), more powerful CNNs can be used in the detection step, which usually contain several fully connected layers with high dimensions… two that output softmax probability estimates over object subcategories, and another layer that refines the RoI location with a bounding box regressor. --, in [0050]-[0051], and also see: -- Then these region proposals are fed into a CNN for classification and location refinement.  The exemplary embodiments of the present invention adopt the two-stage detection framework, where the region proposal process can be considered to be the coarse detection step in a coarse-to-fine detection method--, in [0030]).

		Re Claim 5, Kwon as modified by Choi and Kutliroff further discloses wherein the orientation of the three dimensional bounding box is based at least in part on the coarse orientation and the orientation offset (see Choi: e.g., -- After generating RoIs, the RoI pooling layer is applied to pool conv features for each RoI.  Then the pooled conv features are used for two tasks: subcategory classification and bounding box regression….. which is computed by applying a softmax function over the K+1 outputs of the subcategory conv layer.  The second layer outputs bounding box regression offsets--, in [0042]-[0043]; and, -- After the region proposal process, CNNs are utilized to classify these proposals and refine their locations.  Since region proposal significantly reduces the search space in object detection (e.g., several thousand regions per image), more powerful CNNs can be used in the detection step, which usually contain several fully connected layers with high dimensions… two that output softmax probability estimates over object subcategories, and another layer that refines the RoI location with a bounding box regressor. --, in [0050]-[0051], and also see: -- Then these region proposals are fed into a CNN for classification and location refinement.  The exemplary embodiments of the present invention adopt the two-stage detection framework, where the region proposal process can be considered to be the coarse detection step in a coarse-to-fine detection method--, in [0030]), the orientation represented as an angle between:
		a first ray originating from a center of the sensor and passing through a center of the two-dimensional bounding box (see Kwon: e.g., --a ROI in the query image may be indicated by a bounding box enclosing the image area it covers.  A ROI can be of any shape, for example, a polygon, a circle with a center point and a diameter, a rectangular shape of a width, a height and one or more reference points (e.g., a center point, one or more corner points) of the region--, in [0047], and see Kutliroff: e.g., -- Three parameters may describe the translation of the camera between consecutive frames (e.g., x, y and z). Three additional parameters may describe the change in orientation (e.g., yaw, pitch and roll angle) for a total of six degrees of freedom (6DOF) that are computed to determine the updated pose of the camera relative to its pose in the previous frame.--, in [0024]-[0026], and, -- In order to represent this point in the object boundary set, two 3-element vectors are stored: the 3D (x,y,z) position of the point in the global coordinate system, and the vector representing the ray extending from the camera's position to that point (which is referred to herein as the "camera ray").--, in [0029]-[0031], also see Fig. 6, {herein the above “camera ray” is considered as the orientation of the object; see Fig. 3, wherein the bounding box of the coffee table, which is the result of 3D segmentation, is a 3D boundary set serving as the 3D bounding box; also see Fig. 6, in which 3D coordinates (x, y, z) are applied in order to define the “locations”; see the corresponding discussion: e.g., --FIG. 6 illustrates an example position and ray associated with pixels in an object boundary set--, in [0009], and, --to represent this point in the object boundary set, two 3-element vectors are stored: the 3D (x,y,z) position of the point in the global coordinate system, and the vector representing the ray extending from the camera's position to that point (which is referred to herein as the "camera ray").--, in [0032]}), and
		a second ray aligned with a direction of the object (see Kwon: e.g., -- the region detector 205 may extract regions of interest (ROIs) from the first image.  As described above, the region detector 205 may detect the ROIs in the first image using model-based features, alignment with planogram to localize products--, in [0054], and [0086], and also see Kutliroff: e.g., -- The 3D registration circuit 138 may be configured to operate on each of the detected objects in the scene, along with the associated segmented region, to obtain a 3D alignment of the object in the scene.--, in [0027], and, -- to improve the estimated boundary of the detected objects by removing, from each object boundary set, duplicate pixels and pixels associated with other objects occluding the camera's view of the object of interest. At operation 1602, each point P in the object boundary set is considered, and a ray is projected, at operation 1604, from the point P in the direction of the camera ray that was previously stored for that point. This ray is referred to herein as the "point ray." Next, at operation 1606, all points in the object boundary set are analyzed with respect to the current point P, and any point lying close enough to the point ray (within a given threshold distance) is selected. This set of selected points is referred to as "set A," and the points in set A are considered to be lying on the point ray.--, in [0065]).
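
For illustration only, an orientation expressed as the angle between a first ray and a second ray, as recited in claim 5, can be computed as sketched below. This is a minimal sketch assuming both rays are given as 2D direction vectors in a common plane; the function name and the example vectors are hypothetical and are not taken from the cited references or the application.

```python
import math


def angle_between_rays(first_ray, second_ray):
    """Signed angle in radians from first_ray to second_ray (2D vectors).

    atan2 of the 2D cross product and the dot product gives an angle in
    (-pi, pi] without requiring the vectors to be normalized.
    """
    ax, ay = first_ray
    bx, by = second_ray
    cross = ax * by - ay * bx
    dot = ax * bx + ay * by
    return math.atan2(cross, dot)


# Hypothetical example: a ray through the box center along +x and an
# object heading along +y are a quarter turn apart.
quarter_turn = angle_between_rays((1.0, 0.0), (0.0, 1.0))
```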

		Re Claim 6, Kwon as modified by Choi and Kutliroff further discloses associating, as an association, the three dimensional bounding box with the sensor data; and estimating a position of the three dimensional bounding box based at least in part on the association (see Choi: e.g., --These conv filters operate on the extrapolated conv feature maps, and output heat maps that indicate the confidences of the presence of objects in the input image.--, in [0038]-[0043]; also see: --The subcategory conv layer 212 outputs heat map 214 for each scale, where each location in the heat map 214 indicates the confidence of an object in the corresponding location, scale, and subcategory.  Using the subcategory heat maps 214, the RoI generating layer 216 is designed that generates object candidates (RoIs) 218 by thresholding the heat maps.  The RoIs 218 are used in the RoI pooling layer 220 to pool conv features from the extrapolated conv feature maps 222.  Finally, the RPN 200 terminates at two sibling layers 226, one that outputs softmax probability estimates over object subcategories, and another layer that refines the RoI location with a bounding box regressor.--, in [0034]; similarly, also see Kwon: e.g., --The ranking module 211 may then return the product associated with the result class as recognized product for the ROI in the query image and the classification score corresponding to that result class as confidence score of the product recognition.--, in [0073]-[0075], [0077], also see in [0034], and [0071]; and see Kutliroff: e.g., --detection of an object in 3 dimensions and the subsequent removal of elements of the scene which are not part of the detected object….object boundary set of 3D pixels. However, if no match can be found, a new object boundary set….etc., -- in [0016]-[0017]; and, --3D segmentation is box measurement. A 3D image from a depth camera provides the distance between the camera and objects within the scene, which can be used to obtain measurements of the surface area or the volume of objects, such as boxes or cartons…the sampled points used to calculate the dimensions of the object do not belong to the surrounding environment.--, in [0001], {it is thus clear that Kutliroff’s disclosed 3D boundary set, as the result of 3D segmentation, is a three-dimensional bounding box, and that the received dimensions of the object(s) are the dimensions of the three-dimensional bounding box, which are used to obtain the volume of objects such as boxes or cartons; if the object is a carton box, the calculated dimensions of the object would be the height, width, and length of the box, enabling the volume of the box to be obtained}; see the 3D bounding box as demonstrated in Fig. 3, wherein the bounding box of the coffee table, which is the result of 3D segmentation, is a 3D boundary set serving as the 3D bounding box, and see Fig. 6, and, -- The object boundary set creation circuit 420 may be configured to create a new object boundary set … the associated 3D position of the 2D pixel is computed, by sampling the associated depth map to obtain the associated depth pixel and projecting that depth pixel to a point in 3D space, at operation 412. A ray is then generated which extends from the camera to the location of the projected point in 3D space at operation 414. The point at this 3D position is included in the object boundary set at operation 416.--, in [0032]).

		Re Claim 7, Kwon as modified by Choi and Kutliroff further discloses that estimating the position of the three dimensional bounding box in the environment comprises minimizing a difference between an association of the three dimensional bounding box with the image data and the two dimensional bounding box (see Choi: e.g., -- Then these region proposals are fed into a CNN for classification and location refinement.  The exemplary embodiments of the present invention adopt the two-stage detection framework, where the region proposal process can be considered to be the coarse detection step in a coarse-to-fine detection method--, in [0030]; also see Kwon: e.g., -- the determined location may be an absolute position of the object with its x-y coordinates in the query image.  In some embodiments, the determined location may be a relative location of the object, for example, a relative distance(s) from the object to one or more points of reference (e.g., a light source, a sign, a bottom shelf of the shelving unit, other packaged products appear in the scene, etc.).  In some embodiments, the region segmentation module 311 may determine the image area covered by the located object in the query image as a detected ROI.  The detected ROI may be represented by a bounding box surrounding the located object and may be identified by a location (absolute location, e.g., x-y coordinates, or relative location) of the bounding box in the query image.--, in [0054]-[0056], and [0085]; further see Choi: e.g., -- according to the ground truth bounding boxes of objects in a training image, the intersection over union (IoU) overlap is computed between the generated boxes and the ground truth boxes.  Bounding boxes with IoU overlap larger/smaller than some threshold (e.g., 0.5) are considered to be positive/negative.  Finally, given the number of RoIs to be generated for each training image R (i.e., batch size divided by the number of images in a batch), the RoI generating layer outputs R×α hard positives (i.e., R×α positive bounding boxes with lowest scores in the heat maps)--, in [0041]).
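
For illustration only, the intersection over union (IoU) overlap and the thresholding into positive and negative boxes described in the quoted passage of Choi can be sketched as below. This is a minimal sketch, not Choi's implementation: the function names, the `(x1, y1, x2, y2)` corner format, and the example boxes are assumptions; only the 0.5 threshold is taken from the quoted example.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def label_boxes(generated, ground_truth, threshold=0.5):
    """Mark each generated box positive if its best IoU exceeds the threshold."""
    return [
        max((iou(g, t) for t in ground_truth), default=0.0) > threshold
        for g in generated
    ]


# Hypothetical example: one box matching the ground truth, one far away.
labels = label_boxes([(0, 0, 2, 2), (10, 10, 12, 12)], [(0, 0, 2, 2)])
```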

		Re Claim 8, Kwon as modified by Choi and Kutliroff further discloses that the machine learning algorithm is a neural network trained based at least in part on training data comprising a training two dimensional bounding box and an associated ground truth three dimensional bounding box (see Kwon: e.g., --the CNN classification module 207 can be trained to extract features and recognize products at coarse-grained level (e.g., raw categorization of products)--, in [0059], and, --the CNN classification module 207 may classify ROIs in the query image into category classes when coarse product categorization of the query image is required.--, in [0061]; and, --and fine-grained level (e.g., refined categorization of products, discrimination of similar products from the same brand or category).--, in [0059], and [0068]; and also see Choi: e.g., -- according to the ground truth bounding boxes of objects in a training image, the intersection over union (IoU) overlap is computed between the generated boxes and the ground truth boxes.  Bounding boxes with IoU overlap larger/smaller than some threshold (e.g., 0.5) are considered to be positive/negative.  Finally, given the number of RoIs to be generated for each training image R (i.e., batch size divided by the number of images in a batch), the RoI generating layer outputs R×α hard positives (i.e., R×α positive bounding boxes with lowest scores in the heat maps)--, in [0041]).

		Re Claim 9, Kwon as modified by Choi and Kutliroff further disclose the training data is based at least in part on a transformation to a training image (see Kutliroff: e.g., -- The calculated pose of the camera is the 3D transformation from the position and orientation of the camera in a previous frame, to its position and orientation in the current frame.  Three parameters may describe the translation of the camera between consecutive frames (e.g., x, y and z).  Three additional parameters may describe the change in orientation (e.g., yaw, pitch and roll angle) for a total of six degrees of freedom (6DOF) that are computed to determine the updated pose of the camera relative to its pose in the previous frame.--, in [0030]);
		and the transformation comprises at least one of: mirroring the training image; adding noise to the training image; resizing the training image; or resizing the training two dimensional bounding box (see Kutliroff: e.g., --The calculated pose of the camera is the 3D transformation from the position and orientation of the camera in a previous frame, to its position and orientation in the current frame.  Three parameters may describe the translation of the camera between consecutive frames (e.g., x, y and z).  Three additional parameters may describe the change in orientation (e.g., yaw, pitch and roll angle) for a total of six degrees of freedom (6DOF) that are computed to determine the updated pose of the camera relative to its pose in the previous frame.--, in [0030]; and, -- Consequently, 3D points extracted from the associated depth maps can also be transformed or projected to this coordinate system.  Thus, computation of the camera pose for each frame allows for integration of the depth maps obtained at different times into a single 3D space.--, in [0033], [0041], and, -- In some embodiments, the RGB image frames are stored and mapped to the 3D reconstruction, enabling the use of 2D feature detection techniques such as Scale Invariant Feature Transform (SIFT) detection and Speeded-Up Robust Feature (SURF) detection.--, in [0042], [0100], and [0108]; similarly also see Kwon: e.g., -- The two sets of features in the query image and in the candidate index image are geometrically consistent if they have the same shape, e.g., one set of features can be transformed to the other set by one or more operations including translation, rotation, scaling, etc.--, in [0053]).

		Re Claims 12-15, claims 12-15 are the method claims corresponding to claims 4-7 respectively.  Claims 12-15 thus are rejected for reasons similar to those given for claims 4-7. See above discussions with regard to claims 4-7 respectively. Further, Kwon as modified by Choi and Kutliroff further discloses a method for estimating a three dimensional bounding box of an object in a two dimensional image (see Kutliroff: e.g., --a new object boundary set is created which contains 3D projections of pixels contained in the 2D bounding box.  The 3D pixels in the object boundary set are also paired with a vector that describes the perspective of the camera associated with the capture of that pixel.--, in [0017]).

		Re Claims 18-21, claims 18-21 are the medium claims corresponding to claims 4-7 respectively.  Claims 18-21 thus are rejected for reasons similar to those given for claims 4-7. See above discussions with regard to claims 4-7 respectively. Further, Kwon as modified by Choi and Kutliroff further discloses a non-transitory computer readable medium comprising instructions that, when executed by one or more processors, cause the one or more processors to perform operations (see Kwon: e.g., Fig. 1, and in [0023]).

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: Kutliroff (US 20170243352 A1) teaches creating a 3D reconstruction of the scene based on depth pixels that are projected and accumulated into a global coordinate system. BOULKENAFED (US 20170161590 A1) discloses that the number of 2D images is in general more than ten; e.g. around a hundred images may be generated for each 3D model. The exact number depends on the size of each object (i.e., on the bounding box or any other bounding volume of the 3D modeled object) (see Figs. 2-3, and in [0066]-[0067]).

Any inquiry concerning this communication or earlier communications from the examiner should be directed to WEIWEN YANG whose telephone number is (571)270-5670.  The examiner can normally be reached on Monday-Friday 8:30am-4:30pm Eastern Time.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella, can be reached at 571-272-7778.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 
If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/WEI WEN YANG/Primary Examiner, Art Unit 2667