DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 21-23, 26-30, 33-37, and 40 are rejected under 35 U.S.C. 103 as being unpatentable over Kutliroff (US 2017/0228940 A1).
Regarding claim 21, Kutliroff teaches:
A computer-implemented method, the method comprising: 
maintaining object data specifying objects that have been recognized in a scene in an environment; (Fig. 4, step 416, store points of objects boundary sets.)
receiving a stream of input images of the scene; ([0044], “As illustrated in FIG. 10, in one embodiment, method 1000 for segmentation of objects in a 3D image of a scene commences by receiving, at operation 1010, a series of 3D image frames, from a depth camera, of a scene containing one or more objects as the camera scans the scene. Each frame may thus provide a new view of the scene from a different perspective or camera pose. Each frame provided by the depth camera may include a color image frame comprising color (RGB) pixels and a depth map frame comprising depth pixels.” See also FIG. 10.)
for each of a plurality of input images in the stream of input images: 
providing the input image as input to an object recognition system; receiving, as output from the object recognition system, a recognition output that identifies a respective bounding box in the input image for each of one or more objects that have been recognized in the input image; ([0029], “The object detection circuit 408 may be configured to process the RGB image, and in some embodiments the associated depth map as well, to generate a list of any objects of interest recognized in the image. A label may be attached to each of the recognized objects and a 2D bounding box is generated which contains the object.” [0030], “Any suitable object detection technique may be used in to recognize the objects in the scene, and compute their locations in the image including, for example, template matching or classification using a bag-of-words vision model. In some embodiments, deep learning methods, and, in particular, convolutional neural networks are employed by the detection circuit 408. Some neural network methods process an image as input and calculate a probability that a given object is present in the image.” FIG. 4; FIG. 5 shows the identified 2D bounding boxes. See also FIG. 10 and [0045].)
providing data identifying the bounding boxes as input to a three-dimensional (3-D) bounding box generation system that determines, from the object data and the bounding boxes, a respective 3-D bounding box for each of one or more of the objects that have been recognized in the input image; ([0045] and FIG. 10 teach that the object boundary set includes the 3D positions of the pixels in the 2D bounding box along with an associated vector for each pixel. “At operation 1040, a 2D bounding box is calculated which contains the detected object, and a 3D location corresponding to the center of the bounding box is also calculated. At operation 1050, an attempt is made to match the detected object to an existing object boundary set. The matching is based on the label and the 3D center location of the bounding box. At operation 1060, if the match fails, a new object boundary set is created for the detected object. The object boundary set includes 3D positions of the pixels in the 2D bounding box along with an associated vector for each pixel.”)
and providing, as output, data specifying the one or more 3-D bounding boxes. ([0042], “The present disclosure describes a technique for 3D segmentation which, in some embodiments, can be implemented in an interactive manner. For example, the segmentation results may be displayed and updated on display element 112 so that a user operating the camera can continue to refine and improve the quality of the segmentation by moving around the object of interest and continuing to scan it with the depth camera until the segmentation results meet expectations.”)
However, Kutliroff does not explicitly teach:
receiving, as output from the 3-D bounding box generation system, data specifying one or more 3-D bounding boxes for one or more of the objects recognized in the input image;
Nevertheless, FIG. 11 of Kutliroff shows that the I/O System connects the 3D Segmentation System to the Display Element. The 3D Segmentation System generates the 3D segmentation results (the 3-D bounding box output), and the Display Element displays that output.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have combined these teachings of Kutliroff so that the I/O System receives the generated bounding box information from the 3D Segmentation System and passes it to the Display Element for display, yielding predictable results.
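For illustration only, the match-or-create step quoted above from [0045] of Kutliroff (operations 1050 and 1060) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the reference: the names BoundarySet and match_or_create, the Euclidean distance test, and the max_dist threshold are all hypothetical, since the quoted passage says only that matching is based on the label and the 3D center location.

```python
import math
from dataclasses import dataclass, field


@dataclass
class BoundarySet:
    """Hypothetical container for one object's boundary set."""
    label: str
    center: tuple  # 3D center (x, y, z) of the detection that created the set
    points: list = field(default_factory=list)


def match_or_create(sets, label, center, points, max_dist=0.5):
    """Match a detection to an existing set by label and 3D center.

    If no set with the same label lies within max_dist (an assumed
    threshold), a new set is created, mirroring the quoted fallback.
    """
    best = None
    best_d = max_dist
    for s in sets:
        if s.label != label:
            continue  # labels must agree before centers are compared
        d = math.dist(s.center, center)
        if d <= best_d:
            best, best_d = s, d
    if best is None:
        best = BoundarySet(label, center)
        sets.append(best)
    best.points.extend(points)  # aggregate points across views
    return best
```

Under these assumptions, two "chair" detections whose centers lie 0.1 apart fall into the same set, while a "table" detection at the same location starts a new one.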

Regarding claim 22, Kutliroff teaches:
The method of claim 21, wherein the 3-D bounding box generation system comprises: a multi-view fusion system that generates an initial set of 3-D bounding boxes; ([0044]-[0045] teach that the views captured from different perspectives are used to generate the initial 3D bounding boxes: “As illustrated in FIG. 10, in one embodiment, method 1000 for segmentation of objects in a 3D image of a scene commences by receiving, at operation 1010, a series of 3D image frames, from a depth camera, of a scene containing one or more objects as the camera scans the scene. Each frame may thus provide a new view of the scene from a different perspective or camera pose. Each frame provided by the depth camera may include a color image frame comprising color (RGB) pixels and a depth map frame comprising depth pixels. Next, at operation 1020, one or more objects are detected in each frame using object recognition techniques, and at operation 1030, a label is associated with each detected object. At operation 1040, a 2D bounding box is calculated which contains the detected object, and a 3D location corresponding to the center of the bounding box is also calculated. At operation 1050, an attempt is made to match the detected object to an existing object boundary set. The matching is based on the label and the 3D center location of the bounding box. At operation 1060, if the match fails, a new object boundary set is created for the detected object. The object boundary set includes 3D positions of the pixels in the 2D bounding box along with an associated vector for each pixel. The vector specifies a ray, or direction, from the position of the depth camera associated with the current camera pose, to the pixel.”) and a bounding box refinement system that refines the initial set of 3-D bounding boxes to the one or more 3-D bounding boxes. ([0046], “Of course, in some embodiments, additional operations may be performed, as previously described in connection with the system. These additional operations may include, for example, adjusting the object boundary set to remove duplicate pixels generated from different poses of the depth camera, based on the distance of the pixels from the camera and further based on the direction of the associated vectors. The adjustment may also remove pixels associated with an occluding object. Further additional operations may include, for example, detecting surface planes upon which the objects may be positioned and removing pixels associated with those planes from the object boundary set.”)
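For illustration only, the per-pixel 3D position and associated ray vector described in the passages of Kutliroff quoted above can be sketched as follows. This is a sketch under stated assumptions rather than the reference's implementation: it assumes a pinhole camera with hypothetical intrinsics fx, fy, cx, cy, works in camera coordinates with the camera at the origin, and omits the transform into a global coordinate system. The function names are likewise hypothetical.

```python
import math


def backproject(u, v, depth, fx, fy, cx, cy):
    """Project one depth pixel to a 3D point and a unit ray from the camera.

    Assumes a pinhole model; the camera sits at the origin, so the ray
    direction is the normalized point itself.
    """
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    norm = math.sqrt(x * x + y * y + depth * depth)
    point = (x, y, depth)
    ray = (x / norm, y / norm, depth / norm)
    return point, ray


def boundary_points(bbox, depth_map, fx, fy, cx, cy):
    """Collect (point, ray) pairs for every valid pixel in a 2D bounding box.

    bbox is (u0, v0, u1, v1) with exclusive upper bounds; pixels whose
    depth is not positive are treated as missing and skipped.
    """
    u0, v0, u1, v1 = bbox
    out = []
    for v in range(v0, v1):
        for u in range(u0, u1):
            d = depth_map[v][u]
            if d > 0:
                out.append(backproject(u, v, d, fx, fy, cx, cy))
    return out
```

Storing the ray alongside each point is what would allow a later step to compare points by viewing direction, as the quoted refinement passage contemplates.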

Regarding claim 23, Kutliroff teaches:
The method of claim 22, wherein the object recognition system, the multi-view fusion system, and the bounding box refinement system operate in a stateless manner and independently from one another. ([0044]-[0045] teach that the views captured from different perspectives are used to generate the initial 3D bounding boxes. [0046] teaches the refinement operation. The two modules work independently, and neither keeps the state of the other; they communicate only through their inputs and outputs.)

Regarding claim 26, Kutliroff teaches:
The method of claim 21, wherein the object recognition system comprises a trained deep neural network (DNN) model that takes the input image and generates a respective two-dimensional (2-D) object bounding box for each of the one or more objects that have been recognized in the input image. ([0030], “Any suitable object detection technique may be used in to recognize the objects in the scene, and compute their locations in the image including, for example, template matching or classification using a bag-of-words vision model. In some embodiments, deep learning methods, and, in particular, convolutional neural networks are employed by the detection circuit 408. Some neural network methods process an image as input and calculate a probability that a given object is present in the image. Determination of the location of the object in the image may be accomplished using sliding windows that can be applied progressively over the image, cropping smaller regions of the image and applying the network to each window. Other techniques for object location first filter out and reject those windows that are unlikely to contain objects.”)

Regarding claim 27, Kutliroff teaches:
The method of claim 21, wherein the stream of input images of the scene are captured from two or more user devices. ([0044], “As illustrated in FIG. 10, in one embodiment, method 1000 for segmentation of objects in a 3D image of a scene commences by receiving, at operation 1010, a series of 3D image frames, from a depth camera, of a scene containing one or more objects as the camera scans the scene. Each frame may thus provide a new view of the scene from a different perspective or camera pose. Each frame provided by the depth camera may include a color image frame comprising color (RGB) pixels and a depth map frame comprising depth pixels.”)

Regarding claim 28, Kutliroff teaches:
A system comprising one or more computers and one or more storage devices storing instructions that when executed by the one or more computers cause the one or more computers (FIG. 11) to perform operations comprising:
The remainder of claim 28 recites limitations similar to those of claim 21 and is therefore rejected under the same rationale.

Regarding claim 35, Kutliroff teaches:
A computer program product encoded on one or more non-transitory computer readable media, the computer program product comprising instructions that when executed by one or more computers cause the one or more computers ([0061], “The various embodiments disclosed herein can be implemented in various forms of hardware, software, firmware, and/or special purpose processors. For example in one embodiment at least one non-transitory computer readable storage medium has instructions encoded thereon that, when executed by one or more processors,”) to perform operations comprising:
The remainder of claim 35 recites limitations similar to those of claim 21 and is therefore rejected under the same rationale.

Claims 29-30 and 33-34 recite limitations similar to those of claims 22-23 and 26-27, respectively, and are therefore rejected under the same rationale.
Claims 36-37 and 40 recite limitations similar to those of claims 22-23 and 26, respectively, and are therefore rejected under the same rationale.

Claims 24-25, 31-32, and 38-39 are rejected under 35 U.S.C. 103 as being unpatentable over Kutliroff in view of Rubino et al. (“3D Object Localisation from Multi-View Image Detections,” cited in the IDS).
Regarding claim 24, Kutliroff teaches:
The method of claim 22, wherein the maintained object data … that is generated from a plurality of two-dimensional (2-D) bounding boxes of each object that have been recognized in the scene. ([0033], “When the scanning of the scene is completed, there is a single object boundary set for each object detected in the scene. Each object boundary set contains the aggregate of all points projected from pixels in the 2D bounding box of the object, as captured from multiple camera perspectives.” [0032], “The point at this 3D position is included in the object boundary set at operation 416. In order to represent this point in the object boundary set, two 3-element vectors are stored: the 3D (x,y,z) position of the point in the global coordinate system, and the vector representing the ray extending from the camera's position to that point (which is referred to herein as the “camera ray”).”)
However, Kutliroff does not teach, but Rubino teaches:
comprises an ellipsoid that is generated from a plurality of two-dimensional (2-D) bounding boxes of each object that have been recognized in the scene (Abstract: “In this work we present a novel approach to recover objects 3D position and occupancy in a generic scene using only 2D object detections from multiple view images. The method reformulates the problem as the estimation of a quadric (ellipsoid) in 3D given a set of 2D ellipses fitted to the object detection bounding boxes in multiple views.”)
Kutliroff teaches a multi-view fusion method that determines 3D bounding boxes from the 2D bounding boxes of each object that has been recognized in the scene. Rubino teaches the same kind of multi-view fusion using an ellipsoid-based approach.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have replaced the multi-view fusion method of Kutliroff with the ellipsoid-based multi-view fusion method of Rubino to generate more accurate results. (Rubino, page 1282, left.)
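For illustration only, the notion of a 2D ellipse fitted to a detection bounding box, as mentioned in the Rubino abstract quoted above, can be sketched as follows. This is a sketch under stated assumptions and is not asserted to be Rubino's procedure: it assumes the ellipse is axis-aligned and inscribed in the box, and it represents the ellipse as a dual conic (a 3x3 symmetric matrix C* for which a line l is tangent to the ellipse when l^T C* l = 0). The function names are hypothetical.

```python
def bbox_to_dual_conic(xmin, ymin, xmax, ymax):
    """Dual conic of the axis-aligned ellipse inscribed in a bounding box.

    With center (cx, cy) and semi-axes a, b, this is
    T * diag(a^2, b^2, -1) * T^T, where T translates the origin to the center.
    """
    cx = (xmin + xmax) / 2.0
    cy = (ymin + ymax) / 2.0
    a = (xmax - xmin) / 2.0
    b = (ymax - ymin) / 2.0
    return [
        [a * a - cx * cx, -cx * cy, -cx],
        [-cx * cy, b * b - cy * cy, -cy],
        [-cx, -cy, -1.0],
    ]


def tangency(conic, line):
    """Evaluate l^T C* l for a homogeneous line (l1, l2, l3); zero means tangent."""
    return sum(line[i] * conic[i][j] * line[j]
               for i in range(3) for j in range(3))
```

As a check on the assumption, each side of the box (for example the line x = xmin, written as (1, 0, -xmin)) evaluates to zero in tangency, whereas a line through the center of the box does not.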

Regarding claim 25, Kutliroff teaches:
The method of claim 24, wherein the multi-view fusion system generates the initial set of 3-D bounding boxes by performing at least the following steps: 
for each 2-D bounding box identified in the input image, determining whether the 2-D bounding box identified in the input image is associated with one or more 2-D bounding boxes of an object that has been recognized in the maintained object data; ([0031] “The object boundary set matching circuit 410 may be configured to find an appropriate existing object boundary set that matches each of the detected objects, if possible. The matching is based on a comparison of the object label and/or the 3D location of the center of the 2D bounding box, between the detected object and each of the existing object boundary sets, if any.”)
in response to determining that the 2-D bounding box identified in the input image is associated with one or more 2-D bounding boxes of an object that has been recognized, updating the maintained object data by calculating an updated bounding box of the object using the 2-D bounding box identified in the input image; (FIG. 4, step 416, [0032] “The object boundary set creation circuit 420 may be configured to create a new object boundary set if a suitable match for the detected object is not found by the object boundary set matching circuit 410. For each unmatched detected object of interest, the 2D bounding box containing the object is scanned to analyze each pixel within the bounding box. For each pixel, the associated 3D position of the 2D pixel is computed, by sampling the associated depth map to obtain the associated depth pixel and projecting that depth pixel to a point in 3D space, at operation 412. A ray is then generated which extends from the camera to the location of the projected point in 3D space at operation 414. The point at this 3D position is included in the object boundary set at operation 416.”)
in response to determining that the 2-D bounding box identified in the input image is not associated with any objects that have been recognized, creating a new object by generating a bounding box from at least the 2-D bounding box identified in the input image; ([0032] “The object boundary set creation circuit 420 may be configured to create a new object boundary set if a suitable match for the detected object is not found by the object boundary set matching circuit 410. For each unmatched detected object of interest, the 2D bounding box containing the object is scanned to analyze each pixel within the bounding box.”) and
However, Kutliroff does not teach, but Rubino teaches:
the bounding box can be an ellipsoid of the object (page 1283, left: “Our goal is to estimate the position and space occupancy of each object in the 3D scene given the 2D bounding boxes and by using multi-view constraints. In order to ease the mathematical formalisation of the problem, we move from a bounding box representation of an object to an ellipsoid one.”)
generating the initial set of 3-D bounding boxes using the ellipsoids of the objects that have been recognized in the input image. ([Image: media_image1.png])
Kutliroff teaches a multi-view fusion method that determines 3D bounding boxes from the 2D bounding boxes of each object that has been recognized in the scene. Rubino teaches the same kind of multi-view fusion using an ellipsoid-based approach.
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to have replaced the multi-view fusion method of Kutliroff with the ellipsoid-based multi-view fusion method of Rubino to generate more accurate results. (Rubino, page 1282, left.)

Claims 31-32 recite limitations similar to those of claims 24-25, respectively, and are therefore rejected under the same rationale.
Claims 38-39 recite limitations similar to those of claims 24-25, respectively, and are therefore rejected under the same rationale.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to YANNA WU whose telephone number is (571)270-0725. The examiner can normally be reached Monday-Thursday 8:00-5:30 ET.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/YANNA WU/
Primary Examiner, Art Unit 2611