DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1, 2, and 10-13 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Xiaozhi Chen et al., “Multi-View 3D Object Detection Network for Autonomous Driving” (hereafter Chen) (see IDS).
Regarding claim 1, Chen discloses a method of detecting and tracking a vehicle near a host vehicle (abstract: autonomous driving scenario; item "1. Introduction", paragraphs 1-2; item "4. Experiments", paragraph 1: "car category", figure 6), the method comprising: 
receiving, by an electronic controller, an input image from a camera mounted on the host vehicle (item "1. Introduction", paragraphs 1-3, discloses that 3D object detection plays an important role in the visual perception system of autonomous driving cars and that modern self-driving cars are commonly equipped with multiple sensors, such as LIDAR and cameras); and applying, by the 
Regarding claim 2, Chen further discloses displaying an output image on a display screen, the output image including at least a portion of the input image and an indication of the three-dimensional bounding box overlaid onto the input image (figure 6, last column).
 	Regarding claim 10, Chen further discloses wherein the definition of the three-dimensional bounding box includes a set of six structured points defining four corners of the first quadrilateral 
 	Regarding claim 11, Chen further discloses that the corners are defined in a three-dimensional coordinate system (item "3.3. Region-based Fusion Network", paragraph "Oriented 3D Box Regression").
 	Regarding claims 12 and 13, Chen further discloses that the corners are defined in a two-dimensional coordinate system of the input image (figure 6, last column: because the 3D bounding box is overlaid on the image, it is implicit that the corners must also have been computed in the 2D coordinate system of the image).
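For illustration only, the implicit relationship relied upon for claims 12 and 13 can be sketched as follows. This is a minimal sketch and is not taken from Chen or from the application: it assumes an axis-aligned box and a simple pinhole camera model, and the function names and the intrinsic parameters (fx, fy, u0, v0) are hypothetical values chosen for the example.

```python
from itertools import product


def box_corners_3d(center, size):
    """Return the 8 corners of an axis-aligned 3D box in camera coordinates.

    center = (X, Y, Z) of the box centre; size = (width, height, length).
    Axis alignment is an assumption made only to keep the example short.
    """
    cx, cy, cz = center
    w, h, l = size
    return [
        (cx + sx * w / 2, cy + sy * h / 2, cz + sz * l / 2)
        for sx, sy, sz in product((-1, 1), repeat=3)
    ]


def project_to_image(point, fx, fy, u0, v0):
    """Pinhole projection of a camera-frame point (X, Y, Z) to pixel (u, v)."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    return (fx * x / z + u0, fy * y / z + v0)


# Hypothetical box 10 m ahead of the camera and hypothetical intrinsics.
corners_3d = box_corners_3d(center=(0.0, 0.0, 10.0), size=(2.0, 1.5, 4.0))
corners_2d = [
    project_to_image(p, fx=1000.0, fy=1000.0, u0=640.0, v0=360.0)
    for p in corners_3d
]
```

Under these assumptions, each three-dimensional corner has exactly one corresponding two-dimensional corner in the image, which is what drawing the box over the image requires.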

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The text of those sections of Title 35, U.S. Code not included in this action can be found in a prior Office action.
The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or non-obviousness.
Claims 3, 4, and 5 are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Dim Papadopoulos, “Training object class detectors using only human verification” (hereafter Dim).
 	Regarding claim 3, Chen does not explicitly disclose the method, further comprising: 
receiving a user input indicative of a selection of a location on the output image within the three-dimensional bounding box; determining, based on the user input, that the three-dimensional bounding box is indicative of an incorrect detection of a vehicle by the neural network; and 
retraining the neural network based on the incorrect detection. However, in the same field of endeavor, Dim teaches, in Fig. 1, retraining object detectors, re-localizing objects, verification by annotators, and feedback of the verification into the retraining of the object detectors. Dim also discloses, on page 1, right column: “In this paper we propose a new scheme for learning object detectors which only requires humans to verify bounding-boxes produced automatically by the learning algorithm: the annotator merely needs to decide whether a bounding-box is correct or not. Crucially, answering this verification question takes much less time than actually drawing the bounding-box. Given a set of training images with image-level labels, our scheme iteratively alternates between updating object detectors, re-localizing objects in the training images, and querying humans for verification. At each iteration we use the verification signal in two ways. First, we update the object class detector using only positively verified bounding boxes. This makes it stronger than when using all detected bounding-boxes, as it is commonly done in the weakly supervised setting, because typically many of them are incorrect. Moreover, once the object location in an image has been positively verified, it can be fixed and removed from consideration in subsequent iterations. Second, we observe that bounding-boxes judged as incorrect still provide valuable information about where the object is not. Building on this observation, we use the negatively verified bounding boxes to reduce the search space of possible object locations in subsequent iterations. Both these points help to rapidly find more objects in the remaining images. This results in a framework for training object detectors which minimizes human annotation effort and eliminates the need to draw any bounding-box.”
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Dim with Chen, as a whole, so as to retrain the object detectors based on verification by human annotators. The motivation is to refine and improve the training process.
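For illustration only, the two uses of the verification signal described in the quoted passage of Dim can be sketched as follows. This is a minimal sketch under stated assumptions and is not Dim's implementation: the function name, the box tuple layout (image_id, x_min, y_min, x_max, y_max), and the sample values are hypothetical.

```python
def apply_verification(proposals, verdicts):
    """Split detector proposals by a human annotator's yes/no verdicts.

    proposals: list of (image_id, x_min, y_min, x_max, y_max) tuples
               (hypothetical layout chosen for this example).
    verdicts:  list of booleans, True where the annotator accepted the box.

    Returns (positives, rejected):
      positives: accepted boxes, usable to update the detector;
      rejected:  refused boxes, usable to narrow the search space in
                 later iterations.
    """
    if len(proposals) != len(verdicts):
        raise ValueError("one verdict is required per proposal")
    positives, rejected = [], []
    for box, accepted in zip(proposals, verdicts):
        (positives if accepted else rejected).append(box)
    return positives, rejected


# Hypothetical proposals for three images and the annotator's answers.
proposals = [
    ("img_001", 10, 20, 110, 90),
    ("img_002", 35, 40, 80, 75),
    ("img_003", 5, 5, 60, 50),
]
verdicts = [True, False, True]
positives, rejected = apply_verification(proposals, verdicts)
```

In this sketch the accepted boxes would feed the next detector update, while the refused box for img_002 marks a location to exclude when that image is re-localized.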
	Regarding claim 4, the combined teachings further disclose the method, further comprising: receiving a user input indicative of a selection of a location on the output image outside of the three-dimensional bounding box; determining, based on the user input, that an undetected vehicle is present within the field of the input image at the location corresponding to the user input; and retraining the neural network based on the user input indicative of the undetected vehicle (see Dim, which teaches, in Fig. 1, retraining object detectors, re-localizing objects, verification by annotators, and feedback of the verification into the retraining of the object detectors. Dim also discloses, on page 1, right column: “In this paper we propose a new scheme for learning object detectors which only requires humans to verify bounding-boxes produced automatically by the learning algorithm: the annotator merely needs to decide whether a bounding-box is correct or not. Crucially, answering this verification question takes much less time than actually drawing the bounding-box. Given a set of training images with image-level labels, our scheme iteratively alternates between updating object detectors, re-localizing objects in the training images, and querying humans for verification. At each iteration we use the verification signal in two ways. First, we update the object class detector using only positively verified bounding boxes. This makes it stronger than when using all detected bounding-boxes, as it is commonly done in the weakly supervised setting, because typically many of them are incorrect. Moreover, once the object location in an image has been positively verified, it can be fixed and removed from consideration in subsequent iterations. Second, we observe that bounding-boxes judged as incorrect still provide valuable information about where the object is not. Building on this observation, we use the negatively verified bounding boxes to reduce the search space of possible object locations in subsequent iterations. Both these points help to rapidly find more objects in the remaining images. This results in a framework for training object detectors which minimizes human annotation effort and eliminates the need to draw any bounding-box.” Furthermore, Dim teaches, in section 5, "Experimental results", under "Compared methods": “We compare our approach to the fully supervised alternative by training the same object detector (sec. 4.1) on the same training images, but with manual bounding-boxes (again, one bounding-box per class per image). On the other end of the supervision spectrum, we also compare to a modern MIL-based WSOL technique (sec 4.2) run on the same training images, but without human verification. Since that technique also forms the initialization step of our method, this comparison reveals how much farther we can go with human verification.” Clearly, manually adding or annotating bounding boxes is one option for finding objects that were not detected during the training process).
	Regarding claim 5, the combined teachings further disclose the method of claim , further comprising: applying a vehicle detection routine to the input image at the location corresponding to the user input (see Fig. 6, the object detectors and re-localizing objects); and defining a second three-dimensional bounding box based on the vehicle detection routine (see Fig. 6, multiple bounding boxes and, based on verification, retraining of the object detectors), wherein retraining the neural network includes retraining the neural network to output the definition of the second three-.

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Dim Papadopoulos, “Training object class detectors using only human verification” (hereafter Dim), and further in view of Price et al. (US 2018/0108137) (hereafter Price).
	Regarding claim 6, the combined teachings do not disclose the method, further comprising: prompting a user to manually position a second three-dimensional bounding box relative to the undetected vehicle in the input image; and defining a second three-dimensional bounding box based on a second user input received in response to the prompt, wherein retraining the neural network includes retraining the neural network to output the definition of the second three-dimensional bounding box based at least in part on the input image. However, in the same field of endeavor, Price teaches, in paragraph [0008], that a processing device receives a digital visual medium having a first bounding box corresponding to an object within a digital visual medium.  The processing device, based on the first bounding box, generates a set of additional bounding boxes corresponding to the object within the digital visual medium.  The first bounding box and the additional bounding boxes, in combination, form a bounding box set.  The processing also generates a set of distance maps corresponding to the bounding box set.  The processing device concatenates the digital visual medium with each distance map in the set of distance maps to generate a set of training pairs.  A neural network is trained to segment pixels of the digital visual medium corresponding to the object based on the training pairs. Paragraph [0041] teaches that a ground-truth image is received that includes a bounding box corresponding to a target object within the ground-truth image.  In some aspects, the ground-truth image is received by a neural network.  For example, the segmentation engine 103 of the creative apparatus 108 employs a neural network to receive a visual medium input and to generate a mask representing pixels of a target object within . 

Claims 7-9, 14, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Chen in view of Guan et al. (US 2013/0322691) (hereafter Guan).
Regarding claim 7, Chen does not explicitly disclose automatically operating a vehicle system to control a movement of the host vehicle based at least in part on the detected bounding boxes. However, in the same field of endeavor, Guan teaches ([0030]-[0031]): In addition, the calculation result of the image analysis unit 102 is transmitted to the vehicle drive control unit 104.  The vehicle drive control unit 104 performs driving support control to report the alert and control the steering and brakes of the vehicle 100, based on the detection of the recognition target such as another vehicle and pedestrian. The vehicle drive control unit 104 realizes various functions having a brake-control-and-alert function that the driver is alerted to take corrective action to avoid a collision or reduce the impact of the collision, and a driving speed adjustment function to maintain a safe minimum distance between vehicles by engaging a control device such as the brakes and the steering. [0032] FIG. 2 is a diagram illustrating a configuration of the imaging unit 101 and the image analysis unit 102.  The imaging unit 101 is a stereo camera system that includes two cameras 110A and 110B, and the two cameras 110A and 110B have similar configuration.  Respective cameras 110A and 110B include capturing lenses 111A and 111B, optical filters 112A and 112B, and image sensors 113A and 113B on which image pickup elements are two-dimensionally arranged.  The imaging unit 101 outputs luminance data. Therefore, it would have been obvious to 
Regarding claim 8, the combined teachings as a whole further disclose determining a distance between the host vehicle and the detected vehicle based at least in part on the detected bounding box, and wherein automatically operating a vehicle system includes automatically adjusting a speed of the host vehicle based at least in part on the determined distance between the host vehicle and the detected vehicle (Guan [0031]: In addition, the calculation result of the image analysis unit 102 is transmitted to the vehicle drive control unit 104.  The vehicle drive control unit 104 performs driving support control to report the alert and control the steering and brakes of the vehicle 100, based on the detection of the recognition target such as another vehicle and pedestrian.  The vehicle drive control unit 104 realizes various functions having a brake-control-and-alert function that the driver is alerted to take corrective action to avoid a collision or reduce the impact of the collision, and a driving speed adjustment function to maintain a safe minimum distance between vehicles by engaging a control device such as the brakes and the steering).
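For illustration only, the distance determination and speed adjustment discussed for claim 8 can be sketched as follows. This is a minimal sketch under stated assumptions and is not Guan's implementation: it uses the standard stereo relation Z = f * B / d, which the passages of Guan quoted in this action do not themselves recite, and the function names, the focal length, the baseline, the minimum gap, and the speed step are hypothetical values.

```python
from statistics import median


def distance_from_parallax(parallaxes_px, focal_px, baseline_m):
    """Estimate the distance to a target area from its pixel parallaxes.

    Uses the median parallax of the area and the standard stereo relation
    Z = f * B / d (an assumption of this example).
    """
    d = median(parallaxes_px)
    if d <= 0:
        raise ValueError("parallax must be positive")
    return focal_px * baseline_m / d


def speed_command(current_mps, distance_m, min_gap_m, step_mps=1.0):
    """Reduce speed by one step when the gap is below the safe minimum."""
    if distance_m < min_gap_m:
        return max(0.0, current_mps - step_mps)
    return current_mps


# Hypothetical parallaxes (pixels) inside one detected target area.
gap_m = distance_from_parallax([19.0, 20.0, 21.0], focal_px=1000.0, baseline_m=0.5)
new_speed = speed_command(current_mps=20.0, distance_m=gap_m, min_gap_m=30.0)
```

With these example values the estimated gap is 25 m, which is below the assumed 30 m minimum, so the commanded speed drops from 20 m/s to 19 m/s.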
Regarding claim 9, the combined teachings as a whole further disclose that automatically operating a vehicle system includes automatically adjusting, based at least in part on a position of the detected bounding box, at least one selected from a group consisting of a steering of the host vehicle, an acceleration of the host vehicle, a braking of the host vehicle, a defined trajectory of the host vehicle, and a signaling of the host vehicle (Guan, [0031]: In addition, the calculation result of the image analysis unit 102 is transmitted to the vehicle drive control unit 104.  The vehicle drive control unit 104 performs driving support control to report the alert and control the steering and brakes of the vehicle 100, based on the detection of the recognition target such as another vehicle and pedestrian.  The vehicle drive control unit 104 realizes various functions having a brake-control-and-alert function that the driver is alerted to take corrective action to avoid a collision or 
Regarding claim 14, the combined teachings as a whole further disclose wherein the neural network is further configured to output definitions of a plurality of three-dimensional bounding boxes based at least in part on the input image, wherein each three-dimensional bounding box of the plurality of three-dimensional bounding boxes indicates a size and a position of a different one of a plurality of vehicles detected in the field of view of the input image (Guan, [0030]: In FIG. 1, the in-vehicle control system 106 includes the imaging unit 101, an image analysis unit 102, and a vehicle drive control unit 104.  The imaging unit 101 is provided as a capture device to capture an image of the area in front of the vehicle 100 in the direction of travel.  For example, the imaging unit 101 is provided near a rearview mirror near a windscreen 103 of the vehicle 100.  The various data, such as, captured data acquired by the imaging unit 101 is input to the image analysis unit 102 as an image processor.  The image analysis unit 102 analyzes the data transmitted from the imaging unit 101, calculates the position, the direction, and the distance of another vehicle in front of the vehicle 100, and detects dividing lines as the lane borders.  When another vehicle (leading vehicle, oncoming vehicle) is detected, the vehicle is detected as the recognition target on the road based on the luminance image, the recognition process in the target recognition processor 206 is described below.  Initially, the target recognition processor 206 calculates a parallax average of the candidate set of recognition target areas.  The parallax average is the value obtained by adding the parallaxes at certain pixels and then dividing the sum, or obtained by using a central value (median) filter in the candidate set of recognition target areas.  Alternatively, the parallax may be set to the highest frequency value within the recognition target candidate areas.  
The target recognition processor 206 calculates the distance between the candidate set of recognition target areas and the stereo camera system 101 based on the parallax  
Regarding claim 15, Chen discloses: a vehicle detection system (figure 1), the system comprising: a camera positioned on a host vehicle (item "1. Introduction", paragraph 1); a display screen (figure 6); 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to DHAVAL V PATEL whose telephone number is (571)270-1818.  The examiner can normally be reached on Monday to Friday (8:00am-4:30pm).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/DHAVAL V PATEL/
Primary Examiner, Art Unit 2631
4/26/2021