DETAILED ACTION
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 7-8, 12-14, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Xu (WO 2019060125 A1) in view of Mao (US 20200202145 A1).
Re Claim 1, Xu discloses an image processing method (see Xu: e.g., Fig. 2, -- a first processing algorithm 212 configured to receive the point cloud 204. In some implementations of this disclosure, the first processing algorithm 212 may include an artificial neural network (e.g., a convolutional neural network) configured to receive the point cloud and analyze the points. For example, the first processing algorithm 212 may be a PointNet network. PointNet is a deep network architecture that receives raw point cloud data and learns both global and local point features. PointNet has been used conventionally for classification, part segmentation, and semantic segmentation purposes. For purposes of this disclosure, however, the first processing algorithm 212 may be configured to produce a feature vector associated with the point cloud. For instance, when PointNet is used as the first processing algorithm 212, feature vectors may be produced at one of several layers before a prediction layer. The process 200 may extract one or more of these feature vectors, as illustrated at 214. The feature vectors 214 may be purely geometric feature vectors, associated only with the location of the points in the point cloud 206.--, in [0020]-[0028]); 
obtaining an image from a camera located on a vehicle while the vehicle is being driven (see Xu: e.g., -- [0010] Autonomous vehicle systems may include an array of different types of sensors to detect, track and identify objects and/or attributes of objects. For instance, sensors, such as LIDAR and RADAR, ultrasonic transducers, depth cameras, and the like can provide three- dimensional information about objects in an environment and sensors such as conventional cameras can provide two-dimensional information about the environment. For instance, a LIDAR system may have a light emitter and a light sensor, with the light emitter including one or more lasers that direct highly focused light toward an object or surface which reflects the light back to the light sensor…. image capture devices may provide 2D image data, such as RGB image data, greyscale image data, or otherwise, about the environment.--, in [0010]);
cropping a portion of the obtained image corresponding to a region of interest identified based on one or more positions of a first set of one or more reference points selected on the obtained image (see Xu: e.g., Fig. 2, and, -- a machine learning algorithm is applied to the image data and the point cloud data, to estimate parameters for a three-dimensional bounding box associated with one or more objects in the environment. For instance, a first feature vector associated with the image data, for example, associated with a cropped image corresponding to the object of interest, and a second feature vector associated with the point cloud data may be input to the machine learning algorithm. The machine learning algorithm may output parameters of the three-dimensional bounding box. The parameters may include eight points in a coordinate system, the eight points representing the eight corners of the three-dimensional bounding box. An example machine learning algorithm used to recover the parameters is an artificial neural network (ANN), which may be a Convolutional Neural Network (CNN).--, in [0012]; and, -- the first processing algorithm 212 may be configured to produce a feature vector associated with the point cloud. For instance, when PointNet is used as the first processing algorithm 212, feature vectors may be produced at one of several layers before a prediction layer. The process 200 may extract one or more of these feature vectors, as illustrated at 214. The feature vectors 214 may be purely geometric feature vectors, associated only with the location of the points in the point cloud 206….. The second processing algorithm 216 may be configured to receive the vehicle image 210 and produce one or more appearance feature vectors 218 associated with the vehicle image 210. --, in [0022]-[0023]; also see: -- three-dimensional bounding box 226, defined by eight corners 228, is illustrated in FIG. 2. In another example, the ANN 222 may predict a center location, orientation, and three dimensional extents of such a bounding box. In such a manner, the ANN 222 may constrain the output to retain a rectangular volume shape…. provides a global architecture that directly regresses coordinates descriptive of a bounding box. FIG. 3 is a pictorial representation of a process 300, which, like the process 200, also determines parameters of a three-dimensional bounding box using a point cloud 302 and a cropped image 304 associated with an object.--, in [0024]-[0026]);
detecting an object in the cropped portion of the image (see Xu: e.g., -- receive an image captured from an image capture device; detect an object in the image; crop the image to form a cropped image including the object--, in [0091]);
adding a bounding box around the detected object in the cropped portion of the image (see Xu: e.g., -- receive point cloud data associated with the object; determine a first feature vector associated with the point cloud data, the first feature vector comprising a geometric feature vector; determine a second feature vector associated with the cropped image, the second feature vector comprising an appearance feature vector; pass the first feature vector and the second feature vector into a neural network; and receive, from the neural network, coordinates descriptive of a three-dimensional bounding box associated with the object.--, in [0091]; also see Fig. 2, [0012], and, -- the image 110 is captured by at least one stereo camera, RGBD camera, and/or depth camera. Use of multiple cameras may allow for recovery of depth information through the use of multiple view geometry. In such embodiments, depth information from stereo or RGBD cameras is used to aid detection of objects in image 110 for segmenting the image 110 and creating the two-dimensional bounding box 114.--, in [0019]-[0020], [0024]-[0026]);
	and determining a location in a spatial region where the vehicle is being driven based on the determined one or more positions of the second set of one or more reference points on the bounding box (see Xu: e.g., Fig. 2, -- One example three-dimensional representation is a three-dimensional bounding box. A three-dimensional bounding box may be a minimum volume cuboid which encompasses an object. The three-dimensional bounding box provides information about spatial location, orientation, as well as size for the object it contains.--, in [0011]-[0012]; -- a first processing algorithm 212 configured to receive the point cloud 204. In some implementations of this disclosure, the first processing algorithm 212 may include an artificial neural network (e.g., a convolutional neural network) configured to receive the point cloud and analyze the points. For example, the first processing algorithm 212 may be a PointNet network. PointNet is a deep network architecture that receives raw point cloud data and learns both global and local point features. PointNet has been used conventionally for classification, part segmentation, and semantic segmentation purposes. For purposes of this disclosure, however, the first processing algorithm 212 may be configured to produce a feature vector associated with the point cloud. For instance, when PointNet is used as the first processing algorithm 212, feature vectors may be produced at one of several layers before a prediction layer. The process 200 may extract one or more of these feature vectors, as illustrated at 214. The feature vectors 214 may be purely geometric feature vectors, associated only with the location of the points in the point cloud 206.--, in [0020]-[0028]; also see: -- three-dimensional bounding box 226, defined by eight corners 228, is illustrated in FIG. 2. In another example, the ANN 222 may predict a center location, orientation, and three dimensional extents of such a bounding box. In such a manner, the ANN 222 may constrain the output to retain a rectangular volume shape…. provides a global architecture that directly regresses coordinates descriptive of a bounding box. FIG. 3 is a pictorial representation of a process 300, which, like the process 200, also determines parameters of a three-dimensional bounding box using a point cloud 302 and a cropped image 304 associated with an object.--, in [0024]-[0026]);
	Xu does not, however, explicitly disclose that the above-determined location is a location of the detected object.
	Mao teaches determining a location of the detected object in a spatial region where the vehicle is being driven based on the determined one or more positions of the second set of one or more reference points on the bounding box (see Mao: e.g., -- determination of object depths for objects shown in the image based on differences in spatial orientations/offsets of the cameras' image sensors. With respect to LIDAR and RADAR, the raw sensor data can indicate a distance, a direction, and an intensity of reflected radiation.--, in [0038], and, -- the feature vector 322 based on a location of the object of interest in the environment, i.e., a location of the object represented by the object patches in first neural network inputs 316a-n. In some implementations, the system (e.g., interface 308) selects the feature vector 322 that corresponds to the region of the environment where the object of interest is located. If the object of interest spans multiple regions, the system may select a feature vector 322 that corresponds to the region of the environment where the greatest portion of the object is located…. an interface and pre-processor subsystem may obtain sensor data for a portion of an environment within sensing range of a vehicle, detect an object of interest near the vehicle, determine a bounding box (e.g., a rectangular box) around the object, and extract the content of the bounding box to form a patch for the object of interest. The bounding box may be drawn tightly around the object of interest. --, in [0058]-[0062]), 
Xu and Mao are combinable because they are in the same field of endeavor: processing image data captured by a vehicle camera and analyzing detected objects. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Xu’s method using Mao’s teachings, by adding Mao’s determination of a location of the detected object in a spatial region to Xu’s location determination (e.g., from feature vectors), in order to determine a location of the object of interest in the environment (see Mao: e.g., [0058]-[0062]).

Re Claim 7, claim 7 is the medium claim corresponding to claim 1. Thus, claim 7 is rejected for the same reasons as claim 1. Furthermore, Xu as modified by Mao further discloses a non-transitory computer readable storage medium having code stored thereon, the code, when executed by a processor, causing the processor to implement a method of performing the steps (see Xu: e.g., -- the subject matter presented herein may be implemented as a computer process, a computer-controlled apparatus, a computing system, or an article of manufacture, such as a computer-readable storage medium. While the subject matter described with respect to the methods 400, 500 are presented in the general context of operations that may be executed on and/or with one or more computing devices, those skilled in the art will recognize that other implementations may be performed in combination with various program/controller modules. Generally, such modules include routines, programs, components, data structures, and other types of structures that perform particular tasks or implement particular abstract data types.--, in [0037]).
 
Re Claim 8, Xu as modified by Mao further discloses wherein a position of a single reference point of the first set on the obtained image is a center point of the region of interest (see Xu: e.g., -- one such object in the environment 100 is a vehicle 102. The environment 100 is associated with a coordinate system 104. The coordinate system 104 may be either global or local. In a global coordinate system, any point expressed in the coordinate system 104 is an absolute coordinate. Alternatively, in a local coordinate system points are expressed relative to an arbitrarily defined origin (such as a center of an autonomous vehicle as it travels through the environment), which may move in a global coordinate system.--, in [0017], [0024], and [0035], and, -- both the point cloud and the bounding box objective may be cropped (or otherwise altered to leave only the data within the two-dimensional bounding box in the image and related points in the point cloud (e.g., by reprojection using a known transformation between the two sensors)) and rotated to be centered along an axis of the sensors, e.g., a Z-axis.--, in [0042]).

Re Claim 12, claim 12 is the apparatus claim corresponding to claim 1. Thus, claim 12 is rejected for the same reasons as claim 1. Furthermore, Xu as modified by Mao further discloses an image processing apparatus for an autonomous vehicle comprising a processor configured to implement the method (see Xu: e.g., -- the subject matter presented herein may be implemented as a computer process, a computer-controlled apparatus, a computing system, or an article of manufacture, such as a computer-readable storage medium. While the subject matter described with respect to the methods 400, 500 are presented in the general context of operations that may be executed on and/or with one or more computing devices, those skilled in the art will recognize that other implementations may be performed in combination with various program/controller modules. Generally, such modules include routines, programs, components, data structures, and other types of structures that perform particular tasks or implement particular abstract data types.--, in [0037]).

Re Claim 13, Xu as modified by Mao further discloses wherein the first set of one or more reference points are selected by the processor being configured to perform the method that comprises obtaining three-dimensional world coordinates of the first set of one or more reference points based on a terrain map and a location of the vehicle (see Xu: e.g., -- The second processing algorithm 216 may be configured to produce the feature vector(s) 218 associated with the vehicle image 210. In some examples, the process 200 may extract the feature vector(s) 218 from one of several layers of the residual learning network, before a prediction layer. For instance, the second processing algorithm 216 may be a ResNet-101 CNN and the feature vector 218 may be produced by the final residual block and averaged across feature map locations.--, in [0023]; also see Mao: e.g., -- An autonomous vehicle system can predict the types of nearby objects to improve understanding of its environment and make better driving and navigation decisions. By processing feature vectors representing context about a wider portion of the environment than just the portion of the environment where the object of interest is located, the accuracy of object classifications made by the system can be improved on average. Moreover, by generating a single context map in one pass through a context embedding neural network, the system can more efficiently use environmental context information to classify multiple objects located in the environment of a vehicle without needing to re-generate a context map and associated feature vectors for each object that is to be classified.--, in [0027], [0055]-[0058], [0068]-[0069], and [0072]).

Re Claim 14, Xu as modified by Mao further discloses wherein the terrain map provides coordinates of points in the spatial region where the vehicle is being driven (see Xu: e.g., -- Measurement of the LIDAR system may be represented as three-dimensional LIDAR data having coordinates (e.g., Cartesian, polar, etc.) corresponding to positions or distances captured by the LIDAR system. For example, the LIDAR data may include point cloud data comprising a plurality of points in the environment. In some instances, LIDAR sensors can generate a large amount of range measurements within a short amount of time (e.g., 1000-100000 range measurements every 0.1 seconds).--, in [0010], and, -- an environment 100 may include various objects. For exemplary purposes, one such object in the environment 100 is a vehicle 102. The environment 100 is associated with a coordinate system 104. The coordinate system 104 may be either global or local. In a global coordinate system, any point expressed in the coordinate system 104 is an absolute coordinate. Alternatively, in a local coordinate system points are expressed relative to an arbitrarily defined origin (such as a center of an autonomous vehicle as it travels through the environment), which may move in a global coordinate system.--, in [0017]-[0020]).

Re Claim 17, Xu as modified by Mao further discloses wherein the bounding box includes a plurality of vertices located at a plurality of corners of the bounding box (see Xu: e.g., Fig. 2, and, -- a machine learning algorithm is applied to the image data and the point cloud data, to estimate parameters for a three-dimensional bounding box associated with one or more objects in the environment. For instance, a first feature vector associated with the image data, for example, associated with a cropped image corresponding to the object of interest, and a second feature vector associated with the point cloud data may be input to the machine learning algorithm. The machine learning algorithm may output parameters of the three-dimensional bounding box. The parameters may include eight points in a coordinate system, the eight points representing the eight corners of the three-dimensional bounding box. An example machine learning algorithm used to recover the parameters is an artificial neural network (ANN), which may be a Convolutional Neural Network (CNN).--, in [0012]; and, -- the first processing algorithm 212 may be configured to produce a feature vector associated with the point cloud. For instance, when PointNet is used as the first processing algorithm 212, feature vectors may be produced at one of several layers before a prediction layer. The process 200 may extract one or more of these feature vectors, as illustrated at 214. The feature vectors 214 may be purely geometric feature vectors, associated only with the location of the points in the point cloud 206….. The second processing algorithm 216 may be configured to receive the vehicle image 210 and produce one or more appearance feature vectors 218 associated with the vehicle image 210. --, in [0022]-[0023]; also see: -- three-dimensional bounding box 226, defined by eight corners 228, is illustrated in FIG. 2. In another example, the ANN 222 may predict a center location, orientation, and three dimensional extents of such a bounding box. In such a manner, the ANN 222 may constrain the output to retain a rectangular volume shape…. provides a global architecture that directly regresses coordinates descriptive of a bounding box. FIG. 3 is a pictorial representation of a process 300, which, like the process 200, also determines parameters of a three-dimensional bounding box using a point cloud 302 and a cropped image 304 associated with an object.--, in [0024]-[0026]).

Re Claim 18, Xu as modified by Mao further discloses wherein the second set of one or more reference points includes a reference point located midpoint in between two vertices of the bounding box (see Xu: e.g., Fig. 2, and, -- a machine learning algorithm is applied to the image data and the point cloud data, to estimate parameters for a three-dimensional bounding box associated with one or more objects in the environment. For instance, a first feature vector associated with the image data, for example, associated with a cropped image corresponding to the object of interest, and a second feature vector associated with the point cloud data may be input to the machine learning algorithm. The machine learning algorithm may output parameters of the three-dimensional bounding box. The parameters may include eight points in a coordinate system, the eight points representing the eight corners of the three-dimensional bounding box. An example machine learning algorithm used to recover the parameters is an artificial neural network (ANN), which may be a Convolutional Neural Network (CNN).--, in [0012]; and, -- the first processing algorithm 212 may be configured to produce a feature vector associated with the point cloud. For instance, when PointNet is used as the first processing algorithm 212, feature vectors may be produced at one of several layers before a prediction layer. The process 200 may extract one or more of these feature vectors, as illustrated at 214. The feature vectors 214 may be purely geometric feature vectors, associated only with the location of the points in the point cloud 206….. The second processing algorithm 216 may be configured to receive the vehicle image 210 and produce one or more appearance feature vectors 218 associated with the vehicle image 210. --, in [0022]-[0023]; also see: -- three-dimensional bounding box 226, defined by eight corners 228, is illustrated in FIG. 2. In another example, the ANN 222 may predict a center location, orientation, and three dimensional extents of such a bounding box. In such a manner, the ANN 222 may constrain the output to retain a rectangular volume shape…. provides a global architecture that directly regresses coordinates descriptive of a bounding box. FIG. 3 is a pictorial representation of a process 300, which, like the process 200, also determines parameters of a three-dimensional bounding box using a point cloud 302 and a cropped image 304 associated with an object.--, in [0024]-[0026]).

Re Claim 19, Xu as modified by Mao further discloses wherein the two vertices are closest to a surface of a road compared to other vertices of the bounding box (see Xu: e.g., Fig. 2, and, -- a machine learning algorithm is applied to the image data and the point cloud data, to estimate parameters for a three-dimensional bounding box associated with one or more objects in the environment. For instance, a first feature vector associated with the image data, for example, associated with a cropped image corresponding to the object of interest, and a second feature vector associated with the point cloud data may be input to the machine learning algorithm. The machine learning algorithm may output parameters of the three-dimensional bounding box. The parameters may include eight points in a coordinate system, the eight points representing the eight corners of the three-dimensional bounding box. An example machine learning algorithm used to recover the parameters is an artificial neural network (ANN), which may be a Convolutional Neural Network (CNN).--, in [0012]; and, -- the first processing algorithm 212 may be configured to produce a feature vector associated with the point cloud. For instance, when PointNet is used as the first processing algorithm 212, feature vectors may be produced at one of several layers before a prediction layer. The process 200 may extract one or more of these feature vectors, as illustrated at 214. The feature vectors 214 may be purely geometric feature vectors, associated only with the location of the points in the point cloud 206….. The second processing algorithm 216 may be configured to receive the vehicle image 210 and produce one or more appearance feature vectors 218 associated with the vehicle image 210. --, in [0022]-[0023]; also see: -- three-dimensional bounding box 226, defined by eight corners 228, is illustrated in FIG. 2. In another example, the ANN 222 may predict a center location, orientation, and three dimensional extents of such a bounding box. In such a manner, the ANN 222 may constrain the output to retain a rectangular volume shape…. provides a global architecture that directly regresses coordinates descriptive of a bounding box. FIG. 3 is a pictorial representation of a process 300, which, like the process 200, also determines parameters of a three-dimensional bounding box using a point cloud 302 and a cropped image 304 associated with an object.--, in [0024]-[0026]).
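
For orientation only, the limitations of claims 17-19 (vertices at the corners of the bounding box, and a reference point midway between the two vertices closest to the road surface) can be illustrated with a minimal sketch. The sketch assumes a two-dimensional, axis-aligned box in image coordinates whose y value increases downward, so that the two vertices with the largest y are taken as nearest the road. That convention and the names `box_vertices` and `bottom_midpoint` are hypothetical illustrations; they are not taken from the claims, Xu, or Mao.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def box_vertices(x_min: float, y_min: float, x_max: float, y_max: float) -> List[Point]:
    """Return the four corner vertices of an axis-aligned 2D bounding box."""
    return [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]


def bottom_midpoint(vertices: List[Point]) -> Point:
    """Midpoint between the two vertices with the largest image y.

    Assumption: image y grows downward, so the largest y values are the
    vertices nearest the road surface.
    """
    (x1, y1), (x2, y2) = sorted(vertices, key=lambda p: p[1], reverse=True)[:2]
    return ((x1 + x2) / 2.0, (y1 + y2) / 2.0)


if __name__ == "__main__":
    corners = box_vertices(10, 20, 50, 80)
    print(bottom_midpoint(corners))
```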

Re Claim 20, Xu as modified by Mao further discloses wherein the detected object in the cropped portion of the image is located at a distance between 500 meters and 1000 meters (see Mao: e.g., -- an interface and pre-processor subsystem may obtain sensor data for a portion of an environment within sensing range of a vehicle, detect an object of interest near the vehicle, determine a bounding box (e.g., a rectangular box) around the object, and extract the content of the bounding box to form a patch for the object of interest.--, in [0061]).


Claims 2-6, 9-11, and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Xu as modified by Mao, and further in view of Chang (US 20200311979 A1).
Re Claim 2, Xu as modified by Mao however do not explicitly disclose camera coordinates of a camera,
Chang discloses the location of the detected object is determined by performing: for each determined position of each reference point of the second set: determining camera coordinates of a camera center point located on a ray that passes through a position of a reference point, wherein the camera center point is located on a camera coordinate plane located at a focal length distance away from an image plane where the image is received (see Chang: e.g., Fig. 6, -- World coordinates of two reference points and image coordinates of two projection points corresponding to the two reference points are obtained. A plurality of coordinate transformation parameters relative to transformation between any image coordinates and any world coordinates corresponding to a camera are calculated according only to the world coordinates of the two reference points, the image coordinates of the two projection points, and world coordinates of the camera. A second image having an object image corresponding to an object is obtained through a camera. World coordinates of the object are positioned according to the coordinate transformation parameters.--, in [0008], and, -- The camera coordinate system is a three-dimensional coordinate system formed by treating the center point of the camera lens as the origin. In the camera coordinate system, the directions of the three axes in the three-dimensional coordinate system is defined corresponding to the left-handed coordinate system or the right-handed coordinate system.--, in [0021]-[0023]),
 determining, based at least on the camera coordinates, first world coordinates of the position of the reference point  (see Chang: e.g., Fig. 6, -- World coordinates of two reference points and image coordinates of two projection points corresponding to the two reference points are obtained. A plurality of coordinate transformation parameters relative to transformation between any image coordinates and any world coordinates corresponding to a camera are calculated according only to the world coordinates of the two reference points, the image coordinates of the two projection points, and world coordinates of the camera. A second image having an object image corresponding to an object is obtained through a camera. World coordinates of the object are positioned according to the coordinate transformation parameters.--, in [0008], and, -- The camera coordinate system is a three-dimensional coordinate system formed by treating the center point of the camera lens as the origin. In the camera coordinate system, the directions of the three axes in the three-dimensional coordinate system is defined corresponding to the left-handed coordinate system or the right-handed coordinate system.--, in [0021]-[0023]);
Xu as modified by Mao, and Chang, are combinable because they are in the same field of endeavor: processing image data captured by a camera and analyzing detected objects. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify the method of Xu as modified by Mao using Chang’s teachings, by including, for each determined position of each reference point of the second set: determining camera coordinates of a camera center point located on a ray that passes through a position of a reference point, wherein the camera center point is located on a camera coordinate plane located at a focal length distance away from an image plane where the image is received; and determining, based at least on the camera coordinates, first world coordinates of the position of the reference point, in the location determination of Xu as modified by Mao, in order to determine the location of the object of interest in the environment as world coordinates of the object (see Chang: e.g., [0008] and [0021]-[0023]).
Xu as modified by Mao and Chang further disclose determining, based on a terrain map, second world coordinates of a point of intersection of the reference point and a road surface, wherein the terrain map provides coordinates of points in the spatial region where the vehicle is being driven (see Xu: e.g., -- The second processing algorithm 216 may be configured to produce the feature vector(s) 218 associated with the vehicle image 210. In some examples, the process 200 may extract the feature vector(s) 218 from one of several layers of the residual learning network, before a prediction layer. For instance, the second processing algorithm 216 may be a ResNet-101 CNN and the feature vector 218 may be produced by the final residual block and averaged across feature map locations.--, in [0023]; also see Mao: e.g., -- An autonomous vehicle system can predict the types of nearby objects to improve understanding of its environment and make better driving and navigation decisions. By processing feature vectors representing context about a wider portion of the environment than just the portion of the environment where the object of interest is located, the accuracy of object classifications made by the system can be improved on average. Moreover, by generating a single context map in one pass through a context embedding neural network, the system can more efficiently use environmental context information to classify multiple objects located in the environment of a vehicle without needing to re-generate a context map and associated feature vectors for each object that is to be classified.--, in [0027], [0055]-[0058], [0068]-[0069], and [0072]); and 
assigning the second world coordinates for the second set of one or more reference points to the location of the detected object in the spatial region  (see Chang: e.g., Fig. 6, -- World coordinates of two reference points and image coordinates of two projection points corresponding to the two reference points are obtained. A plurality of coordinate transformation parameters relative to transformation between any image coordinates and any world coordinates corresponding to a camera are calculated according only to the world coordinates of the two reference points, the image coordinates of the two projection points, and world coordinates of the camera. A second image having an object image corresponding to an object is obtained through a camera. World coordinates of the object are positioned according to the coordinate transformation parameters.--, in [0008], and, -- The camera coordinate system is a three-dimensional coordinate system formed by treating the center point of the camera lens as the origin. In the camera coordinate system, the directions of the three axes in the three-dimensional coordinate system is defined corresponding to the left-handed coordinate system or the right-handed coordinate system.--, in [0021]-[0023]).
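
For orientation only, the geometry recited in claim 2 (a ray from the camera through a reference point, and the point where that ray meets the road surface given by a terrain map) can be sketched as follows. This is a generic pinhole-camera and ray-sampling illustration under stated assumptions; it is not the coordinate transformation described in Chang and is not asserted to be the applicant's implementation. The function names, the intrinsics `fx`, `fy`, `cx`, `cy`, the rotation matrix `R`, the callable `terrain_height`, and the sampling `step` are all hypothetical.

```python
from typing import Callable, Optional, Sequence, Tuple

Vec3 = Tuple[float, float, float]


def pixel_ray(u: float, v: float, fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Ray direction in the camera frame (x right, y down, z forward), unnormalized."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def camera_to_world(d: Vec3, R: Sequence[Sequence[float]]) -> Vec3:
    """Rotate a camera-frame direction into the world frame using a 3x3 matrix R."""
    return tuple(sum(R[i][j] * d[j] for j in range(3)) for i in range(3))


def intersect_terrain(
    origin: Vec3,
    direction: Vec3,
    terrain_height: Callable[[float, float], float],
    step: float = 0.5,
    max_range: float = 1000.0,
) -> Optional[Vec3]:
    """Sample points along the ray; return the first one at or below the terrain.

    Assumption: world z is up and terrain_height(x, y) stands in for a terrain map.
    """
    for k in range(1, int(max_range / step) + 1):
        t = k * step
        x = origin[0] + t * direction[0]
        y = origin[1] + t * direction[1]
        z = origin[2] + t * direction[2]
        ground = terrain_height(x, y)
        if z <= ground:
            return (x, y, ground)
    return None  # no intersection within max_range


if __name__ == "__main__":
    # Flat road at z = 0, camera 2 m above it, ray pointing forward and down.
    print(intersect_terrain((0.0, 0.0, 2.0), (1.0, 0.0, -0.25), lambda x, y: 0.0))
```

The returned point plays the role of the "second world coordinates" in the claim language; a finer `step` or interpolation between the last two samples would tighten the estimate.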

Re Claim 3, Xu as modified by Mao and Chang further disclose  wherein the second world coordinates of the point of intersection is determined by: obtaining a first set of points along the ray, wherein the reference point of the second set belongs to the first set of points (see Chang: e.g., Fig. 6, -- World coordinates of two reference points and image coordinates of two projection points corresponding to the two reference points are obtained. A plurality of coordinate transformation parameters relative to transformation between any image coordinates and any world coordinates corresponding to a camera are calculated according only to the world coordinates of the two reference points, the image coordinates of the two projection points, and world coordinates of the camera. A second image having an object image corresponding to an object is obtained through a camera. World coordinates of the object are positioned according to the coordinate transformation parameters.--, in [0008], and, -- The camera coordinate system is a three-dimensional coordinate system formed by treating the center point of the camera lens as the origin. In the camera coordinate system, the directions of the three axes in the three-dimensional coordinate system is defined corresponding to the left-handed coordinate system or the right-handed coordinate system.--, in [0021]-[0023]); 
determining a first set of world coordinates corresponding to the first set of points, wherein the first world coordinates belongs to the first set of world coordinates (see Chang: e.g., Fig. 6, -- World coordinates of two reference points and image coordinates of two projection points corresponding to the two reference points are obtained. A plurality of coordinate transformation parameters relative to transformation between any image coordinates and any world coordinates corresponding to a camera are calculated according only to the world coordinates of the two reference points, the image coordinates of the two projection points, and world coordinates of the camera. A second image having an object image corresponding to an object is obtained through a camera. World coordinates of the object are positioned according to the coordinate transformation parameters.--, in [0008], and, -- The camera coordinate system is a three-dimensional coordinate system formed by treating the center point of the camera lens as the origin. In the camera coordinate system, the directions of the three axes in the three-dimensional coordinate system is defined corresponding to the left-handed coordinate system or the right-handed coordinate system.--, in [0021]-[0023]);
determining, based on the terrain map and corresponding to the first set of points, a second set of points on the road surface (see Chang: e.g., Fig. 6, -- World coordinates of two reference points and image coordinates of two projection points corresponding to the two reference points are obtained. A plurality of coordinate transformation parameters relative to transformation between any image coordinates and any world coordinates corresponding to a camera are calculated according only to the world coordinates of the two reference points, the image coordinates of the two projection points, and world coordinates of the camera. A second image having an object image corresponding to an object is obtained through a camera. World coordinates of the object are positioned according to the coordinate transformation parameters.--, in [0008], and, -- The camera coordinate system is a three-dimensional coordinate system formed by treating the center point of the camera lens as the origin. In the camera coordinate system, the directions of the three axes in the three-dimensional coordinate system is defined corresponding to the left-handed coordinate system or the right-handed coordinate system.--, in [0021]-[0023], also see Mao: -- A first type of projection is a top-down projection as shown in patch 410. A top-down projection is a projection of the point cloud data onto a region surrounding the vehicle from a location above the vehicle itself. The projection plane for a top-down projection is thus substantially parallel to the surface on which the vehicle is standing.--, in [0062]); 
determining a second set of world coordinates corresponding to the second set of points; determining plurality of heights between each point associated with the first set of world coordinates and a corresponding point associated with the second set of world coordinates; determining a minimum height from the plurality of heights (see Chang: e.g., Fig. 6, -- World coordinates of two reference points and image coordinates of two projection points corresponding to the two reference points are obtained. A plurality of coordinate transformation parameters relative to transformation between any image coordinates and any world coordinates corresponding to a camera are calculated according only to the world coordinates of the two reference points, the image coordinates of the two projection points, and world coordinates of the camera. A second image having an object image corresponding to an object is obtained through a camera. World coordinates of the object are positioned according to the coordinate transformation parameters.--, in [0008], and, -- The camera coordinate system is a three-dimensional coordinate system formed by treating the center point of the camera lens as the origin. In the camera coordinate system, the directions of the three axes in the three-dimensional coordinate system is defined corresponding to the left-handed coordinate system or the right-handed coordinate system.--, in [0021]-[0023]);
identifying a point from the second set of points associated with the minimum height; and obtaining world coordinates of the point, wherein the second world coordinates are determined to be same as the world coordinates of the point (see Chang: e.g., Fig. 6, -- World coordinates of two reference points and image coordinates of two projection points corresponding to the two reference points are obtained. A plurality of coordinate transformation parameters relative to transformation between any image coordinates and any world coordinates corresponding to a camera are calculated according only to the world coordinates of the two reference points, the image coordinates of the two projection points, and world coordinates of the camera. A second image having an object image corresponding to an object is obtained through a camera. World coordinates of the object are positioned according to the coordinate transformation parameters.--, in [0008], and, -- The camera coordinate system is a three-dimensional coordinate system formed by treating the center point of the camera lens as the origin. In the camera coordinate system, the directions of the three axes in the three-dimensional coordinate system is defined corresponding to the left-handed coordinate system or the right-handed coordinate system.--, in [0021]-[0023]).
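For illustration only, the minimum-height limitation recited in claim 3 can be sketched as follows. This sketch is not taken from Xu, Mao, or Chang. The function name, the `terrain_height` callback standing in for the terrain map, and the sampling parameters `num_points` and `max_scale` are all hypothetical assumptions.

```python
import math
from typing import Callable, Tuple

Point3 = Tuple[float, float, float]


def closest_terrain_point_along_ray(
    origin: Point3,
    reference: Point3,
    terrain_height: Callable[[float, float], float],  # hypothetical terrain-map lookup: (x, y) -> road z
    num_points: int = 200,     # assumed number of samples along the ray
    max_scale: float = 50.0,   # assumed farthest sample, in multiples of |reference - origin|
) -> Point3:
    """Sample points along the ray origin -> reference, drop each sample onto the
    road surface, and return the road point with the smallest height difference."""
    if num_points < 1:
        raise ValueError("num_points must be at least 1")
    best_point: Point3 = (origin[0], origin[1], terrain_height(origin[0], origin[1]))
    min_height = math.inf
    for k in range(1, num_points + 1):
        t = max_scale * k / num_points
        # First set of points: world coordinates of a sample on the ray.
        x = origin[0] + t * (reference[0] - origin[0])
        y = origin[1] + t * (reference[1] - origin[1])
        z = origin[2] + t * (reference[2] - origin[2])
        # Second set of points: the corresponding point on the road surface.
        ground_z = terrain_height(x, y)
        height = abs(z - ground_z)
        if height < min_height:
            min_height = height
            best_point = (x, y, ground_z)
    return best_point
```

With a flat road at z = 0, a camera center at (0, 0, 2), and a reference point at (1, 0, 1.5), the sketch returns a road point near (4, 0, 0), where the sampled ray meets the road surface.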

Re Claim 4, Xu as modified by Mao and Chang further disclose  wherein the second world coordinates of the point of intersection is determined by: determining a first mathematical function that describes the ray (see Chang: e.g., in [0044]-[0047]);
determining, based on the terrain map, world coordinates of two or more points on the bounding box (see Xu: e.g., -- three-dimensional bounding box 226, defined by eight corners 228, is illustrated in FIG. 2. In another example, the ANN 222 may predict a center location, orientation, and three-dimensional extents of such a bounding box. In such a manner, the ANN 222 may constrain the output to retain a rectangular volume shape…. provides a global architecture that directly regresses coordinates descriptive of a bounding box. FIG. 3 is a pictorial representation of a process 300, which, like the process 200, also determines parameters of a three-dimensional bounding box using a point cloud 302 and a cropped image 304 associated with an object.--, in [0024]-[0026]);
determining a second mathematical function that describes a plane that includes the two or more points (see Chang: e.g., --In step S210, the processor 130 obtains world coordinates of two reference points and image coordinates of two projection points corresponding to the two reference points. Further, in step S220, the processor 130 calculates a plurality of coordinate transformation parameters relative to transformation between any image coordinates and any world coordinates corresponding to the camera 110 according only to the world coordinates of the two reference points, the image coordinates of the two projection points, and world coordinates of the camera 110.--, in [0032], and [0044]-[0047]);
determining an intersection of the first mathematical function and the second mathematical function; and obtaining world coordinates of the intersection, wherein the second world coordinates are determined to be same as the world coordinates of the intersection (see Chang: e.g., Fig. 6, -- World coordinates of two reference points and image coordinates of two projection points corresponding to the two reference points are obtained. A plurality of coordinate transformation parameters relative to transformation between any image coordinates and any world coordinates corresponding to a camera are calculated according only to the world coordinates of the two reference points, the image coordinates of the two projection points, and world coordinates of the camera. A second image having an object image corresponding to an object is obtained through a camera. World coordinates of the object are positioned according to the coordinate transformation parameters.--, in [0008], and, -- The camera coordinate system is a three-dimensional coordinate system formed by treating the center point of the camera lens as the origin. In the camera coordinate system, the directions of the three axes in the three-dimensional coordinate system is defined corresponding to the left-handed coordinate system or the right-handed coordinate system.--, in [0021]-[0023]).
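For illustration only, the ray and plane intersection recited in claim 4 can be sketched as follows. This sketch is not taken from the cited references. It assumes the plane is fixed by three non-collinear points whose world coordinates are already known, and all function names are hypothetical.

```python
from typing import Optional, Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def plane_from_points(p0: Vec3, p1: Vec3, p2: Vec3) -> Tuple[Vec3, Vec3]:
    """Second function: the plane n . (x - p0) = 0 through three non-collinear points."""
    return cross(sub(p1, p0), sub(p2, p0)), p0


def intersect_ray_plane(origin: Vec3, direction: Vec3,
                        normal: Vec3, plane_point: Vec3,
                        eps: float = 1e-9) -> Optional[Vec3]:
    """First function: the ray x(t) = origin + t * direction, t >= 0.
    Returns the world coordinates of the intersection, or None if there is none."""
    denom = dot(normal, direction)
    if abs(denom) < eps:
        return None  # ray is parallel to the plane
    t = dot(normal, sub(plane_point, origin)) / denom
    if t < 0:
        return None  # plane lies behind the ray origin
    return (origin[0] + t * direction[0],
            origin[1] + t * direction[1],
            origin[2] + t * direction[2])
```

Here `plane_from_points` returns the plane normal together with one point on the plane, and `intersect_ray_plane` returns `None` when the ray is parallel to the plane or points away from it.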

Re Claim 5, Xu as modified by Mao and Chang further disclose wherein a camera intrinsic matrix is used to determine the camera coordinates of the camera center point in the camera coordinate plane (see Chang: e.g., Fig. 6, -- World coordinates of two reference points and image coordinates of two projection points corresponding to the two reference points are obtained. A plurality of coordinate transformation parameters relative to transformation between any image coordinates and any world coordinates corresponding to a camera are calculated according only to the world coordinates of the two reference points, the image coordinates of the two projection points, and world coordinates of the camera. A second image having an object image corresponding to an object is obtained through a camera. World coordinates of the object are positioned according to the coordinate transformation parameters.--, in [0008], and, -- The camera coordinate system is a three-dimensional coordinate system formed by treating the center point of the camera lens as the origin. In the camera coordinate system, the directions of the three axes in the three-dimensional coordinate system is defined corresponding to the left-handed coordinate system or the right-handed coordinate system.--, in [0021]-[0023], and, -- transformation between a world coordinate system and a camera coordinate system according to an embodiment of the disclosure. With reference to FIG. 3, to be specific, in FIG. 3, (1) corresponds to the world coordinate system. At this time, the x axis, the y axis, and the z axis of the camera coordinate system respectively overlap the X axis, the Y axis, and the Z axis of the world coordinate system, therein, world coordinates of a lens center of the camera 110 are (0,0,0).--, in [0039]-[0040]).

Re Claim 6, Xu as modified by Mao and Chang further disclose wherein a camera extrinsic matrix is used with the camera coordinates to determine the first world coordinates of the reference point (see Chang: e.g., Fig. 6, -- World coordinates of two reference points and image coordinates of two projection points corresponding to the two reference points are obtained. A plurality of coordinate transformation parameters relative to transformation between any image coordinates and any world coordinates corresponding to a camera are calculated according only to the world coordinates of the two reference points, the image coordinates of the two projection points, and world coordinates of the camera. A second image having an object image corresponding to an object is obtained through a camera. World coordinates of the object are positioned according to the coordinate transformation parameters.--, in [0008], and, -- The camera coordinate system is a three-dimensional coordinate system formed by treating the center point of the camera lens as the origin. In the camera coordinate system, the directions of the three axes in the three-dimensional coordinate system is defined corresponding to the left-handed coordinate system or the right-handed coordinate system.--, in [0021]-[0023], and, -- transformation between a world coordinate system and a camera coordinate system according to an embodiment of the disclosure. With reference to FIG. 3, to be specific, in FIG. 3, (1) corresponds to the world coordinate system. At this time, the x axis, the y axis, and the z axis of the camera coordinate system respectively overlap the X axis, the Y axis, and the Z axis of the world coordinate system, therein, world coordinates of a lens center of the camera 110 are (0,0,0).--, in [0039]-[0040]; also see: in [0032], and [0044]-[0047]).
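For illustration only, the roles of the camera intrinsic matrix (claim 5) and camera extrinsic matrix (claim 6) can be sketched under a standard pinhole camera model. This sketch is not taken from the cited references. The convention X_cam = R * X_world + t for the extrinsic parameters, the unit depth used for back-projection, and all function and parameter names are assumptions.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]


def pixel_to_camera(u: float, v: float,
                    fx: float, fy: float, cx: float, cy: float) -> Vec3:
    """Back-project pixel (u, v) to camera coordinates at unit depth, using the
    intrinsic matrix K = [[fx, 0, cx], [0, fy, cy], [0, 0, 1]]."""
    return ((u - cx) / fx, (v - cy) / fy, 1.0)


def camera_to_world(x_cam: Vec3,
                    rotation: Sequence[Sequence[float]],
                    translation: Vec3) -> Vec3:
    """Invert the assumed extrinsic mapping X_cam = R * X_world + t,
    that is, X_world = R^T * (X_cam - t)."""
    d = (x_cam[0] - translation[0],
         x_cam[1] - translation[1],
         x_cam[2] - translation[2])
    # Multiplying by R^T means taking dot products with the columns of R.
    return (
        rotation[0][0] * d[0] + rotation[1][0] * d[1] + rotation[2][0] * d[2],
        rotation[0][1] * d[0] + rotation[1][1] * d[1] + rotation[2][1] * d[2],
        rotation[0][2] * d[0] + rotation[1][2] * d[1] + rotation[2][2] * d[2],
    )
```

A pixel at the principal point (cx, cy) back-projects to (0, 0, 1) in camera coordinates, and `camera_to_world` then expresses that point in world coordinates for the assumed camera pose.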

Re Claim 9, Xu as modified by Mao and Chang further disclose wherein the cropped portion has a first resolution that is less than a second resolution of the obtained image (see Xu: e.g., Fig. 2, and, -- a machine learning algorithm is applied to the image data and the point cloud data, to estimate parameters for a three-dimensional bounding box associated with one or more objects in the environment. For instance, a first feature vector associated with the image data, for example, associated with a cropped image corresponding to the object of interest, and a second feature vector associated with the point cloud data may be input to the machine learning algorithm. The machine learning algorithm may output parameters of the three-dimensional bounding box. The parameters may include eight points in a coordinate system, the eight points representing the eight corners of the three-dimensional bounding box. An example machine learning algorithm used to recover the parameters is an artificial neural network (ANN), which may be a Convolutional Neural Network (CNN).--, in [0012] {the cropped image is a portion of the obtained image and thus has a lower resolution}; also see Chang: e.g., --in step S510 of FIG. 5, the processor 130 obtains an output image (a first image) from the camera 110 and obtains resolution information corresponding to the output image to obtain position of an image central point p.sub.C according to the resolution information. (43) Specifically, the resolution information corresponds to the width and height resolution of the output image. Therefore, after obtaining the width and height resolution of the image, the processor 130 transfers the location of image into coordinates according to the resolution information and analyzes the image. In this way, the processor 130 may further determine the image coordinates of the two projection points.
For instance, in the image with the resolution being 1920×1080, the upper left corner being the origin, the horizontal axis extending from left to right, and the vertical axis extending from top to bottom, the position of the image central point is (960,540). If position of a projection point p.sub.A are (u′.sub.A,v′.sub.A) and position of a projection point p.sub.B are (u′.sub.B, v′.sub.B), in this embodiment, the processor 130 further sets the image coordinates of the image central point to be (0,0), the horizontal axis to be extending from left to right, and the vertical axis to be extending from bottom to top. That is, the coordinates of the upper left corner of the image are changed from (0, 0) to (−960, 540), the image coordinates of the projection point p.sub.A are (u.sub.A,v.sub.A)=(u′.sub.A−960,−v′.sub.A+540), and the image coordinates of the projection point p.sub.B are (u.sub.B,v.sub.B)=(u′.sub.B−960,−v′.sub.B+540).--, in [0051]-[0052]). See the obviousness and motivation statements set forth above for claim 2.
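For illustration only, the image-coordinate recentering quoted from Chang at [0051]-[0052] can be sketched as follows. The function name and its parameters are hypothetical. The arithmetic follows the quoted 1920×1080 example, in which the origin moves to the image central point and the vertical axis is flipped to point upward.

```python
from typing import Tuple


def recenter_image_coordinates(u_raw: float, v_raw: float,
                               width: int, height: int) -> Tuple[float, float]:
    """Convert pixel coordinates with a top-left origin and downward vertical axis
    into coordinates with the origin at the image center and an upward vertical axis."""
    center_u = width / 2   # 960 for a 1920-pixel-wide image
    center_v = height / 2  # 540 for a 1080-pixel-high image
    return (u_raw - center_u, -v_raw + center_v)


# The upper left corner (0, 0) of a 1920x1080 image maps to (-960, 540),
# and the image central point (960, 540) maps to (0, 0).
corner = recenter_image_coordinates(0, 0, 1920, 1080)
center = recenter_image_coordinates(960, 540, 1920, 1080)
```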

Re Claim 10, Xu as modified by Mao and Chang further disclose wherein two positions of two reference points of the first set on the obtained image respectively correspond to a first distance and a second distance from the location of the vehicle (see Xu: -- Measurement of the LIDAR system may be represented as three-dimensional LIDAR data having coordinates (e.g., Cartesian, polar, etc.) corresponding to positions or distances captured by the LIDAR system. For example, the LIDAR data may include point cloud data comprising a plurality of points in the environment.--, in [0010]; see Mao: e.g., -- the sensor subsystems use various technologies to measure and detect information about the environment. For example, one or more LIDAR subsystems may emit electromagnetic radiation and determine the locations of objects in the environment based on attributes of reflections of the emitted radiation that vary with the distance of the object from the vehicle. One or more camera subsystems may capture images of the environment. The sensor subsystems can provide their measurements as sensor data to a sensor subsystem interface and pre-processor, e.g., interface 306. [0067] The sensor data acquired by the sensor subsystems may include indications of multiple objects within a pre-defined distance (e.g., a sensing range) of the vehicle. At stage 604, the system (e.g., interface 306) selects one as an object of interest to be classified.
The object of interest may be selected using any suitable criteria, such as a prominence of the object in the sensor data, a proximity of the object to the vehicle--, in [0066]-[0067], also see Chang: -- A reference distance from the lens central point P.sub.O of the camera to the first reference point P.sub.A is d.sub.1, a reference distance from the lens central point P.sub.O of the camera to the second reference point P.sub.B is d.sub.2, and a reference distance from the first reference point P.sub.A to the second reference point P.sub.B is d.sub.3. The lens central point P.sub.O of the camera vertically projects to the ground to form a vertical intersection point P.sub.Q, and a height from the lens central point P.sub.O of the camera to the vertical intersection point P.sub.Q is h.--, in [0053]). See the obviousness and motivation statements set forth above for claim 2.

Re Claim 11, claim 11 is the medium claim corresponding to claim 2. Thus, claim 11 is rejected for the same reasons as claim 2. Furthermore, Xu as modified by Mao and Chang further disclose a non-transitory computer readable storage medium having code stored thereon, the code, when executed by a processor, causing the processor to implement a method of performing the steps (see Xu: e.g., -- the subject matter presented herein may be implemented as a computer process, a computer-controlled apparatus, a computing system, or an article of manufacture, such as a computer-readable storage medium. While the subject matter described with respect to the methods 400, 500 are presented in the general context of operations that may be executed on and/or with one or more computing devices, those skilled in the art will recognize that other implementations may be performed in combination with various program/controller modules. Generally, such modules include routines, programs, components, data structures, and other types of structures that perform particular tasks or implement particular abstract data types.--, in [0037]).

Re Claim 15, Xu as modified by Mao and Chang further disclose wherein the one or more positions of the first set of one or more reference points on the obtained image are determined by projecting the three-dimensional world coordinates of the first set of one or more reference points to the image by using a camera pose information associated with the obtained image (see Xu: e.g., -- the per-point feature vector from PointNet may be associated with local appearance information extracted from an intermediate layer of the image processing algorithm 308. Specifically, there may be no clear correspondence between points in the point cloud 302 and attributes (e.g., pixels) in the image 304. In implementations of this disclosure, each point in the point cloud 302 may be projected onto the image plane using a known camera model.--, in [0033], also see Chang: e.g., Fig. 6, -- World coordinates of two reference points and image coordinates of two projection points corresponding to the two reference points are obtained. A plurality of coordinate transformation parameters relative to transformation between any image coordinates and any world coordinates corresponding to a camera are calculated according only to the world coordinates of the two reference points, the image coordinates of the two projection points, and world coordinates of the camera. A second image having an object image corresponding to an object is obtained through a camera. World coordinates of the object are positioned according to the coordinate transformation parameters.--, in [0008], and, -- The camera coordinate system is a three-dimensional coordinate system formed by treating the center point of the camera lens as the origin. 
In the camera coordinate system, the directions of the three axes in the three-dimensional coordinate system is defined corresponding to the left-handed coordinate system or the right-handed coordinate system.--, in [0021]-[0023]; --In step S210, the processor 130 obtains world coordinates of two reference points and image coordinates of two projection points corresponding to the two reference points. Further, in step S220, the processor 130 calculates a plurality of coordinate transformation parameters relative to transformation between any image coordinates and any world coordinates corresponding to the camera 110 according only to the world coordinates of the two reference points, the image coordinates of the two projection points, and world coordinates of the camera 110.--, in [0032], and [0044]-[0047]). See the obviousness and motivation statements set forth above for claim 2.

Re Claim 16, Xu as modified by Mao and Chang further disclose wherein the camera pose information characterizes optical properties, orientation, or location of the camera (see Mao: e.g., -- a light detection and ranging (LIDAR) subsystem that detects and processes reflections of laser light, a radio detection and ranging (RADAR) subsystem that detects and processes reflections of radio waves, or both. The sensor subsystems 132 can also include one or more camera subsystems that detect and process visible light. The camera subsystems can be monoscopic, stereoscopic, or other multi-view cameras that permit determination of object depths for objects shown in the image based on differences in spatial orientations/offsets of the cameras' image sensors. With respect to LIDAR and RADAR, the raw sensor data can indicate a distance, a direction, and an intensity of reflected radiation.--, in [0038]).


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WEI WEN YANG whose telephone number is (571)270-5670.  The examiner can normally be reached on 8:00 - 5:00 pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella can be reached on 571-272-7778.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 

If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/WEI WEN YANG/Primary Examiner, Art Unit 2667