Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Response to Amendment
Claims 15, 16 and 20 are currently amended. Claims 1-20 are pending.  
Response to Arguments
Applicant's arguments, see Remarks, filed December 16, 2021, with respect to the rejections of claims 1 and 15 under 35 U.S.C. 103 have been fully considered and are persuasive. Therefore, the rejections have been withdrawn. However, upon further consideration, new grounds of rejection are made in view of US 2019/0138822 A1 to Yao et al. and US 2017/00185823 A1 to Gold et al., as described herein below.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3, 5-8, 12-16, and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over US 2019/0138822 A1 to Yao et al., hereinafter "Yao", in view of US 2017/00185823 A1 to Gold et al., hereinafter "Gold", and US 2011/0282578 A1 to Miksa et al., hereinafter "Miksa".
Claim 1. An image processing method, comprising: receiving bounding box information that describes a bounding box located around a detected object in an image, wherein the bounding box information is received while a vehicle is being driven; Yao [0007] teaches that the present disclosure generally relates to vehicles, and more particularly, to methods and systems for detecting environmental information of a vehicle.

Yao [0007] teaches the environmental information relating to the one or more objects may include at least one of the object type of the one or more objects, a motion state of at least one of the one or more objects, a velocity of at least one of the one or more objects relative to a vehicle including the one or more LiDARs and the camera, an acceleration of at least one of the one or more objects relative to the vehicle, a moving direction of at least one of the one or more objects, or a distance between at least one of the one or more objects and the vehicle.

Yao [0063] teaches the type determination module 360 may be configured to determine an object type of the one or more objects in the image based on the 2D coordinates of the plurality of points, the 3D coordinates of the plurality of points, the segment result, and the image. The type determination module 360 may identify the one or more objects in the image by determining a bounding box for each of the one or more objects based on the segment result and the 2D coordinates of the plurality of points. For one of the one or more objects, the type determination module 360 may determine the object type based on the 3D coordinates of the center point of the object, the length of the object, the width of the object, the height of the object, the number of points of the object, or the like, or any combination thereof. The center point of the object may refer to the center pixel (or voxel) of the bounding box of the object.

determining, from the bounding box information and in the image, one or more positions of one or more reference points on the bounding box; Yao [0063] teaches the type determination module 360 may be configured to determine an object type of the one or more objects in the image based on the 2D coordinates of the plurality of points, the 3D coordinates of the plurality of points, the segment result, and the image. The type determination module 360 may identify the one or more objects in the image by determining a bounding box for each of the one or more objects based on the segment result and the 2D coordinates of the plurality of points. For one of the one or more objects, the type determination module 360 may determine the object type based on the 3D coordinates of the center point of the object, the length of the object, the width of the object, the height of the object, the number of points of the object, or the like, or any combination thereof. The center point of the object may refer to the center pixel (or voxel) of the bounding box of the object.
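
For illustration only, and not as a characterization of Yao or of the claims, the idea of locating reference points (such as the center pixel) on a bounding box can be sketched as follows. The function name `reference_points`, the corner-based box format, and the particular points returned are assumptions made for this sketch.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def reference_points(x_min: float, y_min: float,
                     x_max: float, y_max: float) -> Dict[str, Point]:
    """Return a few illustrative reference points of a 2D bounding box.

    Coordinates are pixel coordinates with the y axis growing downward,
    so the bottom edge of the box has the largest y value.
    """
    if x_max < x_min or y_max < y_min:
        raise ValueError("bounding box corners are not ordered")
    x_mid = (x_min + x_max) / 2.0
    y_mid = (y_min + y_max) / 2.0
    return {
        "center": (x_mid, y_mid),         # center pixel of the box
        "bottom_center": (x_mid, y_max),  # midpoint of the bottom edge
        "bottom_left": (x_min, y_max),
        "bottom_right": (x_max, y_max),
    }


# Hypothetical box: top-left (100, 50), bottom-right (300, 250).
points = reference_points(100.0, 50.0, 300.0, 250.0)
```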

for each determined position of each reference point: determining camera coordinates of a camera center point located on a ray that passes through a position of a reference point, Gold [0087] teaches a first possible embodiment (E3) of the Position component (2000), as depicted in FIG. 7A, is configured to compute the said position and angle-of-gaze (orientation) by solving a simultaneous system of equations derived from the inputs. The said embodiment comprises an equation generation module (2100) configured to generate a simultaneous system of equations. The unknowns are the 3D real-world coordinates of the said camera center, and the rotation matrix defining its orientation.

Gold [0089] teaches a second possible embodiment (E4) of the Positioning Component (2000), as depicted in FIG. 7B, is configured to compute the said 3D coordinates of the camera center separately from computing said camera orientation. The said embodiment comprises a 3D position calculation module (2300) configured to compute the said 3D coordinates, and further comprises an Orientation module (2400) configured to compute the rotation matrix defining said camera orientation, or spatial angle of gaze.

Gold [0099] teaches the said second embodiment (E6) further comprises a method (M11) for computing planar angle of gaze for solution candidate. The said method is configured to accept as input a candidate 2D camera position (x, z), the 2D projection coordinates (x, z) of the said plurality of retrieved feature points (1190b), the corresponding 1D pixel coordinate (x), and the camera intrinsic parameters. The said method is further configured to compute for each of the said plurality of retrieved feature points the gaze angle (AG) as the difference between two angles—a first angle (A2) between the z-axis and the line connecting the said candidate center position and the said retrieved point, and a second angle (A1) between the line of sight to the said retrieved point and the camera axis. The said method is further configured to compute and return a statistical representative (e.g., mean or median) of the multitude of gaze angle results calculated for said camera center position candidate, and the multitude of retrieved feature points.
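
For illustration only, the gaze-angle computation described in the cited passage (AG as the difference between A2 and A1, followed by a statistical representative) can be sketched as below. This is a reading of the quoted text under a pinhole-camera assumption, not Gold's implementation; the function name `gaze_angle`, the parameters `focal_px` and `principal_x`, and the sign conventions are assumptions of the sketch.

```python
import math
from statistics import mean
from typing import Sequence, Tuple


def gaze_angle(candidate: Tuple[float, float],
               feature_points: Sequence[Tuple[float, float]],
               pixel_xs: Sequence[float],
               focal_px: float,
               principal_x: float) -> float:
    """Mean planar gaze angle (radians) for one candidate camera position.

    candidate      -- candidate 2D camera position (x, z)
    feature_points -- 2D projections (x, z) of the retrieved feature points
    pixel_xs       -- corresponding 1D pixel coordinates
    focal_px, principal_x -- assumed intrinsic parameters, in pixels
    """
    cam_x, cam_z = candidate
    angles = []
    for (px, pz), u in zip(feature_points, pixel_xs):
        # A2: angle between the z-axis and the line from camera to point.
        a2 = math.atan2(px - cam_x, pz - cam_z)
        # A1: angle between the line of sight to the point and the camera axis.
        a1 = math.atan2(u - principal_x, focal_px)
        angles.append(a2 - a1)  # AG = A2 - A1
    return mean(angles)


# Hypothetical data: a camera at the origin looking straight along the z-axis.
example_angle = gaze_angle((0.0, 0.0), [(1.0, 1.0), (0.0, 2.0)],
                           [150.0, 50.0], focal_px=100.0, principal_x=50.0)
```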

Yao [0063] teaches the type determination module 360 may be configured to determine an object type of the one or more objects in the image based on the 2D coordinates of the plurality of points, the 3D coordinates of the plurality of points, the segment result, and the image. The type determination module 360 may identify the one or more objects in the image by determining a bounding box for each of the one or more objects based on the segment result and the 2D coordinates of the plurality of points. For one of the one or more objects, the type determination module 360 may determine the object type based on the 3D coordinates of the center point of the object, the length of the object, the width of the object, the height of the object, the number of points of the object, or the like, or any combination thereof. The center point of the object may refer to the center pixel (or voxel) of the bounding box of the object. The type determination module 360 may determine the 3D coordinates of the center point based on the 3D coordinates of the point of which the 2D coordinates are similar to the 2D coordinates of the center pixel of the bounding box. The type determination module 360 may determine the length, the width, and the height of the object based on the 3D coordinates of points of which the 2D coordinates are similar to the 2D coordinates of pixels included in the bounding box. The number of points of the object may be the number of points of which the 2D coordinates are similar to the 2D coordinates of pixels included in the bounding box. See also Yao [0073].

wherein the camera center point is located on a camera coordinate plane located at a focal length distance away from an image plane where the image is received; Gold [0089] teaches a second possible embodiment (E4) of the Positioning Component (2000), as depicted in FIG. 7B, is configured to compute the said 3D coordinates of the camera center separately from computing said camera orientation. The said embodiment comprises a 3D position calculation module (2300) configured to compute the said 3D coordinates, and further comprises an Orientation module (2400) configured to compute the rotation matrix defining said camera orientation, or spatial angle of gaze.

Gold [0099] teaches the said second embodiment (E6) further comprises a method (M11) for computing planar angle of gaze for solution candidate. The said method is configured to accept as input a candidate 2D camera position (x, z), the 2D projection coordinates (x, z) of the said plurality of retrieved feature points (1190b), the corresponding 1D pixel coordinate (x), and the camera intrinsic parameters. The said method is further configured to compute for each of the said plurality of retrieved feature points the gaze angle (AG) as the difference between two angles—a first angle (A2) between the z-axis and the line connecting the said candidate center position and the said retrieved point, and a second angle (A1) between the line of sight to the said retrieved point and the camera axis. The said method is further configured to compute and return a statistical representative (e.g., mean or median) of the multitude of gaze angle results calculated for said camera center position candidate, and the multitude of retrieved feature points.

and assigning the second world coordinates for the one or more reference points to a location of the detected object in the spatial region. Yao [Fig. 5-7]

determining, based at least on the camera coordinates, first world coordinates of the position of the reference point; and determining, based on a terrain map, second world coordinates of a point of intersection of the reference point and a road surface, wherein the terrain map provides coordinates of points in a spatial region where the vehicle is being driven; Yao [Fig. 5-7]

Yao [0005] teaches a system may comprise one or more processors, and a storage device configured to communicate with the one or more processors. The storage device may include a set of instructions. When the one or more processors executing the set of instructions, the one or more processors may be directed to perform one or more of the following operations. The one or more processors may receive, from a camera, a first image including a plurality of pixels relating to one or more objects. The one or more processors may receive, from one or more light detection and ranging equipments (LiDARs), a first point set including a plurality of points corresponding to the plurality of pixels of the first image. The one or more processors may determine first 3D coordinates of the plurality of points and reflection intensities of the plurality of points based on the first point set. The one or more processors may generate a segment result by classifying the plurality of points based on the first 3D coordinates of the plurality of points and the reflection intensities of the plurality of points. The one or more processors may transform the first 3D coordinates of the plurality of points into first 2D coordinates of the plurality of points. The one or more processors may determine an object type of the one or more objects based on the first 2D coordinates of the plurality of points, the first 3D coordinates of the plurality of points, the segment result, and the first image.

Yao [0009] teaches to determine the first 3D coordinates of the plurality of points, the one or more processors may determine second 3D coordinates of the first point subset corresponding to a first 3D coordinate system relating to the first LiDAR. The one or more processors may determine third 3D coordinates of the second point subset corresponding to a second 3D coordinate system relating to the second LiDAR. The one or more processors may transform the third 3D coordinates into fourth 3D coordinates of the second point subset corresponding to the first 3D coordinate system. The one or more processors may determine the first 3D coordinates based on the second 3D coordinates of the first point subset and the fourth 3D coordinates of the second point subset.

Yao [0010] teaches to transform the first 3D coordinates of the plurality of points into the first 2D coordinates of the plurality of points, the one or more processors may obtain a transformation matrix. The one or more processors may transform the first 3D coordinates of the plurality of points into the first 2D coordinates of the plurality of points based on the transformation matrix.
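
For illustration only, transforming 3D coordinates into 2D coordinates with a transformation matrix can be sketched as a homogeneous projection. The 3x4 matrix shape, the perspective divide, and the name `project_points` are assumptions of this sketch; the cited passage does not specify the form of the matrix.

```python
from typing import List, Sequence, Tuple


def project_points(points_3d: Sequence[Tuple[float, float, float]],
                   matrix: Sequence[Sequence[float]]) -> List[Tuple[float, float]]:
    """Project 3D points to 2D pixel coordinates with a 3x4 matrix."""
    projected = []
    for x, y, z in points_3d:
        homogeneous = (x, y, z, 1.0)
        # Multiply each of the three matrix rows by the homogeneous point.
        u, v, w = (sum(m * h for m, h in zip(row, homogeneous)) for row in matrix)
        if w == 0:
            raise ValueError("point projects to infinity")
        projected.append((u / w, v / w))  # perspective divide
    return projected


# Hypothetical matrix: focal length 100 px, principal point (50, 40).
example_matrix = [
    [100.0, 0.0, 50.0, 0.0],
    [0.0, 100.0, 40.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
]
pixels = project_points([(1.0, 2.0, 10.0)], example_matrix)
```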

Yao [0059] teaches the point information determination module 330 may be configured to determine 3D coordinates of the plurality of points and reflection intensities of the plurality of points based on the point set. The points obtained by a LiDAR may include reflection intensities of the points, and 3D coordinates of the points determined under a 3D coordinate system corresponding to the LiDAR. In some embodiments, if the point set includes points obtained by one LiDAR, the point information determination module 330 may directly determine the 3D coordinates based on the plurality of points. In some embodiments, different LiDARs may correspond to different 3D coordinate systems. If the point set includes points obtained by more than one LiDARs, the 3D coordinates of the points obtained by different LiDARs may correspond to different 3D coordinate systems. The point information determination module 330 may determine a 3D coordinate system corresponding to one of the LiDARs as a standard 3D coordinate system, and transform the 3D coordinates corresponding to other 3D coordinate systems into 3D coordinates under the standard coordinate system (e.g., as described in detail in connection with FIG. 5 below). See also Yao [0090].
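
For illustration only, transforming points from one LiDAR's coordinate system into a standard coordinate system can be sketched as a rigid transform. The sketch assumes, for brevity, a rotation about the vertical axis only plus a translation; a real calibration would generally use a full 3D rotation. The name `to_standard_frame` and its parameters are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def to_standard_frame(point: Vec3, yaw_rad: float, translation: Vec3) -> Vec3:
    """Map a point from a secondary LiDAR frame into the standard frame.

    The point is rotated by yaw_rad about the z-axis and then shifted by
    the offset of the secondary LiDAR expressed in the standard frame.
    """
    x, y, z = point
    tx, ty, tz = translation
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return (c * x - s * y + tx, s * x + c * y + ty, z + tz)


# Hypothetical mounting: second LiDAR rotated 90 degrees, offset 1 m along x.
standard_point = to_standard_frame((1.0, 0.0, 0.0), math.pi / 2, (1.0, 0.0, 0.0))
```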

Yao and Gold fail to explicitly teach a terrain map. However, Miksa, in the field of acquiring geological information in image data, teaches at [0042] that, as the positional information of the 3D-model in a coordinate reference system is accurately known, the corresponding part of the image could be rectified accurately. The present invention enables us to generate a huge amount of 3D-models that could be used as GCO's in an easy way and short time period. An advantage of 3D-models over a database with GCP's is that a 3D-model models a part of earth surface, whereas a GCP refers to only one XYZ-coordinate. When using a database with GCP's, the elevation information of locations between GCP's has to be estimated, which could result in mapping inaccuracies. The method helps us to capture 3D-models of the earth surface.

Miksa [0097] teaches the image data is used to determine the location of road surfaces first in the image and by combining location in the images with the laser data, the position of the road surface in a coordinate reference system. However, the image data can further be used to enhance the 3D-model with the "real world" appearance of the road surface, showing road markings, texture and color of the road surface, pavement type, shoulder, etc. Furthermore, these markings can form a dense array of GCP's to enable complete positioning and/or rectification of a road segment. In action 414, an orthorectified image is generated for a linear feature.

Miksa [0003] teaches Ground control points (GCP's) are used in orthorectifying satellite, aerial or aero survey imagery to standard map projections. A ground control point can be any point on the surface of the earth which is recognizable on remotely sensed images, maps or aerial photographs and which can be accurately located on each of these. A ground control point has defined associated coordinates in a coordinate reference system. A ground control point is a point on the surface of the earth of known location (i.e. fixed within an established co-ordinate reference system). GCP's are used to geo-reference image data sources, such as remotely sensed images or scanned maps, and divorced survey grids, such as those generated during geophysical survey. A GCP could be: [0004] a copy of a part of a paper map showing a selected point and its surrounding;

Miksa [0010] teaches a GCP can be any photo-recognizable feature to identify one point having associated precise X, Y and Z coordinates in a coordinate reference system.

Hence the prior art includes each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference. Thus, it would have been obvious to one of ordinary skill in the art to combine the teachings of Yao and Gold (receiving bounding box information that describes a bounding box located around a detected object in an image, and determining, from the bounding box information and in the image, one or more positions of one or more reference points on the bounding box) with Miksa's teaching of determining, based on a terrain map, second world coordinates of a point of intersection of the reference point and a road surface. One would have been motivated to make this combination because it allows one to accurately transform two-dimensional data of an object into world coordinates along with camera calibration (Yao [0003]-[0004]). In combination, Yao is not altered in that Yao continues to receive the bounding box information while a vehicle is being driven, and Gold continues to indicate world coordinates of an object. Miksa's teachings perform the same function in combination as they do separately, namely obtaining positional information of objects on the earth's surface (terrain map).


Claim 2. Yao and Miksa further teach wherein the second world coordinates of the point of intersection is determined by: obtaining a first set of points along the ray, wherein the reference point belongs to the first set of points; determining a first set of world coordinates corresponding to the first set of points, wherein the first world coordinates belongs to the first set of world coordinates; determining, based on the terrain map and corresponding to the first set of points, a second set of points on the road surface; Miksa [0003] teaches Ground control points (GCP's) are used in orthorectifying satellite, aerial or aero survey imagery to standard map projections. A ground control point can be any point on the surface of the earth which is recognizable on remotely sensed images, maps or aerial photographs and which can be accurately located on each of these. A ground control point has defined associated coordinates in a coordinate reference system. A ground control point is a point on the surface of the earth of known location (i.e. fixed within an established co-ordinate reference system). GCP's are used to geo-reference image data sources, such as remotely sensed images or scanned maps, and divorced survey grids, such as those generated during geophysical survey. A GCP could be: [0004] a copy of a part of a paper map showing a selected point and its surrounding;

Miksa [0010] teaches a GCP can be any photo-recognizable feature to identify one point having associated precise X, Y and Z coordinates in a coordinate reference system.

Miksa [0042] teaches as the positional information of the 3D-model in a coordinate reference system is accurately known, the corresponding part of the image could be rectified accurately. The present invention enables us to generate a huge amount of 3D-models that could be used as GCO's in an easy way and short time period. An advantage of 3D-models over a database with GCP's is that a 3D-model models a part of earth surface, whereas a GCP refers to only one XYZ-coordinate. When using a database with GCP's, the elevation information of locations between GCP's has to be estimated, which could result in mapping inaccuracies. The method helps us to capture 3D-models of the earth surface.
Miksa [0097] teaches the image data is used to determine the location of road surfaces first in the image and by combining location in the images with the laser data, the position of the road surface in a coordinate reference system. However, the image data can further be used to enhance the 3D-model with the "real world" appearance of the road surface, showing road markings, texture and color of the road surface, pavement type, shoulder, etc. Furthermore, these markings can form a dense array of GCP's to enable complete positioning and/or rectification of a road segment. In action 414, an orthorectified image is generated for a linear feature.

determining a second set of world coordinates corresponding to the second set of points; determining plurality of heights between each point associated with the first set of world coordinates and a corresponding point associated with the second set of world coordinates; determining a minimum height from the plurality of heights; identifying a point from the second set of points associated with the minimum height; and obtaining world coordinates of the point, wherein the second world coordinates are determined to be same as the world coordinates of the point. Yao [Fig. 5-7]
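
For illustration only, the minimum-height search recited in claim 2 can be sketched as sampling points along the ray, looking up the road surface under each sample in a terrain map, and keeping the surface point with the smallest height difference. The terrain map is modeled here as a hypothetical callable `elevation(x, y)`; the function name, the uniform sampling, and the parameters `step` and `num_samples` are assumptions of the sketch and are not drawn from the cited references.

```python
from typing import Callable, Tuple

Vec3 = Tuple[float, float, float]


def ground_point_by_min_height(origin: Vec3,
                               direction: Vec3,
                               elevation: Callable[[float, float], float],
                               step: float = 1.0,
                               num_samples: int = 100) -> Vec3:
    """Return the road-surface point closest in height to the sampled ray."""
    if num_samples < 1:
        raise ValueError("num_samples must be at least 1")
    best_point = origin
    best_height = float("inf")
    for i in range(num_samples):
        t = i * step
        # First set of points: world coordinates of samples along the ray.
        x = origin[0] + t * direction[0]
        y = origin[1] + t * direction[1]
        z = origin[2] + t * direction[2]
        # Second set of points: the road surface directly below or above.
        ground_z = elevation(x, y)
        height = abs(z - ground_z)
        if height < best_height:  # keep the minimum height seen so far
            best_height = height
            best_point = (x, y, ground_z)
    return best_point


# Hypothetical flat terrain and a camera 2 m above it, looking down-range.
hit = ground_point_by_min_height((0.0, 0.0, 2.0), (1.0, 0.0, -0.25),
                                 lambda x, y: 0.0, step=1.0, num_samples=20)
```

The accuracy of such a search depends on the sampling step, which is one reason a closed-form intersection may be preferred where the surface can be modeled locally as a plane.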

Claim 3. Gold further teaches wherein the second world coordinates of the point of intersection is determined by: determining a first mathematical function that describes the ray; Gold [0089] teaches a second possible embodiment (E4) of the Positioning Component (2000), as depicted in FIG. 7B, is configured to compute the said 3D coordinates of the camera center separately from computing said camera orientation. The said embodiment comprises a 3D position calculation module (2300) configured to compute the said 3D coordinates, and further comprises an Orientation module (2400) configured to compute the rotation matrix defining said camera orientation, or spatial angle of gaze.

Gold [0099] teaches the said second embodiment (E6) further comprises a method (M11) for computing planar angle of gaze for solution candidate. The said method is configured to accept as input a candidate 2D camera position (x, z), the 2D projection coordinates (x, z) of the said plurality of retrieved feature points (1190b), the corresponding 1D pixel coordinate (x), and the camera intrinsic parameters. The said method is further configured to compute for each of the said plurality of retrieved feature points the gaze angle (AG) as the difference between two angles—a first angle (A2) between the z-axis and the line connecting the said candidate center position and the said retrieved point, and a second angle (A1) between the line of sight to the said retrieved point and the camera axis. The said method is further configured to compute and return a statistical representative (e.g., mean or median) of the multitude of gaze angle results calculated for said camera center position candidate, and the multitude of retrieved feature points.

determining, based on the terrain map, world coordinates of two or more points on the bounding box; Gold [0089] teaches a second possible embodiment (E4) of the Positioning Component (2000), as depicted in FIG. 7B, is configured to compute the said 3D coordinates of the camera center separately from computing said camera orientation. The said embodiment comprises a 3D position calculation module (2300) configured to compute the said 3D coordinates, and further comprises an Orientation module (2400) configured to compute the rotation matrix defining said camera orientation, or spatial angle of gaze.

Gold [0099] teaches the said second embodiment (E6) further comprises a method (M11) for computing planar angle of gaze for solution candidate. The said method is configured to accept as input a candidate 2D camera position (x, z), the 2D projection coordinates (x, z) of the said plurality of retrieved feature points (1190b), the corresponding 1D pixel coordinate (x), and the camera intrinsic parameters. The said method is further configured to compute for each of the said plurality of retrieved feature points the gaze angle (AG) as the difference between two angles—a first angle (A2) between the z-axis and the line connecting the said candidate center position and the said retrieved point, and a second angle (A1) between the line of sight to the said retrieved point and the camera axis. The said method is further configured to compute and return a statistical representative (e.g., mean or median) of the multitude of gaze angle results calculated for said camera center position candidate, and the multitude of retrieved feature points.

determining a second mathematical function that describes a plane that includes the two or more points; determining an intersection of the first mathematical function and the second mathematical function; and obtaining world coordinates of the intersection, wherein the second world coordinates are determined to be same as the world coordinates of the intersection. Gold [0089] teaches a second possible embodiment (E4) of the Positioning Component (2000), as depicted in FIG. 7B, is configured to compute the said 3D coordinates of the camera center separately from computing said camera orientation. The said embodiment comprises a 3D position calculation module (2300) configured to compute the said 3D coordinates, and further comprises an Orientation module (2400) configured to compute the rotation matrix defining said camera orientation, or spatial angle of gaze.

Gold [0099] teaches the said second embodiment (E6) further comprises a method (M11) for computing planar angle of gaze for solution candidate. The said method is configured to accept as input a candidate 2D camera position (x, z), the 2D projection coordinates (x, z) of the said plurality of retrieved feature points (1190b), the corresponding 1D pixel coordinate (x), and the camera intrinsic parameters. The said method is further configured to compute for each of the said plurality of retrieved feature points the gaze angle (AG) as the difference between two angles—a first angle (A2) between the z-axis and the line connecting the said candidate center position and the said retrieved point, and a second angle (A1) between the line of sight to the said retrieved point and the camera axis. The said method is further configured to compute and return a statistical representative (e.g., mean or median) of the multitude of gaze angle results calculated for said camera center position candidate, and the multitude of retrieved feature points.
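
For illustration only, the intersection of a first mathematical function (a ray) with a second mathematical function (a plane), as recited in claim 3, can be sketched as below. The sketch defines the plane from three non-collinear points, which is an assumption made so that the plane is uniquely determined; the function and helper names are hypothetical and the code is not drawn from Gold.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def _sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_plane_intersection(origin: Vec3, direction: Vec3,
                           p0: Vec3, p1: Vec3, p2: Vec3) -> Vec3:
    """Intersect the ray origin + t * direction (t >= 0) with a plane.

    The plane passes through the three points p0, p1 and p2.
    """
    normal = _cross(_sub(p1, p0), _sub(p2, p0))
    denom = _dot(normal, direction)
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the plane or points are collinear")
    t = _dot(normal, _sub(p0, origin)) / denom
    if t < 0:
        raise ValueError("plane lies behind the ray origin")
    return (origin[0] + t * direction[0],
            origin[1] + t * direction[1],
            origin[2] + t * direction[2])


# Hypothetical ground plane z = 0 and a camera 2 m above it.
intersection = ray_plane_intersection((0.0, 0.0, 2.0), (1.0, 0.0, -0.25),
                                      (0.0, 0.0, 0.0), (1.0, 0.0, 0.0),
                                      (0.0, 1.0, 0.0))
```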

Claim 5. Yao further teaches wherein the detected object includes a car, a truck, a truck-trailer, a semi-truck, an emergency vehicle, a pedestrian, a motorcycle, or an obstacle on a road. Yao [0032] teaches the one or more LiDARs may be configured to obtain point sets including a plurality of points relating to objects (e.g., a person, an animal, a tree, a roadblock, building, or a vehicle) that are within the scope of the one or more LiDARs (e.g., a distance, say 500 meters, from the vehicle). A point in a point set may include 3D coordinates of the point and a reflection intensity of the point.

Claim 6. Yao further teaches wherein the detected object in the image is located at a distance between 500 meters and 1000 meters. Yao [0007] teaches the environmental information relating to the one or more objects may include at least one of the object type of the one or more objects, a motion state of at least one of the one or more objects, a velocity of at least one of the one or more objects relative to a vehicle including the one or more LiDARs and the camera, an acceleration of at least one of the one or more objects relative to the vehicle, a moving direction of at least one of the one or more objects, or a distance between at least one of the one or more objects and the vehicle.

Yao [0030] teaches FIG. 1 is a schematic diagram illustrating an exemplary environmental information detecting system 100 according to some embodiments of the present disclosure. The environmental information detecting system 100 may be configured to detect the information of the environmental around a vehicle (e.g., a driverless vehicle). The information of the environmental (e.g., also referred to as the environmental information) around the vehicle may include an object type (e.g., a person, an animal, a tree, a roadblock, a building, or a vehicle) of objects that are within a distance (e.g., 500 meters) from the vehicle, motion states of the objects, velocities of the objects relative to the vehicle, accelerations of the objects relative to the vehicle, moving directions of the objects, distances between the vehicle and the objects, or the like, or any combination thereof. The motion state may include a static state or a moving state. The processing device 130 may direct the vehicle to avoid an obstacle based on the environmental information. For example, when the processing device 130 determines that the distance between the vehicle and an object is less than a distance threshold based on the environmental information, the processing device 130 may send instructions and control the vehicle, such as braking, slowing down the velocity of the vehicle, changing the moving direction of the vehicle, or moving backward to direct the vehicle to avoid the object.

Yao [0032] teaches the one or more LiDARs may be configured to obtain point sets including a plurality of points relating to objects (e.g., a person, an animal, a tree, a roadblock, building, or a vehicle) that are within the scope of the one or more LiDARs (e.g., a distance, say 500 meters, from the vehicle). A point in a point set may include 3D coordinates of the point and a reflection intensity of the point. The Examiner understands that specifying a particular distance is merely a design choice.

Claim 7. It differs from claim 1 in that it is a non-transitory computer readable storage medium having code stored thereon, the code, when executed by a processor, causing the processor to implement a method of claim 1. Therefore claim 7 has been analyzed and reviewed in the same way as claim 1. See the above analysis. 

Claim 8. Gold further teaches wherein a camera intrinsic matrix is used to determine the camera coordinates of the camera center point in the camera coordinate plane. Gold [0015] teaches a set of simultaneous equations is constructed from said 2D pixel locations, said corresponding 3D feature-point coordinates, and the camera projection matrix derived from said camera's intrinsic parameters. The said set of simultaneous equations is solved to obtain the mobile device's position.

Gold [0099] teaches the said second embodiment (E6) further comprises a method (M11) for computing planar angle of gaze for solution candidate. The said method is configured to accept as input a candidate 2D camera position (x, z), the 2D projection coordinates (x, z) of the said plurality of retrieved feature points (1190b), the corresponding 1D pixel coordinate (x), and the camera intrinsic parameters. The said method is further configured to compute for each of the said plurality of retrieved feature points the gaze angle (AG) as the difference between two angles—a first angle (A2) between the z-axis and the line connecting the said candidate center position and the said retrieved point, and a second angle (A1) between the line of sight to the said retrieved point and the camera axis. The said method is further configured to compute and return a statistical representative (e.g., mean or median) of the multitude of gaze angle results calculated for said camera center position candidate, and the multitude of retrieved feature points.
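
By way of illustration only, and not as part of the cited disclosure, the gaze-angle computation described in Gold [0099] can be sketched as follows. The function name, the pinhole model, the sign conventions, and the parameters focal_px and principal_x are assumptions introduced for this sketch; Gold states only that the camera intrinsic parameters are an input.

```python
import math
from statistics import median


def gaze_angle(candidate, points_xz, pixels_x, focal_px, principal_x):
    """Return the median planar gaze angle (radians) for one candidate position.

    candidate   -- hypothetical 2D camera position (x, z)
    points_xz   -- 2D projections (x, z) of the retrieved feature points
    pixels_x    -- corresponding 1D pixel coordinates
    focal_px, principal_x -- assumed intrinsic parameters, in pixels
    """
    cx, cz = candidate
    angles = []
    for (px, pz), u in zip(points_xz, pixels_x):
        # A2: angle between the z-axis and the line from the candidate to the point.
        a2 = math.atan2(px - cx, pz - cz)
        # A1: angle between the line of sight to the point and the camera axis.
        a1 = math.atan2(u - principal_x, focal_px)
        # AG: gaze angle as the difference A2 - A1 (angle wrap-around is not handled).
        angles.append(a2 - a1)
    # Statistical representative of the per-point results (median chosen here).
    return median(angles)
```

The median is used as the statistical representative; Gold equally permits the mean.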

Claim 12. Yao further teaches wherein the image is from a first region towards which the vehicle is being driven. Yao [0030] teaches FIG. 1 is a schematic diagram illustrating an exemplary environmental information detecting system 100 according to some embodiments of the present disclosure. The environmental information detecting system 100 may be configured to detect the information of the environmental around a vehicle (e.g., a driverless vehicle).

Yao [0056] teaches the image receiving module 310 may be configured to receive an image from the camera 120. The image may include a plurality of pixels relating to one or more objects that are within the scope of the camera 120 (e.g., a distance, say 500 meters, from the vehicle). In some embodiments, because objects surrounding the vehicle may constantly be changing, the camera 120 may constantly capture images relating to the objects surrounding the vehicle and transmit the images to the image receiving module 310 in real time.

Claim 13. Yao further teaches wherein the image is from a second region to a side of the vehicle. Yao [0030] teaches FIG. 1 is a schematic diagram illustrating an exemplary environmental information detecting system 100 according to some embodiments of the present disclosure. The environmental information detecting system 100 may be configured to detect the information of the environmental around a vehicle (e.g., a driverless vehicle).
 
Yao [0056] teaches the image receiving module 310 may be configured to receive an image from the camera 120. The image may include a plurality of pixels relating to one or more objects that are within the scope of the camera 120 (e.g., a distance, say 500 meters, from the vehicle). In some embodiments, because objects surrounding the vehicle may constantly be changing, the camera 120 may constantly capture images relating to the objects surrounding the vehicle and transmit the images to the image receiving module 310 in real time.

Claim 14. Yao further teaches wherein the image is from a third region away from which the vehicle is being driven. Yao [0030] teaches FIG. 1 is a schematic diagram illustrating an exemplary environmental information detecting system 100 according to some embodiments of the present disclosure. The environmental information detecting system 100 may be configured to detect the information of the environmental around a vehicle (e.g., a driverless vehicle).

Yao [0056] teaches the image receiving module 310 may be configured to receive an image from the camera 120. The image may include a plurality of pixels relating to one or more objects that are within the scope of the camera 120 (e.g., a distance, say 500 meters, from the vehicle). In some embodiments, because objects surrounding the vehicle may constantly be changing, the camera 120 may constantly capture images relating to the objects surrounding the vehicle and transmit the images to the image receiving module 310 in real time.

Claim 15. It differs from claim 1 in that it is an image processing apparatus performing the method of claim 1. Therefore claim 15 has been analyzed and reviewed in the same way as claim 1. See the above analysis. 

Claim 16. It differs from claim 3 in that it is an image processing apparatus performing the method of claim 3. Therefore claim 16 has been analyzed and reviewed in the same way as claim 3. See the above analysis. 

Claim 18. Yao further teaches wherein the detected object includes another vehicle. Yao [0032] teaches the one or more LiDARs may be configured to obtain point sets including a plurality of points relating to objects (e.g., a person, an animal, a tree, a roadblock, building, or a vehicle) that are within the scope of the one or more LiDARs (e.g., a distance, say 500 meters, from the vehicle). A point in a point set may include 3D coordinates of the point and a reflection intensity of the point.

Claim 19. Yao further teaches wherein the vehicle is an autonomous semi-trailer truck. 
Yao [0030] teaches FIG. 1 is a schematic diagram illustrating an exemplary environmental information detecting system 100 according to some embodiments of the present disclosure. The environmental information detecting system 100 may be configured to detect the information of the environmental around a vehicle (e.g., a driverless vehicle).

Yao [0032] teaches the one or more LiDARs may be configured to obtain point sets including a plurality of points relating to objects (e.g., a person, an animal, a tree, a roadblock, building, or a vehicle) that are within the scope of the one or more LiDARs (e.g., a distance, say 500 meters, from the vehicle). A point in a point set may include 3D coordinates of the point and a reflection intensity of the point.

Claim 20. Yao further teaches wherein operations associated the determining the one or more positions, the determining the camera coordinates, the determining the second world coordinates, and the assigning the second world coordinates are configured to be performed in real-time while the vehicle is being driven. Yao [0056] teaches the image receiving module 310 may be configured to receive an image from the camera 120. The image may include a plurality of pixels relating to one or more objects that are within the scope of the camera 120 (e.g., a distance, say 500 meters, from the vehicle). In some embodiments, because objects surrounding the vehicle may constantly be changing, the camera 120 may constantly capture images relating to the objects surrounding the vehicle and transmit the images to the image receiving module 310 in real time.

Yao [0058] teaches because objects surrounding the vehicle may constantly be changing, the one or more LiDARs may constantly obtain point sets relating to the objects surrounding the vehicle and transmit the point sets to the point receiving module 320 in real time.

Yao [0070] teaches the camera 120 may constantly capture images and transmit the images to the processing device 130 in real time. The one or more LiDARs may constantly obtain points sets and transmit the point sets to the processing device 130 in real time. 


Claims 4 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over US 2019/0138822 A1 to Yao et al., hereinafter “Yao” in view of US 2017/00185823 A1 to Gold et al., hereinafter, “Gold” and US 2011/0282578 A1 to Miksa et al., hereinafter, “Miksa” and in further view of US 2020/0082180 A1 to Wang.
Claim 4. The method of claim 1, wherein the bounding box includes a plurality of vertices located at a plurality of corners of the bounding box, wherein the one or more reference points includes a reference point located midpoint in between two vertices of the bounding box, and wherein the two vertices are closest to a surface of a road compared to other vertices of the bounding box. Wang [0052] teaches O_i = [x_top, y_top, x_bottom, y_bottom, x_{3d,1}, y_{3d,1}, ..., x_{3d,8}, y_{3d,8}, h, w, l, X, Y, Z, θ], [0053] where xs and ys are the pixel value in the image plane; top and bottom denotes the top-left and bottom-right corners that defines the 2D bounding box; x_{3d}s and y_{3d}s are the eight vertices of the projected 3D bounding boxes on the 2D image plane. The remaining values are just the 3D properties of the bounding box, including its height (h), width (w), length (l), location in the 3D world relative to the camera (X, Y, Z), and the heading orientation of the bounding box (θ).

Wang [0055] teaches in an example embodiment of the 3D image processing module 200, the deep learning module 212 is used for learning the projected 3D bounding boxes in the image plane. The fitting module 214 uses the output of the deep learning module 212 with the input of corresponding camera matrices and terrain map data to produce the 3D attributes of objects in an input set of images.
3D bounding box fitting process:
    procedure FITTING(image, bboxes):
        Obtain camera extrinsic matrix T and intrinsic matrix K.
        for each bbox in bboxes do
            Obtain the terrain value v.
            Set the origin to the bottom center of the bbox, get the coordinates of all eight points.
            Transform the bbox to camera coordinates using T.
            Project the eight points to the image plane using K.
            Solve the fitting problem using the least square algorithm with the prior v.
        end for
        Return 3Dbboxes
    end procedure
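
For illustration only, the geometric steps of the fitting process quoted above (placing the eight points about the bottom center, transforming them with T, and projecting them with K) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the axis convention (length along x, width along y, height along z), and the 3x4 layout T = [R | t] are hypothetical and are not taken from Wang. The least-squares fitting step with the terrain prior v is not shown.

```python
def box_corners(h, w, l):
    """Eight corners of a box whose origin is its bottom center.

    Assumed axes: length along x, width along y, height along z.
    """
    return [(sx * l / 2, sy * w / 2, z)
            for z in (0.0, h) for sx in (-1, 1) for sy in (-1, 1)]


def to_camera(point, T):
    """Apply an assumed 3x4 extrinsic matrix T = [R | t] to a 3D point."""
    x, y, z = point
    return tuple(r[0] * x + r[1] * y + r[2] * z + r[3] for r in T)


def to_image(point_cam, K):
    """Project a camera-frame point to pixels with a 3x3 intrinsic matrix K."""
    # Homogeneous image coordinates, then division by the third component.
    u, v, s = (sum(K[i][j] * point_cam[j] for j in range(3)) for i in range(3))
    return (u / s, v / s)


def project_box(h, w, l, T, K):
    """Project all eight box corners onto the image plane."""
    return [to_image(to_camera(c, T), K) for c in box_corners(h, w, l)]
```

A fitting step would then compare these projected points against the learned 2D projections and adjust the box parameters to minimize the squared error.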

Wang [0003] teaches object detection is a fundamental problem for numerous vision tasks, including object tracking, semantic instance segmentation, and object behavioral prediction. Detecting all objects in a traffic environment, such as cars, buses, pedestrians, and bicycles, is crucial for building an autonomous driving system. Failure to detect an object (e.g., a car or a person) may lead to malfunction of the motion planning module of an autonomous driving car, thus resulting in a catastrophic accident. As such, object detection for autonomous vehicles is an important operational and safety issue. 

Wang [0004] teaches deep learning-based 2D object detection models have been successfully applied to a variety of computer vision tasks, including face detection, instance segmentation, point cloud processing, and autonomous driving. Given an input image, the goal of 2D object detection is to output the category label and the location (using a rectangular bounding box) of all objects of interest. However, because all operations are performed on the 2D image plane, conventional models can only get the relative location information (in pixels) rather than the absolute value (in meters). This behavior produced by conventional 2D models is not desired for a modern autonomous driving system, as losing the exact location (and potentially car dimensionality) significantly impairs the output quality of the perception module, thus impacting the execution of the subsequent motion planning and control modules and producing potential hazards. 

Hence the prior art includes each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the receipt of bounding box information while a vehicle is being driven, as taught by Yao, Gold and Miksa, with Wang’s bounding box that includes a plurality of vertices located at a plurality of corners. One would have been motivated to make this combination because it allows one to accurately detect the exact location of an object or person in image data in vehicle vision environments and thereby improve the safe operation of an autonomous system (Wang [0003-0004]). In combination, Yao is not altered in that Yao continues to receive the bounding box information while a vehicle is being driven, and Gold continues to indicate world coordinates of an object. Miksa's teachings perform the same as they do separately, obtaining positional information of objects on the earth's surface (terrain map), while Wang continues to teach 3D object detection.
Therefore one of ordinary skill in the art, such as an individual working in the field of object coordinate transformation in image data, could have combined the elements as claimed by known methods, and in combination each element merely performs the same function as it does separately. It is for at least the aforementioned reasons that the Examiner has reached a conclusion of obviousness with respect to claim 4.
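
For illustration only, the reference point recited in claim 4 (the midpoint between the two bounding-box vertices closest to the road surface) can be sketched as follows. The function name is hypothetical, and the sketch assumes a 2D bounding box given as four pixel corners in an image whose y coordinate grows downward, so that the two vertices with the largest y values are treated as closest to the road. Neither the claim nor the cited references prescribe this particular computation.

```python
def bottom_midpoint(vertices):
    """Midpoint of the two bounding-box vertices nearest the road surface.

    vertices -- four (x, y) pixel corners; image y is assumed to grow downward,
                so the two largest y values are taken as closest to the road.
    """
    lower = sorted(vertices, key=lambda p: p[1], reverse=True)[:2]
    (x1, y1), (x2, y2) = lower
    return ((x1 + x2) / 2, (y1 + y2) / 2)
```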

Claim 17. Wang further teaches wherein the bounding box includes a plurality of vertices located at a plurality of corners of the bounding box. Wang [0052] teaches O_i = [x_top, y_top, x_bottom, y_bottom, x_{3d,1}, y_{3d,1}, ..., x_{3d,8}, y_{3d,8}, h, w, l, X, Y, Z, θ], [0053] where xs and ys are the pixel value in the image plane; top and bottom denotes the top-left and bottom-right corners that defines the 2D bounding box; x_{3d}s and y_{3d}s are the eight vertices of the projected 3D bounding boxes on the 2D image plane. The remaining values are just the 3D properties of the bounding box, including its height (h), width (w), length (l), location in the 3D world relative to the camera (X, Y, Z), and the heading orientation of the bounding box (θ).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over US 2019/0138822 A1 to Yao et al., hereinafter “Yao” in view of US 2017/00185823 A1 to Gold et al., hereinafter, “Gold” and US 2011/0282578 A1 to Miksa et al., hereinafter, “Miksa” and US 2020/0311979 A1 to Chang et al., hereinafter, “Chang”.





Claim 9. Chang further teaches wherein a camera extrinsic matrix is used with the camera coordinates to determine the first world coordinates of the reference point. 
Chang [0004] teaches in order to allow accurate transformation of object position between the world coordinates and the image coordinates, camera calibration becomes an important issue in the field of computer vision. Camera calibration is configured to obtain the intrinsic parameters and the extrinsic parameters of the cameras.

Chang [0034] teaches the processor 130 receives an image captured by the camera 110, and two corresponding projection points are also provided in this image. The processor 130 analyzes the image according to the two reference points, so as to obtain image coordinates corresponding to the two projection points on the image coordinate system. Alternatively, the processor 130 may also obtain the image coordinates of the two projection points on the image coordinate system through configuration file or user input. Accordingly, the processor 130 may obtain the plurality of coordinate transformation parameters relative to transformation between any image coordinates and any world coordinates corresponding to the camera 110 through the world coordinates of the camera, the world coordinates of the two reference points, and the image coordinates of the corresponding projection points on the image coordinate system. The coordinate transformation parameters include the extrinsic parameters corresponding to transformation between the world coordinates and the camera coordinates and intrinsic parameters corresponding to transformation between the camera coordinates and the image coordinates. The processor 130 records the coordinate transformation parameters in the storage unit 120. Details of obtaining the plurality of coordinate transformation parameters relative to transformation between any image coordinates and any world coordinates corresponding to the camera through the world coordinates of the camera, the world coordinates of the two reference points, and the image coordinates of the corresponding projection points on the image coordinate system are to be described in the following paragraphs.
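
For illustration only, the role of the extrinsic parameters described in Chang [0034], namely the transformation between world coordinates and camera coordinates, can be sketched as follows. The sketch assumes the common convention X_cam = R * X_world + t with R a 3x3 rotation matrix and t a translation vector; the function names and this convention are assumptions and are not stated in Chang.

```python
def world_to_camera(point_world, R, t):
    """Apply assumed extrinsic parameters: X_cam = R * X_world + t."""
    return tuple(sum(R[i][j] * point_world[j] for j in range(3)) + t[i]
                 for i in range(3))


def camera_to_world(point_cam, R, t):
    """Invert the extrinsic transform: X_world = R^T * (X_cam - t)."""
    d = [point_cam[i] - t[i] for i in range(3)]
    # R is assumed orthonormal, so its inverse is its transpose.
    return tuple(sum(R[i][j] * d[i] for i in range(3)) for j in range(3))
```

Under this convention, camera coordinates of a reference point are mapped to world coordinates by undoing the translation and then applying the transposed rotation.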

Hence the prior art includes each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the receipt of bounding box information while a vehicle is being driven, as taught by Yao, Gold and Miksa, with Chang’s use of a camera extrinsic matrix with the camera coordinates to determine the first world coordinates of the reference point. One would have been motivated to make this combination because it allows one to accurately detect the exact location of an object or person in image data in vehicle vision environments and thereby improve the safe operation of an autonomous system (Chang [0003-0004]). In combination, Yao is not altered in that Yao continues to receive the bounding box information while a vehicle is being driven. Miksa's teachings perform the same as they do separately, obtaining positional information of objects on the earth's surface (terrain map), while Chang continues to indicate world coordinates of an object.

Claims 10-11 are rejected under 35 U.S.C. 103 as being unpatentable over US 2019/0138822 A1 to Yao et al., hereinafter “Yao” in view of US 2017/00185823 A1 to Gold et al., hereinafter, “Gold” and US 2011/0282578 A1 to Miksa et al., hereinafter, “Miksa” and US 2018/0293445 A1 to Gao et al., hereinafter “Gao”.

Claim 10. Gao further teaches wherein the image is cropped from a second image received from a camera located on the vehicle. Gao [0091] teaches the two- and three-dimensional data images surroundings of the vehicle and may be 360° surround imaging. The data is received through data receiving module 16. The three-dimensional data 14a may crop dimensions of an object due to reflectivity issues, or an object may be partly or wholly missed by the three-dimensional data 14a.

Claim 11. Gao further teaches wherein the image is cropped while the vehicle is being driven. Gao [0091] teaches the two- and three-dimensional data images surroundings of the vehicle and may be 360° surround imaging. The data is received through data receiving module 16. The three-dimensional data 14a may crop dimensions of an object due to reflectivity issues, or an object may be partly or wholly missed by the three-dimensional data 14a.

Hence the prior art includes each element claimed, although not necessarily in a single prior art reference, with the only difference between the claimed invention and the prior art being the lack of actual combination of the elements in a single prior art reference. Thus, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the receipt of bounding box information while a vehicle is being driven, as taught by Yao, Gold and Miksa, with Gao’s cropping of the image from a second image received from a camera located on the vehicle. One would have been motivated to make this combination because it allows one to accurately detect the exact location of an object or person in image data in vehicle vision environments and thereby improve the safe operation of an autonomous system (Gao [0003-0004]). In combination, Yao is not altered in that Yao continues to receive the bounding box information while a vehicle is being driven, and Gao continues to indicate world coordinates of an object.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DELOMIA L GILLIARD whose telephone number is (571)272-1681.  The examiner can normally be reached on 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vincent Rudolph, can be reached on 571 272-8243.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.







/DELOMIA L GILLIARD/Primary Examiner, Art Unit 2661