DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Allowable Subject Matter

Claims 3, 4, 10, 11, 18, and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Prior Art

While not explicitly relied on or cited in the Rejection, the Examiner notes additional prior art believed to be pertinent or significant to the present application that has been included in the PTO-892 Notice of References Cited. Specifically, U.S. Patent 11,187,793 to LIU.

Claim Rejections under 35 U.S.C. § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Publication 2017/0345161 to TAKATANI et al. (hereinafter “TAKATANI”) in view of U.S. Patent Publication 2020/0364508 to GUREL et al. (hereinafter “GUREL”).

Regarding Claim 1, TAKATANI discloses an apparatus for diagnosing an abnormality of a vehicle sensor (FIG. 1 is a block diagram showing an example of a configuration of an information processing apparatus 10 … The information processing apparatus 10 is used as a motion estimation apparatus for estimating a self-position of a moving object 1. The self-position means the position and the posture of moving object 1. The moving object 1 is, for example, a vehicle (automobile, two-wheeled vehicle, etc.), a ship, an airplane, or a drone. [¶ 0013]), the apparatus comprising:
a sensor configured to measure an acceleration and an angular velocity of a vehicle; a camera configured to generate a … time series image frame of the vehicle (The moving object 1 is provided with an inertial sensor 20 and an imaging apparatus 30. [¶ 0014] … the acceleration sensor 20A and the angular velocity sensor 20B may be collectively referred to as an inertial sensor 20. … acceleration components in the x, y, z directions and rotational speed components around the x, y, z axes are acquired as measurement values of the inertial sensor 20. [¶ 0015] … A visible light camera, an infrared light camera, an ultraviolet light camera, or the like may be used as the imaging apparatus 30. [¶ 0016] … The information processing apparatus 10 estimates the self-motion of the moving object 1 using measurement values obtained with the acceleration sensor 20A and the angular velocity sensor 20B and an image captured by the imaging apparatus 30. [¶ 0017]); and
a controller configured to estimate the acceleration and the angular velocity of the vehicle by using the front time series image frame generated by the camera and to diagnose an abnormality in the sensor based on the acceleration and the angular velocity of the vehicle estimated by the controller (The information processing apparatus 10 estimates the self-motion of the moving object 1 using measurement values obtained with the acceleration sensor 20A and the angular velocity sensor 20B and an image captured by the imaging apparatus 30. [¶ 0017] … It is known that drift occurs in the inertial sensor 20 due to error caused by vibration, heat, or the like. Therefore, in order to estimate the self-position of the moving object 1 using only the measurement value of the inertial sensor 20, even with a slight drift, the accumulation increases the error and accordingly, the error of the self-position increases. [¶ 0018] … In order to cope with such a phenomenon, the amount of drift of the inertial sensor 20 is considered to be estimated by separately estimating the self-motion of the moving object 1 by using an image captured by the imaging apparatus 30 and performing comparison with the self-motion of the moving object 1 estimated using the measurement value measured by the inertial sensor 20. [¶ 0019] … Thus, the information processing apparatus 10 estimates the self-motion of the moving object 1 by using the measurement value measured by the inertial sensor 20 and the image captured by the imaging apparatus 30, and estimates the amount of drift of the inertial sensor 20. [¶ 0020])

While TAKATANI discloses a camera configured to generate a … time series image frame of the vehicle (The moving object 1 is provided with an inertial sensor 20 and an imaging apparatus 30. [¶ 0014]), TAKATANI does not explicitly disclose, or is not relied on to disclose, a camera configured to generate a front time series image frame of the vehicle.
However, in the same field of endeavor GUREL teaches a camera configured to generate a front time series image frame of the vehicle (In order to infer such a navigation path, the vehicle will often capture a stream of video data from at least one camera (or other sensor) on the vehicle. The stream of video data can include a series of frames or image portions that include a representation of a portion of the environment in which the vehicle is navigating. FIG. 1B illustrates one such image or video frame 150 that might be captured from a camera on the front of such a vehicle. As illustrated, there are many different types of objects within the video frame, and it can be necessary in at least some systems to identify various types of objects and determine their relative location to, or distance from, the vehicle. [¶ 0046] … A sequential model … has the potential to exploit similarities between consecutive frames of a sequence. A sequential model can take the form of, for example, a DNN-based feature extractor that is enhanced by a recurrent neural network (RNN)-based temporal processor. [¶ 0048] … The network can be any appropriate network, such as a recurrent neural network (RNN) or convolutional neural network (CNN), among other such options. [¶ 0070])
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teaching of TAKATANI with that of GUREL for the advantage that various computer vision-based approaches can be used to identify the various objects in an image frame. At least some of these approaches can utilize machine learning to infer the various types of objects represented in the image, as well as their relative position or coordinates … many object and path detectors in autonomous driving are based on feed-forward Deep Neural Networks (DNN). These detectors operate frame by frame, performing an inference for each frame independently without using any information from other recently-processed frames. (GUREL: ¶¶ 0047 and 0048)
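
For illustration only, the comparison that TAKATANI describes at ¶¶ 0019 and 0020 (motion estimated from captured images compared against motion measured by the inertial sensor, in order to estimate drift) can be sketched as follows. This is a minimal sketch under stated assumptions, not TAKATANI's implementation and not the claimed controller: the function names `estimate_drift` and `diagnose`, the per-axis mean-difference drift model, and the thresholds are all hypothetical.

```python
from statistics import fmean


def estimate_drift(measured, estimated):
    """Return the per-axis mean of (measured - estimated) over a window.

    Each sample is assumed to be a 6-tuple (ax, ay, az, wx, wy, wz):
    three acceleration components and three angular-velocity components.
    `measured` comes from the inertial sensor, `estimated` from images.
    """
    if not measured or len(measured) != len(estimated):
        raise ValueError("need two non-empty windows of equal length")
    n_axes = len(measured[0])
    return [
        fmean(m[i] - e[i] for m, e in zip(measured, estimated))
        for i in range(n_axes)
    ]


def diagnose(drift, thresholds):
    """Flag an abnormality if any axis drifts beyond its (assumed) threshold."""
    return any(abs(d) > t for d, t in zip(drift, thresholds))
```

In this sketch a constant offset on one gyroscope axis shows up directly as a non-zero entry in the returned drift list, and `diagnose` reduces the six values to a single abnormal or normal decision. How a window is chosen and how thresholds are set are design choices that the cited passages do not specify.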

Regarding Claim 16, the features of Claim 16 are essentially the same as Claim 1 with the apparatus of Claim 1 performing the Method of Claim 16. Therefore, Claim 16 is rejected on the same grounds and motivation as Claim 1.

Claims 2, 5, 6, 9, 12, 13, 17, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over TAKATANI in view of GUREL and U.S. Patent Publication 2020/0363815 to MOUSAVIAN et al. (hereinafter “MOUSAVIAN”).

Regarding Claim 2, the combination of TAKATANI and GUREL teaches the apparatus of claim 1.
TAKATANI further teaches wherein the controller comprises:
a first pose network configured to predict first pose information of the camera by using an image at a time point of t-1 and an image at a time point of t; a second pose network configured to predict second pose information of the camera by using an image at a time point of t-2 and the image at the time point of t-1 (information processing apparatus 10 performs motion estimation by using image sequences obtained during cruise of an automobile … In FIG. 5, the horizontal axis represents the frame number of image. [¶ 0070]. The Examiner notes that: 1) consistent with, e.g., ¶ 0053 of the present published Specification, the pose network(s) may simply be hardware/software and/or an internal organization of the controller; and 2) there is no claim or requirement as to network functionality as would be understood by a Person Of Skill In The Art (POSITA) or as to how the functionality of the first network and the second network may differ (i.e., that the first network and the second network are different, or required to be different).)

While TAKATANI does not explicitly disclose, or is not relied on to disclose, the following limitations, in the same field of endeavor MOUSAVIAN teaches:
a 6-dimension (6D) estimation network configured to estimate the acceleration and the angular velocity of the vehicle by using the first pose information and the second pose information; and a diagnostic device configured to compare the acceleration and the angular velocity of the vehicle estimated by the 6D estimation network with the acceleration and the angular velocity of the vehicle measured by the sensor to diagnose the abnormality in the sensor (the system produces a 6D pose estimation in the form of a distribution refined over a plurality of ordered images, such as frames of a video stream. In an embodiment, in various fields such as robotics, temporal data and information about the uncertainty of estimates is very important for tasks such as grasp planning or active sensing. In an embodiment, temporal tracking in video data can improve pose estimation in the context of point-cloud based pose estimation, and provide an improvement over unimodal estimates that fail to adequately represent the complex uncertainties arising from occlusions and possible object symmetries. [¶ 0025])

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teaching of TAKATANI and GUREL with that of MOUSAVIAN for the advantage of addressing object recognition and tracking, which is an important problem in the field of image processing. … a 3-D position and a corresponding 3-D orientation of an object (called the 6D pose) may be determined from an image. … the 6D pose of the object may be estimated using local-feature or template-matching techniques, where features extracted from an image are matched against features or viewpoint templates generated for the 3-D model of the object. (MOUSAVIAN: ¶ 0001)
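
For illustration only, the relationship between two consecutive pose estimates and the quantities recited in Claim 2 (acceleration and angular velocity) can be sketched with a plain finite difference. This is a hedged sketch and deliberately swaps in classical kinematics for the claimed 6D estimation network; it is not the Applicant's method and not taken from any cited reference. The function name `finite_difference_motion`, the pose layout (translation in metres plus a small-angle rotation vector in radians), and the fixed frame interval `dt` are all assumptions.

```python
def finite_difference_motion(pose_prev, pose_curr, dt):
    """Approximate acceleration and angular velocity from two relative poses.

    pose_prev: relative camera motion between frames t-2 and t-1.
    pose_curr: relative camera motion between frames t-1 and t.
    Each pose is (translation_xyz, rotation_vector_xyz).
    Assumes small rotations, so the change of reference frame between
    the two intervals is ignored.
    """
    if dt <= 0:
        raise ValueError("dt must be positive")
    (t_prev, _r_prev), (t_curr, r_curr) = pose_prev, pose_curr
    # Mean velocity over each frame interval.
    v_prev = [c / dt for c in t_prev]
    v_curr = [c / dt for c in t_curr]
    # Acceleration as the change in velocity between the two intervals.
    accel = [(vc - vp) / dt for vc, vp in zip(v_curr, v_prev)]
    # Angular velocity from the most recent small-angle rotation vector.
    omega = [c / dt for c in r_curr]
    return accel, omega
```

The two returned vectors together have six components, which is the shape of output that would then be compared against the six inertial-sensor measurements.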

Regarding Claim 5, the combination of TAKATANI, GUREL, and MOUSAVIAN teaches the apparatus of claim 2.
MOUSAVIAN further teaches:
wherein the first pose information includes rotation information and location information (FIG. 2 illustrates an example of a process 200 that determines the 6D pose of an object by combining a 3-D position and a 3-D rotation … camera 202 captures an image 204 of an object to be manipulated by a robot. … the image 204 may be a frame of a video stream or a single image captured with a camera. … the image is provided to a particle filter 206 which estimates the 3-D position 208 of the object. … the particle filter 206 uses a previous image or previous frame of the video stream in addition to the image 204 to inform the estimation process. [¶ 0034]. The Examiner notes that: 1) no means for determining a physical/geographic location (e.g., via multilateration, GNSS/GPS coordinates, etc.) is found in the present disclosure; and 2) there is no limitation as to the number of pose informations.)

Motivation to combine the teaching of TAKATANI and GUREL with that of MOUSAVIAN is given in Claim 2 above.

Regarding Claim 6, the combination of TAKATANI, GUREL, and MOUSAVIAN teaches the apparatus of claim 2.
MOUSAVIAN further teaches:
wherein the second pose information includes rotation information and location information (FIG. 2 illustrates an example of a process 200 that determines the 6D pose of an object by combining a 3-D position and a 3-D rotation … camera 202 captures an image 204 of an object to be manipulated by a robot. … the image 204 may be a frame of a video stream or a single image captured with a camera. … the image is provided to a particle filter 206 which estimates the 3-D position 208 of the object. … the particle filter 206 uses a previous image or previous frame of the video stream in addition to the image 204 to inform the estimation process. [¶ 0034]. The Examiner notes that: 1) no means for determining a physical/geographic location (e.g., via multilateration, GNSS/GPS coordinates, etc.) is found in the present disclosure; and 2) there is no limitation as to the number of pose informations.)

Motivation to combine the teaching of TAKATANI and GUREL with that of MOUSAVIAN is given in Claim 2 above.

Regarding Claim 9, the combination of TAKATANI and GUREL teaches the apparatus of claim 1.
TAKATANI further teaches:
wherein the controller includes:
a pose network configured to predict first pose information of the camera by using an image at a time point of t-1 and an image at a time point of t and second pose information of the camera by using an image at a time point of t-2 and the image at the time point of t-1 (information processing apparatus 10 performs motion estimation by using image sequences obtained during cruise of an automobile … In FIG. 5, the horizontal axis represents the frame number of image. [¶ 0070])

While TAKATANI does not explicitly disclose, or is not relied on to disclose, the following limitations, in the same field of endeavor MOUSAVIAN teaches:
a 6D estimation network configured to estimate the acceleration and the angular velocity of the vehicle by using the first pose information and the second pose information; and a diagnostic device configured to compare the acceleration and the angular velocity of the vehicle estimated by the 6D estimation network with the acceleration and the angular velocity of the vehicle measured by the sensor to diagnose the abnormality in the sensor (the system produces a 6D pose estimation in the form of a distribution refined over a plurality of ordered images, such as frames of a video stream. In an embodiment, in various fields such as robotics, temporal data and information about the uncertainty of estimates is very important for tasks such as grasp planning or active sensing. In an embodiment, temporal tracking in video data can improve pose estimation in the context of point-cloud based pose estimation, and provide an improvement over unimodal estimates that fail to adequately represent the complex uncertainties arising from occlusions and possible object symmetries. [¶ 0025])

Motivation to combine the teaching of TAKATANI and GUREL with that of MOUSAVIAN is given in Claim 2 above.

Regarding Claim 12, the combination of TAKATANI, GUREL, and MOUSAVIAN teaches the apparatus of claim 9.
MOUSAVIAN further teaches:
wherein the first pose information includes rotation information and location information (FIG. 2 illustrates an example of a process 200 that determines the 6D pose of an object by combining a 3-D position and a 3-D rotation … camera 202 captures an image 204 of an object to be manipulated by a robot. … the image 204 may be a frame of a video stream or a single image captured with a camera. … the image is provided to a particle filter 206 which estimates the 3-D position 208 of the object. … the particle filter 206 uses a previous image or previous frame of the video stream in addition to the image 204 to inform the estimation process. [¶ 0034]. The Examiner notes that: 1) no means for determining a physical/geographic location (e.g., via multilateration, GNSS/GPS coordinates, etc.) is found in the present disclosure; and 2) there is no limitation as to the number of pose informations.)

Motivation to combine the teaching of TAKATANI and GUREL with that of MOUSAVIAN is given in Claim 2 above.

Regarding Claim 13, the combination of TAKATANI, GUREL, and MOUSAVIAN teaches the apparatus of claim 9.
MOUSAVIAN further teaches:
wherein the second pose information includes rotation information and location information (FIG. 2 illustrates an example of a process 200 that determines the 6D pose of an object by combining a 3-D position and a 3-D rotation … camera 202 captures an image 204 of an object to be manipulated by a robot. … the image 204 may be a frame of a video stream or a single image captured with a camera. … the image is provided to a particle filter 206 which estimates the 3-D position 208 of the object. … the particle filter 206 uses a previous image or previous frame of the video stream in addition to the image 204 to inform the estimation process. [¶ 0034]. The Examiner notes that: 1) no means for determining a physical/geographic location (e.g., via multilateration, GNSS/GPS coordinates, etc.) is found in the present disclosure; and 2) there is no limitation as to the number of pose informations.)

Motivation to combine the teaching of TAKATANI and GUREL with that of MOUSAVIAN is given in Claim 2 above.

Regarding Claim 17, the features of Claim 17 are essentially the same as Claim 2 with the apparatus of Claim 2 performing the Method of Claim 17. Therefore, Claim 17 is rejected on the same grounds and motivation as Claim 2.

Regarding Claim 20, the combination of TAKATANI, GUREL, and MOUSAVIAN teaches the method of claim 17.
MOUSAVIAN further teaches:
wherein the first pose information includes rotation information and location information and wherein the second pose information includes rotation information and location information (FIG. 2 illustrates an example of a process 200 that determines the 6D pose of an object by combining a 3-D position and a 3-D rotation … camera 202 captures an image 204 of an object to be manipulated by a robot. … the image 204 may be a frame of a video stream or a single image captured with a camera. … the image is provided to a particle filter 206 which estimates the 3-D position 208 of the object. … the particle filter 206 uses a previous image or previous frame of the video stream in addition to the image 204 to inform the estimation process. [¶ 0034]. The Examiner notes that: 1) no means for determining a physical/geographic location (e.g., via multilateration, GNSS/GPS coordinates, etc.) is found in the present disclosure; and 2) there is no limitation as to the number of pose informations.)

Motivation to combine the teaching of TAKATANI and GUREL with that of MOUSAVIAN is given in Claim 2 above.

Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over TAKATANI in view of GUREL, MOUSAVIAN, and U.S. Patent Publication 2021/0213973 to PENA et al. (hereinafter “PENA”).

Regarding Claim 7, the combination of TAKATANI, GUREL, and MOUSAVIAN teaches the apparatus of claim 2.
While the combination of TAKATANI, GUREL, and MOUSAVIAN does not explicitly teach, or is not relied on to teach, the following limitation, in the same field of endeavor PENA teaches:
wherein the first and second pose networks are convolutional neural networks (CNNs) (Visual-Inertial Odometry (VIO) systems take advantage of visual and inertial information to provide position and orientation estimations. In some implementations, the visual-inertial data fusion is done by using probabilistic filter approaches such as Extended Kalman Filter (EKF) or Unscented Kalman Filter (UKF). The visual-inertial data fusion is performed with an EKF based system, which they used to compare different fusion models using only gyroscope data, or gyroscope and accelerometer data. Other variations of the EKF may be implemented, such as the Multi-state Constraint Kalman Filter (MSCKF), such as systems where several past camera poses are used to detect static features and add a constraint to the state vector, among other examples. [¶ 0121] … VIO approaches based on probabilistic filters for sensor fusion may require a hard and complex calibration process in order to bring camera and inertial measurement unit (IMU) measurements to the same reference coordinate system. [¶ 0123] … In addition to the example features above, an improved computer vision system may additional utilize Convolutional Neural Networks (CNNs) to estimate the relative pose between a pair of point clouds using a sparse hierarchical voxel grid representation (e.g., VOLA). [¶ 0138])

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teaching of TAKATANI, GUREL, and MOUSAVIAN with that of PENA for the advantage that encoded features may be combined in a feature vector with features extracted from a CNN, with this feature vector being tracked over time by a second LSTM, which may provide a pose estimation of a robot. (PENA: ¶ 0123)

Regarding Claim 14, the combination of TAKATANI, GUREL, and MOUSAVIAN teaches the apparatus of claim 9.
While the combination of TAKATANI, GUREL, and MOUSAVIAN does not explicitly teach, or is not relied on to teach, the following limitation, in the same field of endeavor PENA teaches:
wherein the pose network is a convolutional neural network (CNN) (Visual-Inertial Odometry (VIO) systems take advantage of visual and inertial information to provide position and orientation estimations. In some implementations, the visual-inertial data fusion is done by using probabilistic filter approaches such as Extended Kalman Filter (EKF) or Unscented Kalman Filter (UKF). The visual-inertial data fusion is performed with an EKF based system, which they used to compare different fusion models using only gyroscope data, or gyroscope and accelerometer data. Other variations of the EKF may be implemented, such as the Multi-state Constraint Kalman Filter (MSCKF), such as systems where several past camera poses are used to detect static features and add a constraint to the state vector, among other examples. [¶ 0121] … VIO approaches based on probabilistic filters for sensor fusion may require a hard and complex calibration process in order to bring camera and inertial measurement unit (IMU) measurements to the same reference coordinate system. [¶ 0123] … In addition to the example features above, an improved computer vision system may additional utilize Convolutional Neural Networks (CNNs) to estimate the relative pose between a pair of point clouds using a sparse hierarchical voxel grid representation (e.g., VOLA). [¶ 0138])

Motivation to combine the teaching of TAKATANI, GUREL, and MOUSAVIAN with that of PENA is given in Claim 7 above.

Claims 8 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over TAKATANI in view of GUREL, MOUSAVIAN, and U.S. Patent Publication 2019/0258251 to DITTY et al. (hereinafter “DITTY”).

Regarding Claim 8, the combination of TAKATANI, GUREL, and MOUSAVIAN teaches the apparatus of claim 2.
While the combination of TAKATANI, GUREL, and MOUSAVIAN does not explicitly teach, or is not relied on to teach, the following limitation, in the same field of endeavor DITTY teaches:
wherein the 6D estimation network includes a multi-layer perceptron network (FIG. 5 is diagram of example camera types and locations on a vehicle. [¶ 0055] … Controller (100) provides autonomous driving outputs in response to an array of sensor inputs including, for example: one or more ultrasonic sensors (66), one or more RADAR sensors (68), one or more Light Detection and Ranging (“LIDAR”) sensors (70), one or more surround cameras (72) (typically such cameras are located at various places on vehicle body (52) to image areas all around the vehicle body), one or more stereo cameras (74) (in preferred embodiments, at least one such stereo camera faces forward to provide depth-perception for object detection and object recognition in the vehicle path), one or more infrared cameras (75) … an inertial sensor or inertial measurement unit (“IMU”) (82) that monitors movement of vehicle body (52) (this sensor can be for example an accelerometer(s) and/or a gyro-sensor(s) and/or a magnetic compass(es)). [¶ 0124] … self-driving vehicle (50) includes a plurality of cameras (72, 73, 74, 75, 76), capturing images around the entire periphery of the vehicle. [¶ 0129, illustrated in FIG. 5] … A variety of different IMU sensors may be used … embodiments may include six-axis applications (accelerometers and gyroscopes) and nine-axis applications (accelerometers, gyroscopes, and magnetometers). [¶ 0163] … The neural network can take as its input at least some subset of parameters, such as (1) bounding box dimensions, (2) ground plane estimate obtained e.g. from another subsystem, (3) inertial measurement unit (IMU) output that correlates with the ego-vehicle orientation, (4) distance or 3D location estimates of the object obtained from the neural network and/or other sensors, such as LIDAR or RADAR, among others. [¶ 0225] … One implementation could be a multi-layer perceptron (“MLP”) comprising one or more hidden layers. [¶ 0226])

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teaching of TAKATANI, GUREL, and MOUSAVIAN with that of DITTY for the advantage of an end-to-end platform with a flexible architecture that provides a comprehensive functional safety architecture that leverages and makes efficient use of computer vision and/or ADAS techniques for diversity and redundancy, provides a platform for a flexible, reliable driving software stack, along with deep learning tools. (DITTY: ¶ 0050)
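
For illustration only, a multi-layer perceptron "comprising one or more hidden layers", as DITTY ¶ 0226 puts it, can be sketched as a stack of fully connected layers with a nonlinearity between them. This is a minimal sketch under stated assumptions and does not reproduce DITTY's network or the claimed 6D estimation network: the helper names `dense`, `mlp_forward`, and `random_layers`, the ReLU activation, and the 12-16-6 layer sizes (two 6-component pose vectors in, six motion components out) are hypothetical, and the weights are untrained random values.

```python
import random


def relu(x):
    """Rectified linear unit, used here as the hidden-layer activation."""
    return x if x > 0.0 else 0.0


def dense(inputs, weights, biases, activation=None):
    """One fully connected layer: each output is a weighted sum plus a bias."""
    outputs = []
    for row, bias in zip(weights, biases):
        z = sum(w * x for w, x in zip(row, inputs)) + bias
        outputs.append(activation(z) if activation else z)
    return outputs


def mlp_forward(inputs, layers):
    """Forward pass; ReLU on every layer except the last (linear output)."""
    x = list(inputs)
    for index, (weights, biases) in enumerate(layers):
        is_last = index == len(layers) - 1
        x = dense(x, weights, biases, None if is_last else relu)
    return x


def random_layers(sizes, seed=0):
    """Build untrained layers for the given sizes, e.g. [12, 16, 6]."""
    rng = random.Random(seed)
    return [
        (
            [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)],
            [0.0] * n_out,
        )
        for n_in, n_out in zip(sizes, sizes[1:])
    ]
```

Calling `mlp_forward(pose_vector, random_layers([12, 16, 6]))` on a 12-element input returns six numbers; without training those numbers are meaningless, so the sketch shows only the structure of such a network, not its behavior.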

Regarding Claim 15, the combination of TAKATANI, GUREL, and MOUSAVIAN teaches the apparatus of claim 9.
While the combination of TAKATANI, GUREL, and MOUSAVIAN does not explicitly teach, or is not relied on to teach, the following limitation, in the same field of endeavor DITTY teaches:
wherein the 6D estimation network includes a multi-layer perceptron network (FIG. 5 is diagram of example camera types and locations on a vehicle. [¶ 0055] … Controller (100) provides autonomous driving outputs in response to an array of sensor inputs including, for example: one or more ultrasonic sensors (66), one or more RADAR sensors (68), one or more Light Detection and Ranging (“LIDAR”) sensors (70), one or more surround cameras (72) (typically such cameras are located at various places on vehicle body (52) to image areas all around the vehicle body), one or more stereo cameras (74) (in preferred embodiments, at least one such stereo camera faces forward to provide depth-perception for object detection and object recognition in the vehicle path), one or more infrared cameras (75) … an inertial sensor or inertial measurement unit (“IMU”) (82) that monitors movement of vehicle body (52) (this sensor can be for example an accelerometer(s) and/or a gyro-sensor(s) and/or a magnetic compass(es)). [¶ 0124] … self-driving vehicle (50) includes a plurality of cameras (72, 73, 74, 75, 76), capturing images around the entire periphery of the vehicle. [¶ 0129, illustrated in FIG. 5] … A variety of different IMU sensors may be used … embodiments may include six-axis applications (accelerometers and gyroscopes) and nine-axis applications (accelerometers, gyroscopes, and magnetometers). [¶ 0163] … The neural network can take as its input at least some subset of parameters, such as (1) bounding box dimensions, (2) ground plane estimate obtained e.g. from another subsystem, (3) inertial measurement unit (IMU) output that correlates with the ego-vehicle orientation, (4) distance or 3D location estimates of the object obtained from the neural network and/or other sensors, such as LIDAR or RADAR, among others. [¶ 0225] … One implementation could be a multi-layer perceptron (“MLP”) comprising one or more hidden layers. [¶ 0226])

Motivation to combine the teaching of TAKATANI, GUREL, and MOUSAVIAN with that of DITTY is given in Claim 8 above.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERNEST G TACSIK, whose telephone number is (571) 270-1279. The examiner can normally be reached 9-6 Eastern Time.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kathy WANG-HURST, can be reached on 571-270-5371. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ERNEST G TACSIK/
Examiner, Art Unit 2644