DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment

The Amendment filed 12MAY2016 has been entered. Claims 17 – 26 and 28 – 32 are currently pending in the application.

Response to Argument

Applicant's arguments filed 10AUG2022 have been fully considered but they are not persuasive:

Argument 1: Prior Art Rejections: “Applicant can find no suggestion that Takatani teaches both a first pose network and a second pose network as required by the claim. The Office action notes that there is no requirement that the first and second networks differ. Applicant agrees. … The Office action provides no rationale as to how it could be obvious to replace the pose network with first and second pose networks. … Claim 2 requires three elements: a first pose network, a second pose network, and a 6-dimension (6D) estimation network. None of the prior art, however, suggests uses any more than one such network, let alone all three.”
Response 1: The Examiner respectfully disagrees. It is first noted that, while the term “network” is used in the present application/claims to describe both indistinguishable “pose networks” and an “estimation network,” in context the term “network” is not used in the common sense of a “communications network” but, as interpreted, is simply used to describe a functionality of a microprocessor/CPU/controller executing instructions (e.g., a software routine, process, or algorithm) (see, e.g., ¶ 0064 immediately below). Furthermore, and as supported by the present Specification (see immediately below), while the “first” and “second” “pose networks” are differentiated in identification/name only, it is ambiguous whether there is any difference, particularly any structural difference, between them. It is ambiguous whether the claim requires that an identical instruction set (i.e., an algorithm) is, e.g., saved in memory twice (e.g., in two different memory locations) and simply identified by different names, OR that each iteration/pass of a single saved algorithm is simply being uniquely identified by name (i.e., a “first pose network” being a reference to a first iteration/pass of the algorithm).
Recitations from the present published Specification include:
“The first and second pose networks may be convolutional neural networks (CNNs), and the 6D estimation network may be a multi-layer perceptron network.” [¶ 0016]

“As shown in FIG. 2, the controller 50 provided in the apparatus … may include a first pose network 51, a second pose network 52, a 6D estimation network 54 … In this case, the first and second pose networks 51 and 52 have the same structure and the same parameters, and perform the same function.” [¶ 0058]

“The controller 50 may be implemented in the form of hardware or software, or may be implemented in the form of a combination of hardware and software. Preferably, the controller 50 may be implemented with a microprocessor.” [¶ 0053]

“As shown in FIG. 3, the pose network 53, which is a module that performs both the operation of the first pose network 51 and the operation of the second pose network 52.” [¶ 0064]

This interpretation of “network” was explicitly noted in the 13MAY2022 Rejection: “1) consistent with e.g., ¶ 0053 of the present published Specification, the pose network(s) are may be simply hardware/software and/or internal organization of the controller; and 2) there is no claim or requirement as to network functionality as would be understood by a Person Of Skill In The Art (POSITA) or how the functionality of the first network and the second network may differ (i.e., that the first network and the second network are different, or required to be different.).” 
That is, the argued “three elements: a first pose network, a second pose network, and a 6-dimension (6D) estimation network,” are all interpreted as, e.g., functionality of a microprocessor executing instructions. Rejection of the claim, as interpreted, requires 1) a microprocessor/CPU/controller able to execute the required functionality (i.e., algorithm), and 2) that the microprocessor/CPU/controller is able to execute multiple functionalities (e.g., multiple algorithms).
As cited in the Rejection, one or more of the (unchanged) cited references, in combination, teach the argued “three elements: a first pose network, a second pose network, and a 6-dimension (6D) estimation network.” It is noted that none of the cited art places any limit on the number of algorithms that can be executed on a microprocessor. While the Applicant’s arguments are not seen as persuasive in overcoming the cited art, the following recitations from the cited art are provided in response to the arguments:

TAKATANI discloses:
1) a microprocessor/CPU/controller able to execute the required functionality (i.e., algorithm)
The information processing apparatus 10 is, for example, a dedicated or general-purpose computer, and includes a processing circuit 100, a storage circuit 200. [¶ 0021]

2) that the microprocessor/CPU/controller is able to execute multiple functionalities (e.g., multiple algorithms).
The processing circuit 100 includes an acquisition function 110, a correction function 120, a detection function 130, a search function 140, a motion estimation function 150, and a drift estimation function 160. … Each of these processing functions is stored in the storage circuit 200 in the form of a computer executable program. The processing circuit 100 is a processor that realizes a processing function corresponding to each program by reading and executing a program from the storage circuit 200. [¶ 0022]

GUREL teaches:
1) a microprocessor/CPU/controller able to execute the required functionality (i.e., algorithm)
FIG. 27 illustrates a deep learning application processor 2700 … deep learning application processor 2700 uses instructions that, if executed by deep learning application processor 2700, cause deep learning application processor 2700 to perform some or all of processes and techniques described throughout this disclosure. [¶ 0315]

2) that the microprocessor/CPU/controller is able to execute multiple functionalities (e.g., multiple algorithms).
deep learning application processor 2700 is used to infer or predict information based on a trained machine learning model (e.g., neural network). [¶ 0318] … “processor” may refer to any device or portion of a device that processes electronic data from registers and/or memory and transform that electronic data into other electronic data that may be stored in registers and/or memory. … “processor” may be a CPU or a GPU. … “software” processes may include, for example, software and/or hardware entities that perform work over time, such as tasks, threads, and intelligent agents. Also, each process may refer to multiple processes, for carrying out instructions in sequence or in parallel, continuously or intermittently. [¶ 0410]

MOUSAVIAN teaches:
1) a microprocessor/CPU/controller able to execute the required functionality (i.e., algorithm)
FIG. 2 illustrates an example of a process 200 that determines the 6D pose of an object  [¶ 0034] … the process illustrated in FIG. 3 is performed by a computer system configured with executable instructions. … the computer system includes one or more processors … a computer system executes instructions that cause the computer system to determine a 6D pose of an object. [¶ 0036] … a process such as those processes described herein (or variations and/or combinations thereof) is performed under the control of one or more computer systems configured with executable instructions and is implemented as code (e.g., executable instructions, one or more computer programs or one or more applications) executing collectively on one or more processors, by hardware or combinations thereof. [¶ 0246] … the term “processor” may refer to any device or portion of a device that processes electronic data from registers and/or memory and transform that electronic data into other electronic data that may be stored in registers and/or memory. As non-limiting examples, “processor” may be a Central Processing Unit (CPU) or a Graphics Processing Unit (GPU). [¶ 0253]

2) that the microprocessor/CPU/controller is able to execute multiple functionalities (e.g., multiple algorithms).
FIG. 12 illustrates a parallel processing unit (“PPU”) 1200 … PPU 1200 is configured with machine-readable code that, if executed by the PPU, causes the PPU to perform some or all of processes and techniques described throughout this disclosure. … the PPU 1200 is a multi-threaded processor … designed to process computer-readable instructions (also referred to as machine-readable instructions or simply instructions) on multiple threads in parallel. [¶ 0080] … each process may refer to multiple processes, for carrying out instructions in sequence or in parallel, continuously or intermittently. [¶ 0253]


Based on the claim requirements as interpreted (above), the Examiner maintains the Rejections (as well as the noted Allowable Subject Matter) as per below:

Allowable Subject Matter

Claims 3, 4, 10, 11, 18, and 19 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Prior Art

While not explicitly relied on or cited in the Rejection, the Examiner notes additional prior art believed to be pertinent or significant to the present application that has been included in the PTO-892 Notice of References Cited. Specifically:
U.S. Patent 10297070 to ZHU et al. 
U.S. Patent Publication 2021/0065241 to CHAKRAVARTY
U.S. Patent Publication 2020/0302634 to POLLEFEYS et al.
U.S. Patent Publication 2021/0004983 to FISCHER et al.
U.S. Patent Publication 2020/0160070 to SHOLINGAR et al.
U.S. Patent Publication 2017/0168586 to SINHA et al.

Claim Rejections under 35 U.S.C. § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 2, 5, 6, 9, 12, 13, 17, and 20 – 22 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent Publication 2017/0345161 to TAKATANI et al. (hereinafter “TAKATANI”) in view of U.S. Patent Publication 2020/0364508 to GUREL et al. (hereinafter “GUREL”) and U.S. Patent Publication 2020/0363815 to MOUSAVIAN et al. (hereinafter “MOUSAVIAN”).

Regarding Claim 2, TAKATANI discloses an apparatus for diagnosing an abnormality of a vehicle sensor (FIG. 1 is a block diagram showing an example of a configuration of an information processing apparatus 10 … The information processing apparatus 10 is used as a motion estimation apparatus for estimating a self-position of a moving object 1. The self-position means the position and the posture of moving object 1. The moving object 1 is, for example, a vehicle (automobile, two-wheeled vehicle, etc.), a ship, an airplane, or a drone. [¶ 0013]), the apparatus comprising:
a sensor configured to measure an acceleration and an angular velocity of a vehicle; a camera configured to generate a … time series image frame of the vehicle (The moving object 1 is provided with an inertial sensor 20 and an imaging apparatus 30. [¶ 0014] … the acceleration sensor 20A and the angular velocity sensor 20B may be collectively referred to as an inertial sensor 20. … acceleration components in the x, y, z directions and rotational speed components around the x, y, z axes are acquired as measurement values of the inertial sensor 20. [¶ 0015] … A visible light camera, an infrared light camera, an ultraviolet light camera, or the like may be used as the imaging apparatus 30. [¶ 0016] … The information processing apparatus 10 estimates the self-motion of the moving object 1 using measurement values obtained with the acceleration sensor 20A and the angular velocity sensor 20B and an image captured by the imaging apparatus 30. [¶ 0017]); and
a controller configured to estimate the acceleration and the angular velocity of the vehicle by using the front time series image frame generated by the camera and to diagnose an abnormality in the sensor based on the acceleration and the angular velocity of the vehicle estimated by the controller (The information processing apparatus 10 estimates the self-motion of the moving object 1 using measurement values obtained with the acceleration sensor 20A and the angular velocity sensor 20B and an image captured by the imaging apparatus 30. [¶ 0017] … It is known that drift occurs in the inertial sensor 20 due to error caused by vibration, heat, or the like. Therefore, in order to estimate the self-position of the moving object 1 using only the measurement value of the inertial sensor 20, even with a slight drift, the accumulation increases the error and accordingly, the error of the self-position increases. [¶ 0018] … In order to cope with such a phenomenon, the amount of drift of the inertial sensor 20 is considered to be estimated by separately estimating the self-motion of the moving object 1 by using an image captured by the imaging apparatus 30 and performing comparison with the self-motion of the moving object 1 estimated using the measurement value measured by the inertial sensor 20. [¶ 0019] … Thus, the information processing apparatus 10 estimates the self-motion of the moving object 1 by using the measurement value measured by the inertial sensor 20 and the image captured by the imaging apparatus 30, and estimates the amount of drift of the inertial sensor 20. [¶ 0020])
wherein the controller comprises:
a first pose network configured to predict first pose information of the camera by using an image at a time point of t-1 and an image at a time point of t; a second pose network configured to predict second pose information of the camera by using an image at a time point of t-2 and the image at the time point of t-1 (information processing apparatus 10 performs motion estimation by using image sequences obtained during cruise of an automobile … In FIG. 5, the horizontal axis represents the frame number of image. [¶ 0070]. The Examiner notes that: 1) consistent with, e.g., ¶ 0053 of the present published Specification, the pose network(s) may be simply hardware/software and/or internal organization of the controller; and 2) there is no claim or requirement as to network functionality as would be understood by a Person Of Skill In The Art (POSITA) or how the functionality of the first network and the second network may differ (i.e., that the first network and the second network are different, or required to be different).)

While TAKATANI discloses a camera configured to generate a … time series image frame of the vehicle (The moving object 1 is provided with an inertial sensor 20 and an imaging apparatus 30. [¶ 0014] … A visible light camera, an infrared light camera, an ultraviolet light camera, or the like may be used as the imaging apparatus 30. [¶ 0016] … The search function 140 searches (identifies) a corresponding point corresponding to a feature point detected from an image (first image) acquired at a certain timing from an image (second image) acquired at a timing different from the timing at which the image was acquired. [¶ 0030] … FIG. 5 illustrates an example of an error when the information processing apparatus 10 performs motion estimation by using image sequences obtained during cruise of an automobile provided with an inertial measurement unit (IMU) and an imaging apparatus and measurement values of the acceleration and the angular velocity. [¶ 0070]), TAKATANI does not explicitly disclose, or is not relied on to disclose a camera configured to generate a front time series image frame of the vehicle.
However, in the same field of endeavor GUREL teaches a camera configured to generate a front time series image frame of the vehicle (In order to infer such a navigation path, the vehicle will often capture a stream of video data from at least one camera (or other sensor) on the vehicle. The stream of video data can include a series of frames or image portions that include a representation of a portion of the environment in which the vehicle is navigating. FIG. 1B illustrates one such image or video frame 150 that might be captured from a camera on the front of such a vehicle. As illustrated, there are many different types of objects within the video frame, and it can be necessary in at least some systems to identify various types of objects and determine their relative location to, or distance from, the vehicle. [¶ 0046] … A sequential model … has the potential to exploit similarities between consecutive frames of a sequence. A sequential model can take the form of, for example, a DNN-based feature extractor that is enhanced by a recurrent neural network (RNN)-based temporal processor. [¶ 0048] … The network can be any appropriate network, such as a recurrent neural network (RNN) or convolutional neural network (CNN), among other such options. [¶ 0070])
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teaching of TAKATANI with that of GUREL for the advantage that various computer vision-based approaches can be used to identify the various objects in an image frame. At least some of these approaches can utilize machine learning to infer the various types of objects represented in the image, as well as their relative position or coordinates … many object and path detectors in autonomous driving are based on feed-forward Deep Neural Networks (DNN). These detectors operate frame by frame, performing an inference for each frame independently without using any information from other recently-processed frames. (GUREL: ¶¶ 0047 and 0048)
While the combination of TAKATANI and GUREL does not explicitly teach, or is not relied on to teach, the following limitations, in the same field of endeavor MOUSAVIAN teaches:
a 6-dimension (6D) estimation network configured to estimate the acceleration and the angular velocity of the vehicle by using the first pose information and the second pose information; and a diagnostic device configured to compare the acceleration and the angular velocity of the vehicle estimated by the 6D estimation network with the acceleration and the angular velocity of the vehicle measured by the sensor to diagnose the abnormality in the sensor (the system produces a 6D pose estimation in the form of a distribution refined over a plurality of ordered images, such as frames of a video stream. In an embodiment, in various fields such as robotics, temporal data and information about the uncertainty of estimates is very important for tasks such as grasp planning or active sensing. In an embodiment, temporal tracking in video data can improve pose estimation in the context of point-cloud based pose estimation, and provide an improvement over unimodal estimates that fail to adequately represent the complex uncertainties arising from occlusions and possible object symmetries. [¶ 0025])

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teaching of TAKATANI and GUREL with that of MOUSAVIAN for the advantage described by MOUSAVIAN: object recognition and tracking is an important problem in the field of image processing. … a 3-D position and a corresponding 3-D orientation of an object (called the 6D pose) may be determined from an image. … the 6D pose of the object may be estimated using local-feature or template-matching techniques, where features extracted from an image are matched against features or viewpoint templates generated for the 3-D model of the object. (MOUSAVIAN: ¶ 0001)

Regarding Claim 5 (Original), the combination of TAKATANI, GUREL, and MOUSAVIAN teaches the apparatus of claim 2.
MOUSAVIAN further teaches:
wherein the first pose information includes rotation information and location information (FIG. 2 illustrates an example of a process 200 that determines the 6D pose of an object by combining a 3-D position and a 3-D rotation … camera 202 captures an image 204 of an object to be manipulated by a robot. … the image 204 may be a frame of a video stream or a single image captured with a camera. … the image is provided to a particle filter 206 which estimates the 3-D position 208 of the object. … the particle filter 206 uses a previous image or previous frame of the video stream in addition to the image 204 to inform the estimation process. [¶ 0034]. The Examiner notes that: 1) no means for determining a physical/geographic location (e.g., via multilateration, GNSS/GPS coordinates, etc.) is found in the present disclosure; and 2) there is no limitation as to the number of pose informations.)

Motivation to combine the teaching of TAKATANI and GUREL with that of MOUSAVIAN is given in Claim 2 above.

Regarding Claim 6 (Original), the combination of TAKATANI, GUREL, and MOUSAVIAN teaches the apparatus of claim 2.
MOUSAVIAN further teaches:
wherein the second pose information includes rotation information and location information (FIG. 2 illustrates an example of a process 200 that determines the 6D pose of an object by combining a 3-D position and a 3-D rotation … camera 202 captures an image 204 of an object to be manipulated by a robot. … the image 204 may be a frame of a video stream or a single image captured with a camera. … the image is provided to a particle filter 206 which estimates the 3-D position 208 of the object. … the particle filter 206 uses a previous image or previous frame of the video stream in addition to the image 204 to inform the estimation process. [¶ 0034]. The Examiner notes that: 1) no means for determining a physical/geographic location (e.g., via multilateration, GNSS/GPS coordinates, etc.) is found in the present disclosure; and 2) there is no limitation as to the number of pose informations.)

Motivation to combine the teaching of TAKATANI and GUREL with that of MOUSAVIAN is given in Claim 2 above.

Regarding Claim 9 (Currently Amended), TAKATANI discloses an apparatus for diagnosing an abnormality of a vehicle sensor (FIG. 1 is a block diagram showing an example of a configuration of an information processing apparatus 10 according to the present embodiment. The information processing apparatus 10 is used as a motion estimation apparatus for estimating a self-position of a moving object 1. The self-position means the position and the posture of moving object 1. The moving object 1 is, for example, a vehicle (automobile, two-wheeled vehicle, etc.), a ship, an airplane, or a drone. [¶ 0013]), the apparatus comprising:
a sensor configured to measure an acceleration and an angular velocity of a vehicle (An acceleration sensor, an angular velocity sensor, a posture sensor, a direction sensor, a velocity sensor, or the like may be used as the inertial sensor 20. [¶ 0015]);
a controller configured to estimate the acceleration and the angular velocity of the vehicle by using the front time series image frame generated by the camera and to diagnose an abnormality in the sensor based on the acceleration and the angular velocity of the vehicle estimated by the controller (The information processing apparatus 10 estimates the self-motion of the moving object 1 using measurement values obtained with the acceleration sensor 20A and the angular velocity sensor 20B and an image captured by the imaging apparatus 30. [¶ 0017] … the amount of drift of the inertial sensor 20 is considered to be estimated by separately estimating the self-motion of the moving object 1 by using an image captured by the imaging apparatus 30 and performing comparison with the self-motion of the moving object 1 estimated using the measurement value measured by the inertial sensor 20. However, the result of the self-motion estimation using only the image captured by the imaging apparatus 30 is not always reliable, and it is known that the error increases in situations where feature points cannot be detected from the captured image. When the amount of drift of the inertial sensor 20 is estimated by using a self-motion estimation result including a large error, the error of the drift estimation result also becomes large. For example, in a situation where a self-motion estimation using an image captured by the imaging apparatus 30 is difficult (a situation where it is difficult to detect a feature point from a captured image) such as when a moving object 1 is a car and traveling in a tunnel, the drift correction is performed by using the amount of drift estimated including a large error in the case of the above measures, so that the error of the final self-position is large. [¶ 0019] … the information processing apparatus 10 estimates the self-motion of the moving object 1 by using the measurement value measured by the inertial sensor 20 and the image captured by the imaging apparatus 30, and estimates the amount of drift of the inertial sensor 20. Therefore, even in a situation where it is difficult to perform the self-motion estimation by using the image captured by the imaging apparatus 30, an increase in the error of the drift estimation is suppressed, and the error of the self-position estimation can be reduced. [¶ 0020] … As described above, the information processing apparatus 10 estimates the self-motion of the moving object 1 and the amount of drift of the inertial sensor 20, by using the measurement value measured by the inertial sensor 20 and the image captured by the imaging apparatus 30, and based on the likelihood, and thus, even in a situation where it is impossible to perform the motion estimation with the image captured by the imaging apparatus 30, the error of the drift estimation is prevented from increasing, and the error of the self-position estimation can be reduced. [¶ 0071])
wherein the controller comprises:
a pose network configured to predict first pose information of the camera by using an image at a time point of t-1 and an image at a time point of t and second pose information of the camera by using an image at a time point of t-2 and the image at the time point of t-1 (information processing apparatus 10 performs motion estimation by using image sequences obtained during cruise of an automobile … In FIG. 5, the horizontal axis represents the frame number of image. [¶ 0070])

While TAKATANI discloses a camera configured to generate a … time series image frame of the vehicle (The moving object 1 is provided with an inertial sensor 20 and an imaging apparatus 30. [¶ 0014] … A visible light camera, an infrared light camera, an ultraviolet light camera, or the like may be used as the imaging apparatus 30. [¶ 0016] … The search function 140 searches (identifies) a corresponding point corresponding to a feature point detected from an image (first image) acquired at a certain timing from an image (second image) acquired at a timing different from the timing at which the image was acquired. [¶ 0030] … FIG. 5 illustrates an example of an error when the information processing apparatus 10 performs motion estimation by using image sequences obtained during cruise of an automobile provided with an inertial measurement unit (IMU) and an imaging apparatus and measurement values of the acceleration and the angular velocity. [¶ 0070]), TAKATANI does not explicitly disclose, or is not relied on to disclose a camera configured to generate a front time series image frame of the vehicle.
However, in the same field of endeavor GUREL teaches a camera configured to generate a front time series image frame of the vehicle (In order to infer such a navigation path, the vehicle will often capture a stream of video data from at least one camera (or other sensor) on the vehicle. The stream of video data can include a series of frames or image portions that include a representation of a portion of the environment in which the vehicle is navigating. FIG. 1B illustrates one such image or video frame 150 that might be captured from a camera on the front of such a vehicle. As illustrated, there are many different types of objects within the video frame, and it can be necessary in at least some systems to identify various types of objects and determine their relative location to, or distance from, the vehicle. [¶ 0046] … A sequential model … has the potential to exploit similarities between consecutive frames of a sequence. A sequential model can take the form of, for example, a DNN-based feature extractor that is enhanced by a recurrent neural network (RNN)-based temporal processor. [¶ 0048] … The network can be any appropriate network, such as a recurrent neural network (RNN) or convolutional neural network (CNN), among other such options. [¶ 0070])
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teaching of TAKATANI with that of GUREL for the advantage that various computer vision-based approaches can be used to identify the various objects in an image frame. At least some of these approaches can utilize machine learning to infer the various types of objects represented in the image, as well as their relative position or coordinates … many object and path detectors in autonomous driving are based on feed-forward Deep Neural Networks (DNN). These detectors operate frame by frame, performing an inference for each frame independently without using any information from other recently-processed frames. (GUREL: ¶¶ 0047 and 0048)
Motivation to combine the teaching of TAKATANI with that of GUREL is given in Claim 2 above.
While the combination of TAKATANI and GUREL does not explicitly teach, or is not relied on to teach, the following limitations, in the same field of endeavor MOUSAVIAN teaches:
a 6D estimation network configured to estimate the acceleration and the angular velocity of the vehicle by using the first pose information and the second pose information; and a diagnostic device configured to compare the acceleration and the angular velocity of the vehicle estimated by the 6D estimation network with the acceleration and the angular velocity of the vehicle measured by the sensor to diagnose the abnormality in the sensor (the system produces a 6D pose estimation in the form of a distribution refined over a plurality of ordered images, such as frames of a video stream. In an embodiment, in various fields such as robotics, temporal data and information about the uncertainty of estimates is very important for tasks such as grasp planning or active sensing. In an embodiment, temporal tracking in video data can improve pose estimation in the context of point-cloud based pose estimation, and provide an improvement over unimodal estimates that fail to adequately represent the complex uncertainties arising from occlusions and possible object symmetries. [¶ 0025])

Motivation to combine the teaching of TAKATANI and GUREL with that of MOUSAVIAN is given in Claim 2 above.

Regarding Claim 12 (Original), the combination of TAKATANI, GUREL, and MOUSAVIAN teaches the apparatus of claim 9.
MOUSAVIAN further teaches:
wherein the first pose information includes rotation information and location information (FIG. 2 illustrates an example of a process 200 that determines the 6D pose of an object by combining a 3-D position and a 3-D rotation … camera 202 captures an image 204 of an object to be manipulated by a robot. … the image 204 may be a frame of a video stream or a single image captured with a camera. … the image is provided to a particle filter 206 which estimates the 3-D position 208 of the object. … the particle filter 206 uses a previous image or previous frame of the video stream in addition to the image 204 to inform the estimation process. [¶ 0034]. The Examiner notes that: 1) no means for determining a physical/geographic location (e.g., via multilateration, GNSS/GPS coordinates, etc.) is found in the present disclosure; and 2) there is no limitation as to the number of pose informations.)

Motivation to combine the teaching of TAKATANI and GUREL with that of MOUSAVIAN is given in Claim 2 above.

Regarding Claim 13 (Original), the combination of TAKATANI, GUREL, and MOUSAVIAN teaches the apparatus of claim 9.
MOUSAVIAN further teaches:
wherein the second pose information includes rotation information and location information (FIG. 2 illustrates an example of a process 200 that determines the 6D pose of an object by combining a 3-D position and a 3-D rotation … camera 202 captures an image 204 of an object to be manipulated by a robot. … the image 204 may be a frame of a video stream or a single image captured with a camera. … the image is provided to a particle filter 206 which estimates the 3-D position 208 of the object. … the particle filter 206 uses a previous image or previous frame of the video stream in addition to the image 204 to inform the estimation process. [¶ 0034]. The Examiner notes that: 1) no means for determining a physical/geographic location (e.g., via multilateration, GNSS/GPS coordinates, etc.) is found in the present disclosure; and 2) there is no limitation as to the number of pose informations.)

Motivation to combine the teaching of TAKATANI and GUREL with that of MOUSAVIAN is given in Claim 2 above.

Regarding Claim 17 (Currently Amended), the features of Claim 17 are essentially the same as Claim 2 with the apparatus of Claim 2 performing the Method of Claim 17. Therefore, Claim 17 is rejected on the same grounds and motivation as Claim 2.

Regarding Claim 20 (Original), the combination of TAKATANI, GUREL, and MOUSAVIAN teaches the method of claim 17.
MOUSAVIAN further teaches:
wherein the first pose information includes rotation information and location information and wherein the second pose information includes rotation information and location information (FIG. 2 illustrates an example of a process 200 that determines the 6D pose of an object by combining a 3-D position and a 3-D rotation … camera 202 captures an image 204 of an object to be manipulated by a robot. … the image 204 may be a frame of a video stream or a single image captured with a camera. … the image is provided to a particle filter 206 which estimates the 3-D position 208 of the object. … the particle filter 206 uses a previous image or previous frame of the video stream in addition to the image 204 to inform the estimation process. [¶ 0034]. The Examiner notes that: 1) no means for determining a physical/geographic location (e.g., via multilateration, GNSS/GPS coordinates, etc.) is found in the present disclosure; and 2) there is no limitation as to the number of pose informations.)

Motivation to combine the teaching of TAKATANI and GUREL with that of MOUSAVIAN is given in Claim 2 above.

Regarding Claim 21 (New), the features of Claim 21 are essentially the same as Claim 5 with the method of claim 17 being performed by the apparatus of claim 2. Therefore, Claim 21 is rejected on the same grounds and motivation as Claim 5.

Regarding Claim 22 (New), the features of Claim 22 are essentially the same as Claim 6 with the method of claim 17 being performed by the apparatus of claim 2. Therefore, Claim 22 is rejected on the same grounds and motivation as Claim 6.

Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over TAKATANI in view of GUREL, MOUSAVIAN, and U.S. Patent Publication 2021/0213973 to PENA et al. (hereinafter “PENA”).

Regarding Claim 7 (Original), the combination of TAKATANI, GUREL, and MOUSAVIAN teaches the apparatus of claim 2.
While the combination of TAKATANI, GUREL, and MOUSAVIAN does not explicitly teach, or is not relied on to teach, in the same field of endeavor PENA teaches:
wherein the first and second pose networks are convolutional neural networks (CNNs) (Visual-Inertial Odometry (VIO) systems take advantage of visual and inertial information to provide position and orientation estimations. In some implementations, the visual-inertial data fusion is done by using probabilistic filter approaches such as Extended Kalman Filter (EKF) or Unscented Kalman Filter (UKF). The visual-inertial data fusion is performed with an EKF based system, which they used to compare different fusion models using only gyroscope data, or gyroscope and accelerometer data. Other variations of the EKF may be implemented, such as the Multi-state Constraint Kalman Filter (MSCKF), such as systems where several past camera poses are used to detect static features and add a constraint to the state vector, among other examples. [¶ 0121] … VIO approaches based on probabilistic filters for sensor fusion may require a hard and complex calibration process in order to bring camera and inertial measurement unit (IMU) measurements to the same reference coordinate system. [¶ 0123] … In addition to the example features above, an improved computer vision system may additional utilize Convolutional Neural Networks (CNNs) to estimate the relative pose between a pair of point clouds using a sparse hierarchical voxel grid representation (e.g., VOLA). [¶ 0138])

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teaching of TAKATANI, GUREL, and MOUSAVIAN with that of PENA for the advantage that encoded features may be combined in a feature vector with features extracted from a CNN, with this feature vector being tracked over time by a second LSTM, which may provide a pose estimation of a robot. (PENA: ¶ 0123)

Regarding Claim 14 (Original), the combination of TAKATANI, GUREL, and MOUSAVIAN teaches the apparatus of claim 9.
While the combination of TAKATANI, GUREL, and MOUSAVIAN does not explicitly teach, or is not relied on to teach, in the same field of endeavor PENA teaches:
wherein the pose network is a convolutional neural networks (CNNs) (Visual-Inertial Odometry (VIO) systems take advantage of visual and inertial information to provide position and orientation estimations. In some implementations, the visual-inertial data fusion is done by using probabilistic filter approaches such as Extended Kalman Filter (EKF) or Unscented Kalman Filter (UKF). The visual-inertial data fusion is performed with an EKF based system, which they used to compare different fusion models using only gyroscope data, or gyroscope and accelerometer data. Other variations of the EKF may be implemented, such as the Multi-state Constraint Kalman Filter (MSCKF), such as systems where several past camera poses are used to detect static features and add a constraint to the state vector, among other examples. [¶ 0121] … VIO approaches based on probabilistic filters for sensor fusion may require a hard and complex calibration process in order to bring camera and inertial measurement unit (IMU) measurements to the same reference coordinate system. [¶ 0123] … In addition to the example features above, an improved computer vision system may additional utilize Convolutional Neural Networks (CNNs) to estimate the relative pose between a pair of point clouds using a sparse hierarchical voxel grid representation (e.g., VOLA). [¶ 0138])

Motivation to combine the teaching of TAKATANI, GUREL, and MOUSAVIAN with that of PENA is given in Claim 7 above.

Claims 8 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over TAKATANI in view of GUREL, MOUSAVIAN, and U.S. Patent Publication 2019/0258251 to DITTY et al. (hereinafter “DITTY”).

Regarding Claim 8 (Original), the combination of TAKATANI, GUREL, and MOUSAVIAN teaches the apparatus of claim 2.
While the combination of TAKATANI, GUREL, and MOUSAVIAN does not explicitly teach, or is not relied on to teach, in the same field of endeavor DITTY teaches:
wherein the 6D estimation network includes a multi-layer perceptron network (FIG. 5 is a diagram of example camera types and locations on a vehicle. [¶ 0055] Controller (100) provides autonomous driving outputs in response to an array of sensor inputs including, for example: one or more ultrasonic sensors (66), one or more RADAR sensors (68), one or more Light Detection and Ranging (“LIDAR”) sensors (70), one or more surround cameras (72) (typically such cameras are located at various places on vehicle body (52) to image areas all around the vehicle body), one or more stereo cameras (74) (in preferred embodiments, at least one such stereo camera faces forward to provide depth-perception for object detection and object recognition in the vehicle path), one or more infrared cameras (75) … an inertial sensor or inertial measurement unit (“IMU”) (82) that monitors movement of vehicle body (52) (this sensor can be for example an accelerometer(s) and/or a gyro-sensor(s) and/or a magnetic compass(es)). [¶ 0124] … self-driving vehicle (50) includes a plurality of cameras (72, 73, 74, 75, 76), capturing images around the entire periphery of the vehicle (illustrated in Fig. 5). [¶ 0129] … A variety of different IMU sensors may be used … embodiments may include six-axis applications (accelerometers and gyroscopes) and nine-axis applications (accelerometers, gyroscopes, and magnetometers). [¶ 0163] … The neural network can take as its input at least some subset of parameters, such as (1) bounding box dimensions, (2) ground plane estimate obtained e.g. from another subsystem, (3) inertial measurement unit (IMU) output that correlates with the ego-vehicle orientation, (4) distance or 3D location estimates of the object obtained from the neural network and/or other sensors, such as LIDAR or RADAR, among others. [¶ 0225] … One implementation could be a multi-layer perceptron (“MLP”) comprising one or more hidden layers. [¶ 0226])

Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine the teaching of TAKATANI, GUREL, and MOUSAVIAN with that of DITTY for the advantage of an end-to-end platform with a flexible architecture that provides a comprehensive functional safety architecture that leverages and makes efficient use of computer vision and/or ADAS techniques for diversity and redundancy, and provides a platform for a flexible, reliable driving software stack, along with deep learning tools. (DITTY: ¶ 0050)

Regarding Claim 15 (Original), the combination of TAKATANI, GUREL, and MOUSAVIAN teaches the apparatus of claim 9.
While the combination of TAKATANI, GUREL, and MOUSAVIAN does not explicitly teach, or is not relied on to teach, in the same field of endeavor DITTY teaches:
wherein the 6D estimation network includes a multi-layer perceptron network (FIG. 5 is a diagram of example camera types and locations on a vehicle. [¶ 0055] Controller (100) provides autonomous driving outputs in response to an array of sensor inputs including, for example: one or more ultrasonic sensors (66), one or more RADAR sensors (68), one or more Light Detection and Ranging (“LIDAR”) sensors (70), one or more surround cameras (72) (typically such cameras are located at various places on vehicle body (52) to image areas all around the vehicle body), one or more stereo cameras (74) (in preferred embodiments, at least one such stereo camera faces forward to provide depth-perception for object detection and object recognition in the vehicle path), one or more infrared cameras (75) … an inertial sensor or inertial measurement unit (“IMU”) (82) that monitors movement of vehicle body (52) (this sensor can be for example an accelerometer(s) and/or a gyro-sensor(s) and/or a magnetic compass(es)). [¶ 0124] … self-driving vehicle (50) includes a plurality of cameras (72, 73, 74, 75, 76), capturing images around the entire periphery of the vehicle (illustrated in Fig. 5). [¶ 0129] … A variety of different IMU sensors may be used … embodiments may include six-axis applications (accelerometers and gyroscopes) and nine-axis applications (accelerometers, gyroscopes, and magnetometers). [¶ 0163] … The neural network can take as its input at least some subset of parameters, such as (1) bounding box dimensions, (2) ground plane estimate obtained e.g. from another subsystem, (3) inertial measurement unit (IMU) output that correlates with the ego-vehicle orientation, (4) distance or 3D location estimates of the object obtained from the neural network and/or other sensors, such as LIDAR or RADAR, among others. [¶ 0225] … One implementation could be a multi-layer perceptron (“MLP”) comprising one or more hidden layers. [¶ 0226])

Motivation to combine the teaching of TAKATANI, GUREL, and MOUSAVIAN with that of DITTY is given in Claim 8 above.

Conclusion

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ERNEST G TACSIK whose telephone number is 571-270-1279.  The examiner can normally be reached 9:00 am - 6:00 pm Eastern Time.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, Applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.  
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kathy WANG-HURST, can be reached on 571-270-5371.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/ERNEST G TACSIK/           Examiner, Art Unit 2644