DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This Office Action is in response to the claim amendments filed on March 22, 2022.
Claims 14, 19, and 20 are cancelled.
Claims 1-8 and 12-13 have been amended.
Claims 1-13, 15-18, and 21-22 are pending.
Claims 1-13, 15-18, and 21-22 are currently rejected.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on March 22, 2022 has been entered.

Response to Arguments
Regarding 35 U.S.C. § 112(d), the claims have been amended to overcome the rejection. Accordingly, the 35 U.S.C. § 112(d) rejection has been withdrawn.
Regarding 35 U.S.C. § 103, the claims have been amended to overcome the previous rejection. Accordingly, the previous 35 U.S.C. § 103 rejection has been withdrawn. However, upon further search and consideration, a new ground of rejection is made in view of Smith et al. (U.S. Patent Application Publication No. 2019/0299732), Jones et al. (U.S. Patent Application Publication No. 2019/0213438), and Julesgaard et al. (U.S. Patent Application Publication No. 2019/0129429).

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-3 are rejected under 35 U.S.C. 103 as being unpatentable over Smith et al. (U.S. Patent Application Publication No. 2019/0299732 and hereinafter, “Smith”) in view of Jones et al. (U.S. Patent Application Publication No. 2019/0213438 and hereinafter, “Jones”).

Regarding claim 1, Smith teaches an evaluation device for locating keypoints of a docking station in images of the docking station, comprising:
a first input interface for receiving actual training data, wherein the actual training data comprise the images of the docking station taken from a perspective of a vehicle to which an image sensor that captures the images of the docking station is attached, 
Smith, Abstract, discloses “A plurality of sensors are interconnected with the processor that sense terrain/objects and assist in automatically connecting/disconnecting trailers.”
Smith [0008] discloses “Identification of trailers in a yard and navigation with respect to such trailers is automated, and safety mechanisms and operations when docking and undocking a trailer are automated.”
Smith [0186] discloses “In another exemplary embodiment, the sensor assembly 3210 includes a dense 3D sensing, which is used to detect the front face 3110 of the trailer 3100 using the known/trained 3D geometric signature of the trailer face…These 2D and/or 3D sensing modalities each return the generalized location and boundaries of the trailer front face, and potentially its range from a reference point on the truck.”
Smith [0119] discloses “As the yard truck backs down to the trailer, it uses one or multiple mounted (e.g. a standard or custom, 2D grayscale or color-pixel, image sensor-based) cameras (and/or other associated (typically 3D/range-determining) sensors, such as GPS receiver(s), radar, LiDAR, stereo vision, time-of-flight cameras, ultrasonic/laser range finders, etc.) to assist in: (i) confirming the identity of the trailer through reading the trailer number or scanning a QR, bar, or other type of coded identifier; (ii) Aligning the truck's connectors with the corresponding trailer receptacles.”
The Examiner notes that the cameras may be the first input interface.
Smith [0241] discloses “Based on feature information identified via step 5324 or step 5326, or (optionally) both, the procedure 5300 then ranks locations on the trailer face from highest to lowest probability of glad hand/panel presence (step 5330). This ranking can be based on a variety of factors including the prevalence of glad hand/panel candidate features, a strong pattern match of specific colors or shapes, or other metrics. Trained pattern recognition software can be employed according to skill in the art. In step 5332, the location with the highest rank is selected as the target for gross position movement of the manipulator and the end effector carrying the truck connection.”
wherein the keypoints of the docking station are marked in the images, and wherein the keypoints comprise physical features of the docking station itself, wherein the docking station is at least one of a trailer, a container, or a swap body that is disconnected from the vehicle and to which the vehicle connects; and
Smith [0022] discloses “The processor identifies point groups/clouds and compares the point groups/clouds to expected shapes and locations of the kingpin and landing gear legs. The processor can be arranged to iteratively image with the LIDAR device and locate groups of points that represent the expected locations.”
The Examiner notes that “keypoint” has been interpreted to be “distinctive points on a trailer or a docking station,” as defined in [0011] of the instant specification.
a second input interface for receiving target training data, wherein the target training data comprise target position data of the respective keypoints comprising at least known two-dimensional locations in the images of the respective keypoints
Smith [0119] discloses “As the yard truck backs down to the trailer, it uses one or multiple mounted (e.g. a standard or custom, 2D grayscale or color-pixel, image sensor-based) cameras (and/or other associated (typically 3D/range-determining) sensors, such as GPS receiver(s), radar, LiDAR, stereo vision, time-of-flight cameras, ultrasonic/laser range finders, etc.) to assist in: (i) confirming the identity of the trailer through reading the trailer number or scanning a QR, bar, or other type of coded identifier; (ii) Aligning the truck's connectors with the corresponding trailer receptacles.”
The Examiner notes that the LiDAR may be a second input interface.
Smith [0114] discloses “…determining the position of the kingpin within the vehicle/navigation coordinate space…”
Smith [0186] discloses “These 2D and/or 3D sensing modalities each return the generalized location and boundaries of the trailer front face, and potentially its range from a reference point on the truck.”
an output interface for outputting the actual position data.
Smith [0281] discloses “…the procedure 7700 outputs detection that has the highest priority for use to guide the backing operation of the truck onto the trailer via the navigation coordinate space.”
Smith in combination with Jones teaches: 
and adjust weighting factors for connections between neurons in the artificial neural network through backward propagation of a deviation between the actual position data and the target position data, to minimize the deviation, in order to learn the target position data of the keypoints;
Jones [0059] discloses “A machine learning module enables the mobile robot to recognize objects in the environment, as well as recognize the position of the robot in the environment, based on the information provided by the one or more sensors.”
Jones [0190] discloses “The error is back-propagated, the derivative of the error with respect to each weight in the network is calculated, and the network is updated.”
wherein the evaluation device is configured to: forward propagate an artificial neural network with the actual training data and receive actual position data of the respective keypoints determined with the artificial neural network in this forward propagation;
Jones [0171] discloses “If the robot 102 knows the distances of an object to the markers, the robot 102 can determine the location of the object relative to the three or more markers using 3D triangulation.”
Jones [0190] discloses “The neural network 124 can be trained as follows. Starting at the input layer, the patterns of the training data are forward propagated through the network to generate an output. Based on the network's output, an error is calculated using a cost function, in which the training process attempts to minimize the error...After the neural network 124 has been trained, a new image (e.g., 1602) including one or more objects…is provided as input to the network and forward propagated to calculate the network output, and a threshold function is applied to obtain the predicted class labels…The output image 1604 includes the object(s) bound by bounding box(es) 1606 having the predicted label(s).”
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have modified the computer vision algorithm disclosed in Smith to incorporate backward propagation, as taught in Jones, such that “the neural network is trained with more images of the objects in the environment and the accuracy of recognizing the objects increases over time” (Jones [0060]).
Furthermore, one of ordinary skill in the art would have recognized that applying the known technique of Jones to the disclosure of Smith would have yielded predictable results and resulted in an improved system. The results would have been predictable because the level of ordinary skill in the art demonstrated by the applied references shows the ability to incorporate such forward and backward propagation into the neural network as disclosed by Smith. Further, one of ordinary skill in the art would have recognized that applying the forward and backward propagation to Smith with a neural network would have resulted in an improved system that allows more efficient evaluation.
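For illustration only, the training loop quoted from Jones [0190] (forward propagation, an error computed with a cost function, back-propagation of the error, and a weight update) can be sketched as follows. This is a minimal sketch and is not taken from Smith, Jones, or the instant application. The function names, the network size, the learning rate, and the sample data are all hypothetical; each input is a short feature vector standing in for an image, and each target is a known two-dimensional keypoint location.

```python
import math
import random

# Hypothetical stand-ins: 3-value feature vectors in place of images, and
# known 2D keypoint locations (normalized image coordinates) as targets.
SAMPLES = [[1.0, 0.0, 0.5], [0.0, 1.0, 0.5], [1.0, 1.0, 0.0], [0.5, 0.2, 1.0]]
TARGETS = [[0.2, 0.8], [0.7, 0.3], [0.5, 0.5], [0.4, 0.6]]


def forward(w1, w2, x):
    """Forward propagation: input -> tanh hidden layer -> linear 2D output."""
    hidden = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
    output = [sum(w * h for w, h in zip(row, hidden)) for row in w2]
    return hidden, output


def train(samples, targets, n_hidden=4, lr=0.1, epochs=500, seed=0):
    """Full-batch gradient descent on half the squared deviation."""
    rng = random.Random(seed)
    n_in, n_out = len(samples[0]), len(targets[0])
    w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
    w2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
    history = []  # mean loss measured before each weight update
    n = len(samples)
    for _ in range(epochs):
        g1 = [[0.0] * n_in for _ in range(n_hidden)]
        g2 = [[0.0] * n_hidden for _ in range(n_out)]
        loss = 0.0
        for x, t in zip(samples, targets):
            hidden, output = forward(w1, w2, x)
            # Deviation between actual (predicted) and target position data.
            dev = [o - ti for o, ti in zip(output, t)]
            loss += 0.5 * sum(d * d for d in dev)
            # Backward propagation: gradient for the output-layer weights.
            for k in range(n_out):
                for j in range(n_hidden):
                    g2[k][j] += dev[k] * hidden[j]
            # Propagate the deviation through tanh to the hidden-layer weights.
            for j in range(n_hidden):
                back = sum(dev[k] * w2[k][j] for k in range(n_out))
                back *= 1.0 - hidden[j] ** 2
                for i in range(n_in):
                    g1[j][i] += back * x[i]
        history.append(loss / n)
        # Adjust the weighting factors to reduce the deviation.
        for k in range(n_out):
            for j in range(n_hidden):
                w2[k][j] -= lr * g2[k][j] / n
        for j in range(n_hidden):
            for i in range(n_in):
                w1[j][i] -= lr * g1[j][i] / n
    return w1, w2, history
```

Calling train(SAMPLES, TARGETS) returns the two weight matrices and the per-epoch loss; the loss recorded at the last epoch is lower than the loss recorded at the first. A practical keypoint detector would operate on image pixels, typically with convolutional layers, which this sketch omits.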

Regarding claim 2, Smith teaches a method for locating keypoints of a docking station in images of the docking station, comprising:
receiving actual training data and position data of the keypoints, wherein the actual training data comprises images of the docking station taken from a perspective of a vehicle to which an image sensor that captures the images of the docking station is attached, and wherein the keypoints comprise physical features of the docking station itself, wherein the docking station is at least one of a trailer, a container, or a swap body that is disconnected from the vehicle and to which the vehicle connects
Smith, Abstract, discloses “A plurality of sensors are interconnected with the processor that sense terrain/objects and assist in automatically connecting/disconnecting trailers.”
Smith [0008] discloses “Identification of trailers in a yard and navigation with respect to such trailers is automated, and safety mechanisms and operations when docking and undocking a trailer are automated.”
Smith [0022] discloses “The processor identifies point groups/clouds and compares the point groups/clouds to expected shapes and locations of the kingpin and landing gear legs. The processor can be arranged to iteratively image with the LIDAR device and locate groups of points that represent the expected locations.”
The Examiner notes that “keypoint” has been interpreted to be “distinctive points on a trailer or a docking station,” as defined in [0011] of the instant specification.
Smith [0186] discloses “In another exemplary embodiment, the sensor assembly 3210 includes a dense 3D sensing, which is used to detect the front face 3110 of the trailer 3100 using the known/trained 3D geometric signature of the trailer face…These 2D and/or 3D sensing modalities each return the generalized location and boundaries of the trailer front face, and potentially its range from a reference point on the truck.”
Smith [0119] discloses “As the yard truck backs down to the trailer, it uses one or multiple mounted (e.g. a standard or custom, 2D grayscale or color-pixel, image sensor-based) cameras (and/or other associated (typically 3D/range-determining) sensors, such as GPS receiver(s), radar, LiDAR, stereo vision, time-of-flight cameras, ultrasonic/laser range finders, etc.) to assist in: (i) confirming the identity of the trailer through reading the trailer number or scanning a QR, bar, or other type of coded identifier; (ii) Aligning the truck's connectors with the corresponding trailer receptacles.”
receiving target training data, wherein the target training data comprise target position data of the respective keypoints in the images comprising at least known two-dimensional locations in the images of the respective keypoints
Smith [0119] discloses “As the yard truck backs down to the trailer, it uses one or multiple mounted (e.g. a standard or custom, 2D grayscale or color-pixel, image sensor-based) cameras (and/or other associated (typically 3D/range-determining) sensors, such as GPS receiver(s), radar, LiDAR, stereo vision, time-of-flight cameras, ultrasonic/laser range finders, etc.) to assist in: (i) confirming the identity of the trailer through reading the trailer number or scanning a QR, bar, or other type of coded identifier; (ii) Aligning the truck's connectors with the corresponding trailer receptacles.”
Smith [0114] discloses “…determining the position of the kingpin within the vehicle/navigation coordinate space…”
Smith [0241] discloses “…the location with the highest rank is selected as the target for gross position movement of the manipulator and the end effector carrying the truck connection.”
Smith [0186] discloses “In another exemplary embodiment, the sensor assembly 3210 includes a dense 3D sensing, which is used to detect the front face 3110 of the trailer 3100 using the known/trained 3D geometric signature of the trailer face…These 2D and/or 3D sensing modalities each return the generalized location and boundaries of the trailer front face, and potentially its range from a reference point on the truck.”
Smith does not expressly teach:
forward propagation of an artificial neural network with the actual training data, and determining actual position data of the respective keypoints with the artificial neural network, wherein the actual position data comprises at least two-dimensional position data of the respective keypoints within the images;
and backward propagation of a deviation between the actual position data and the target position data in order to adjust weighting factors for connections between neurons of the artificial neural network such that the deviation is minimized, in order to learn the target position data of the keypoints.
However, Jones teaches:
forward propagation of an artificial neural network with the actual training data, and determining actual position data of the respective keypoints with the artificial neural network, wherein the actual position data comprises at least two-dimensional position data of the respective keypoints within the images
Jones [0171] discloses “If the robot 102 knows the distances of an object to the markers, the robot 102 can determine the location of the object relative to the three or more markers using 3D triangulation.”
Jones [0190] discloses “The neural network 124 can be trained as follows. Starting at the input layer, the patterns of the training data are forward propagated through the network to generate an output. Based on the network's output, an error is calculated using a cost function, in which the training process attempts to minimize the error...After the neural network 124 has been trained, a new image (e.g., 1602) including one or more objects…is provided as input to the network and forward propagated to calculate the network output, and a threshold function is applied to obtain the predicted class labels…The output image 1604 includes the object(s) bound by bounding box(es) 1606 having the predicted label(s).”
and backward propagation of a deviation between the actual position data and the target position data in order to adjust weighting factors for connections between neurons of the artificial neural network such that the deviation is minimized, in order to learn the target position data of the keypoints.
Jones [0059] discloses “A machine learning module enables the mobile robot to recognize objects in the environment, as well as recognize the position of the robot in the environment, based on the information provided by the one or more sensors.”
Jones [0190] discloses “The error is back-propagated, the derivative of the error with respect to each weight in the network is calculated, and the network is updated.”
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have modified the computer vision algorithm disclosed in Smith to incorporate forward and backward propagation, as taught in Jones, such that “the neural network is trained with more images of the objects in the environment and the accuracy of recognizing the objects increases over time” (Jones [0060]).

Regarding claim 3, Smith in combination with Jones teaches the method according to claim 2, with Smith further teaching:
receiving, by a first input interface of an evaluation device, the actual training data and position data of the keypoints;
Smith [0022] discloses “The processor identifies point groups/clouds and compares the point groups/clouds to expected shapes and locations of the kingpin and landing gear legs. The processor can be arranged to iteratively image with the LIDAR device and locate groups of points that represent the expected locations.”
The Examiner notes that “keypoint” has been interpreted to be “distinctive points on a trailer or a docking station,” as defined in [0011] of the instant specification.
Smith [0186] discloses “In another exemplary embodiment, the sensor assembly 3210 includes a dense 3D sensing, which is used to detect the front face 3110 of the trailer 3100 using the known/trained 3D geometric signature of the trailer face…These 2D and/or 3D sensing modalities each return the generalized location and boundaries of the trailer front face, and potentially its range from a reference point on the truck.”
Smith [0114] discloses “…determining the position of the kingpin within the vehicle/navigation coordinate space…”
receiving, by a second input interface of the evaluation device, the target training data;
Smith [0119] discloses “As the yard truck backs down to the trailer, it uses one or multiple mounted (e.g. a standard or custom, 2D grayscale or color-pixel, image sensor-based) cameras (and/or other associated (typically 3D/range-determining) sensors, such as GPS receiver(s), radar, LiDAR, stereo vision, time-of-flight cameras, ultrasonic/laser range finders, etc.) to assist in: (i) confirming the identity of the trailer through reading the trailer number or scanning a QR, bar, or other type of coded identifier; (ii) Aligning the truck's connectors with the corresponding trailer receptacles.”
The Examiner notes that the LiDAR may be a second input interface.
Smith [0114] discloses “…determining the position of the kingpin within the vehicle/navigation coordinate space…”
Smith does not expressly teach:
forward propagation, by the evaluation device, of the artificial neural network with the actual training data, and determining, by the evaluation device, the actual position data of the respective keypoints with the artificial neural network; and
backward propagation by the evaluation device, of the deviation between the actual position data and the target position data in order to adjust the weighting factors
However, Jones teaches:
forward propagation, by the evaluation device, of the artificial neural network with the actual training data, and determining, by the evaluation device, the actual position data of the respective keypoints with the artificial neural network; and
Jones [0171] discloses “If the robot 102 knows the distances of an object to the markers, the robot 102 can determine the location of the object relative to the three or more markers using 3D triangulation.”
Jones [0190] discloses “The neural network 124 can be trained as follows. Starting at the input layer, the patterns of the training data are forward propagated through the network to generate an output. Based on the network's output, an error is calculated using a cost function, in which the training process attempts to minimize the error...After the neural network 124 has been trained, a new image (e.g., 1602) including one or more objects…is provided as input to the network and forward propagated to calculate the network output, and a threshold function is applied to obtain the predicted class labels…The output image 1604 includes the object(s) bound by bounding box(es) 1606 having the predicted label(s).”
backward propagation by the evaluation device, of the deviation between the actual position data and the target position data in order to adjust the weighting factors
Jones [0059] discloses “A machine learning module enables the mobile robot to recognize objects in the environment, as well as recognize the position of the robot in the environment, based on the information provided by the one or more sensors.”
Jones [0190] discloses “The error is back-propagated, the derivative of the error with respect to each weight in the network is calculated, and the network is updated.”
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have modified the computer vision algorithm disclosed in Smith to incorporate forward and backward propagation, as taught in Jones, such that “the neural network is trained with more images of the objects in the environment and the accuracy of recognizing the objects increases over time” (Jones [0060]).

Claims 4, 6-13, and 15-18 are rejected under 35 U.S.C. 103 as being unpatentable over Smith et al. in view of Julesgaard et al. (U.S. Patent Application Publication No. 2019/0129429 and hereinafter, “Julesgaard”).

Regarding claim 4, Smith teaches an evaluation device for automated docking of a vehicle at a docking station, comprising:
an input interface for receiving at least one image of the docking station recorded with an imaging sensor that can be placed on the vehicle, wherein the docking station is at least one of a trailer, a container, or a swap body that is disconnected from the vehicle and to which the vehicle is configured to connect
Smith, Abstract, discloses “A plurality of sensors are interconnected with the processor that sense terrain/objects and assist in automatically connecting/disconnecting trailers.”
Smith [0008] discloses “Identification of trailers in a yard and navigation with respect to such trailers is automated, and safety mechanisms and operations when docking and undocking a trailer are automated.”
Smith [0186] discloses “In another exemplary embodiment, the sensor assembly 3210 includes a dense 3D sensing, which is used to detect the front face 3110 of the trailer 3100 using the known/trained 3D geometric signature of the trailer face…These 2D and/or 3D sensing modalities each return the generalized location and boundaries of the trailer front face, and potentially its range from a reference point on the truck.”
Smith [0119] discloses “As the yard truck backs down to the trailer, it uses one or multiple mounted (e.g. a standard or custom, 2D grayscale or color-pixel, image sensor-based) cameras (and/or other associated (typically 3D/range-determining) sensors, such as GPS receiver(s), radar, LiDAR, stereo vision, time-of-flight cameras, ultrasonic/laser range finders, etc.) to assist in: (i) confirming the identity of the trailer through reading the trailer number or scanning a QR, bar, or other type of coded identifier; (ii) Aligning the truck's connectors with the corresponding trailer receptacles.”
and an output interface, for outputting a signal for a vehicle steering system based on the determined position of the docking station in relation to the vehicle, in order to automatically drive the vehicle to dock it at the docking station to enable the vehicle to move the docking station under power of the vehicle.
Smith, Abstract, discloses “A plurality of sensors are interconnected with the processor that sense terrain/objects and assist in automatically connecting/disconnecting trailers.”
Smith [0003] discloses “The cab provides power (through (e.g.) a generator, pneumatic pressure source, etc.) used to operate both itself and the attached trailer.”
Smith [0008] discloses “Identification of trailers in a yard and navigation with respect to such trailers is automated, and safety mechanisms and operations when docking and undocking a trailer are automated.”
Smith [0012] discloses “A processor facilitates autonomous movement of the AV yard truck, substantially free of human user control inputs to onboard controls of the truck, and connection to and disconnection from trailers in the yard.”
Smith does not expressly teach:
wherein the evaluation device is configured to run an artificial neural network that is trained to determine at least two-dimensional image coordinates of keypoints of the docking station in the at least one image of the docking station based on the image, wherein the keypoints comprise physical features of the docking station itself
determine at least one of a position or orientation of the imaging sensor in relation to the keypoints based on a known geometry of the keypoints
and determine at least one of a position or orientation of the docking station in relation to the vehicle based on the determined at least one of the position or orientation of the imaging sensor and a known location of the imaging sensor on the vehicle
Julesgaard teaches:
wherein the evaluation device is configured to run an artificial neural network that is trained to determine at least two-dimensional image coordinates of keypoints of the docking station in the at least one image of the docking station based on the image, wherein the keypoints comprise physical features of the docking station itself
Julesgaard [0021] discloses “…one or more sensors (e.g., cameras, lidar sensors, and/or radar sensors, etc.) ...can be configured to capture sensor data (e.g., image data, lidar sweep data, radar data, etc.) to provide for determining one or more angles and/or one or more distances between the tractor portion and the trailer portion of the autonomous truck.”
Julesgaard [0032] discloses “…one or more sensors can be positioned on or near the front of the autonomous truck (e.g., the tractor) at positions that provide good vantage points of the trailer and can provide sensor data to allow for determining one or more angles and/or distances between the tractor and trailer.”
Julesgaard [0038] discloses “Neural networks can include recurrent neural networks (e.g., long, short-term memory recurrent neural networks), feed-forward neural networks, convolutional neural networks, and/or other forms of neural networks. For instance, supervised training techniques can be performed to train a model, for example, using labeled training data (e.g., ground truth data) to provide for detecting and identifying the position and movement of the autonomous vehicle by receiving, as input, sensor data associated with the portions of an autonomous vehicle, and generating, as output, estimates for one or more angles and one or more distances between the portions of the autonomous vehicle (e.g., between a tractor and a trailer).”
determine at least one of a position or orientation of the imaging sensor in relation to the keypoints based on a known geometry of the keypoints
Julesgaard [0033] discloses “The one or more sensors can be configured for detecting edges of the trailer and/or tractor, one or more specific targets located on the trailer and/or tractor, one or more surfaces of the trailer and/or tractor, and/or like methods for providing and/or analyzing frames of reference, and enable determining one or more angles and/or distances between the tractor and trailer…”
and determine at least one of a position or orientation of the docking station in relation to the vehicle based on the determined at least one of the position or orientation of the imaging sensor and a known location of the imaging sensor on the vehicle
Julesgaard [0033] discloses “The one or more sensors can be configured for detecting edges of the trailer and/or tractor, one or more specific targets located on the trailer and/or tractor, one or more surfaces of the trailer and/or tractor, and/or like methods for providing and/or analyzing frames of reference, and enable determining one or more angles and/or distances between the tractor and trailer…”
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have modified the trained systems of Smith to incorporate an artificial neural network, as taught in Julesgaard, to achieve more accurate and timely motion planning to respond to changes in vehicle dynamics while avoiding latency issues (Julesgaard [0043]).
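For illustration only, the last determining step recited in claim 4 (the position or orientation of the docking station in relation to the vehicle, obtained from the pose of the imaging sensor and the known location of the imaging sensor on the vehicle) can be sketched as a composition of two rigid transforms. This is a minimal sketch under stated assumptions and is not taken from Smith, Julesgaard, or the instant application: poses are simplified to planar (x, y, yaw) tuples, the function names are hypothetical, and the pose of the docking station in the camera frame is assumed to have been estimated already from the keypoints and their known geometry, which is not shown here.

```python
import math


def compose(parent_from_a, a_from_b):
    """Compose two planar poses (x, y, yaw); returns the pose of b in the parent frame."""
    ax, ay, a_yaw = parent_from_a
    bx, by, b_yaw = a_from_b
    # Rotate b's offset into the parent frame, then translate by a's position.
    x = ax + math.cos(a_yaw) * bx - math.sin(a_yaw) * by
    y = ay + math.sin(a_yaw) * bx + math.cos(a_yaw) * by
    # Wrap the summed heading back into [-pi, pi].
    yaw = math.atan2(math.sin(a_yaw + b_yaw), math.cos(a_yaw + b_yaw))
    return (x, y, yaw)


def station_pose_in_vehicle(camera_in_vehicle, station_in_camera):
    """Pose of the docking station in the vehicle frame.

    camera_in_vehicle: known mounting pose of the imaging sensor on the vehicle.
    station_in_camera: pose of the docking station estimated in the camera frame.
    """
    return compose(camera_in_vehicle, station_in_camera)


if __name__ == "__main__":
    # Hypothetical values: a rear-facing camera mounted 1 m behind the vehicle
    # origin, and a docking station detected 5 m straight ahead of the camera.
    camera_in_vehicle = (-1.0, 0.0, math.pi)
    station_in_camera = (5.0, 0.0, 0.0)
    print(station_pose_in_vehicle(camera_in_vehicle, station_in_camera))
```

With the hypothetical values in the usage block, the docking station lies about 6 m behind the vehicle origin. A three-dimensional implementation would use full rotation matrices or quaternions in place of the single yaw angle.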

Regarding claim 6, Smith in combination with Julesgaard teaches a vehicle for automated docking at a docking station, with Smith further teaching:
a camera with the imaging sensor, which is located on the vehicle, for obtaining images of the docking station;
Smith [0119] discloses “As the yard truck backs down to the trailer, it uses one or multiple mounted (e.g. a standard or custom, 2D grayscale or color-pixel, image sensor-based) cameras (and/or other associated (typically 3D/range-determining) sensors, such as GPS receiver(s), radar, LiDAR, stereo vision, time-of-flight cameras, ultrasonic/laser range finders, etc.) to assist in: (i) confirming the identity of the trailer through reading the trailer number or scanning a QR, bar, or other type of coded identifier; (ii) Aligning the truck's connectors with the corresponding trailer receptacles.”
and a vehicle steering system, for driving the vehicle automatically in order to dock it at the docking station, based on the signal.
Smith, Abstract, discloses “A plurality of sensors are interconnected with the processor that sense terrain/objects and assist in automatically connecting/disconnecting trailers.”
Smith [0008] discloses “Identification of trailers in a yard and navigation with respect to such trailers is automated, and safety mechanisms and operations when docking and undocking a trailer are automated.”
The Examiner notes that navigation encompasses steering.
Smith [0012] discloses “A processor facilitates autonomous movement of the AV yard truck, substantially free of human user control inputs to onboard controls of the truck, and connection to and disconnection from trailers in the yard.”
Smith does not expressly teach:
the evaluation device according to claim 4, for outputting a signal for a vehicle control based on at least one of a determined position or orientation of the docking station in relation to the vehicle
Julesgaard teaches:
the evaluation device according to claim 4, for outputting a signal for a vehicle control based on at least one of a determined position or orientation of the docking station in relation to the vehicle
Julesgaard [0033] discloses “The one or more sensors can be configured for detecting edges of the trailer and/or tractor, one or more specific targets located on the trailer and/or tractor, one or more surfaces of the trailer and/or tractor, and/or like methods for providing and/or analyzing frames of reference, and enable determining one or more angles and/or distances between the tractor and trailer…”
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have modified Smith's determination of the generalized location of the trailer to explicitly determine a position or orientation of the docking station relative to the vehicle, as taught in Julesgaard, to provide an improved field of view behind the autonomous vehicle and reduce blind spots (Julesgaard [0042]).

Regarding claim 7, Smith teaches a method for automated docking of a vehicle at a docking station, comprising the steps:
obtaining at least one image of the docking station recorded with an imaging sensor that can be placed on the vehicle, wherein the docking station is at least one of a trailer, a container, or a swap body that is disconnected from the vehicle and to which the vehicle is configured to connect
Smith, Abstract, discloses “A plurality of sensors are interconnected with the processor that sense terrain/objects and assist in automatically connecting/disconnecting trailers.”
Smith [0008] discloses “Identification of trailers in a yard and navigation with respect to such trailers is automated, and safety mechanisms and operations when docking and undocking a trailer are automated.”
Smith [0119] discloses “As the yard truck backs down to the trailer, it uses one or multiple mounted (e.g. a standard or custom, 2D grayscale or color-pixel, image sensor-based) cameras (and/or other associated (typically 3D/range-determining) sensors, such as GPS receiver(s), radar, LiDAR, stereo vision, time-of-flight cameras, ultrasonic/laser range finders, etc.) to assist in: (i) confirming the identity of the trailer through reading the trailer number or scanning a QR, bar, or other type of coded identifier; (ii) Aligning the truck's connectors with the corresponding trailer receptacles.”
The Examiner notes that the cameras may be the imaging sensor.
and outputting a signal for a vehicle steering system based on the determined at least one of the position or orientation of the docking station in relation to the vehicle in order to automatically drive the vehicle to dock it at the docking station to enable the vehicle to move the docking station under power of the vehicle.
Smith [0002] discloses “The cab provides power (through (e.g.) a generator, pneumatic pressure source, etc.) used to operate both itself and the attached trailer.”
Smith [0008] discloses “Identification of trailers in a yard and navigation with respect to such trailers is automated, and safety mechanisms and operations when docking and undocking a trailer are automated.”
The Examiner notes that navigation encompasses steering.
Smith [0012] discloses “A processor facilitates autonomous movement of the AV yard truck, substantially free of human user control inputs to onboard controls of the truck, and connection to and disconnection from trailers in the yard.”
Smith does not expressly teach:
running an artificial neural network that is trained to determine at least two-dimensional image coordinates of keypoints of the docking station in the at least one image of the docking station based on the image, wherein the keypoints comprise physical features of the docking station itself
determining at least one of a position or orientation of the imaging sensor in relation to the keypoints based on a known geometry of the keypoints
determining at least one of a position or orientation of the docking station in relation to the vehicle based on the determined position of the imaging sensor and a known location of the imaging sensor on the vehicle
Julesgaard teaches:
running an artificial neural network that is trained to determine at least two-dimensional image coordinates of keypoints of the docking station in the at least one image of the docking station based on the image, wherein the keypoints comprise physical features of the docking station itself
Julesgaard [0021] discloses “…one or more sensors (e.g., cameras, lidar sensors, and/or radar sensors, etc.) ...can be configured to capture sensor data (e.g., image data, lidar sweep data, radar data, etc.) to provide for determining one or more angles and/or one or more distances between the tractor portion and the trailer portion of the autonomous truck.”
Julesgaard [0032] discloses “…one or more sensors can be positioned on or near the front of the autonomous truck (e.g., the tractor) at positions that provide good vantage points of the trailer and can provide sensor data to allow for determining one or more angles and/or distances between the tractor and trailer.”
Julesgaard [0038] discloses “Neural networks can include recurrent neural networks (e.g., long, short-term memory recurrent neural networks), feed-forward neural networks, convolutional neural networks, and/or other forms of neural networks. For instance, supervised training techniques can be performed to train a model, for example, using labeled training data (e.g., ground truth data) to provide for detecting and identifying the position and movement of the autonomous vehicle by receiving, as input, sensor data associated with the portions of an autonomous vehicle, and generating, as output, estimates for one or more angles and one or more distances between the portions of the autonomous vehicle (e.g., between a tractor and a trailer).”
determining at least one of a position or orientation of the imaging sensor in relation to the keypoints based on a known geometry of the keypoints
Julesgaard [0033] discloses “The one or more sensors can be configured for detecting edges of the trailer and/or tractor, one or more specific targets located on the trailer and/or tractor, one or more surfaces of the trailer and/or tractor, and/or like methods for providing and/or analyzing frames of reference, and enable determining one or more angles and/or distances between the tractor and trailer…”
determining at least one of a position or orientation of the docking station in relation to the vehicle based on the determined position of the imaging sensor and a known location of the imaging sensor on the vehicle
Julesgaard [0033] discloses “The one or more sensors can be configured for detecting edges of the trailer and/or tractor, one or more specific targets located on the trailer and/or tractor, one or more surfaces of the trailer and/or tractor, and/or like methods for providing and/or analyzing frames of reference, and enable determining one or more angles and/or distances between the tractor and trailer…”
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have modified the trained systems of Smith to incorporate an artificial neural network, as taught in Julesgaard, to achieve more accurate and timely motion planning to respond to changes in vehicle dynamics while avoiding latency issues (Julesgaard [0043]).

Regarding claim 8, Smith in combination with Julesgaard teaches the method according to claim 7, wherein Smith further teaches:
automatically driving, by the vehicle steering system, the vehicle in order to dock it at the docking station, based on the signal.
Smith [0008] discloses “Identification of trailers in a yard and navigation with respect to such trailers is automated, and safety mechanisms and operations when docking and undocking a trailer are automated.”
Smith [0012] discloses “A processor facilitates autonomous movement of the AV yard truck, substantially free of human user control inputs to onboard controls of the truck, and connection to and disconnection from trailers in the yard.”

Regarding claim 9, Smith in combination with Julesgaard teaches the method according to claim 7, wherein Smith further teaches:
a known model of the docking station is used in determining the at least one of the position or orientation of the imaging sensor in relation to the keypoints based on a known geometry of the keypoints, wherein the model indicates the relative positions of the keypoints to one another.
Smith [0116] discloses “The guard/attendant enters the trailer information (trailer number or QR (ID) code scan-imbedded information already in the system, which would typically include: trailer make/model/year/service connection location, etc.) into the facility software system…”
Smith [0186] discloses “In another exemplary embodiment, the sensor assembly 3210 includes a dense 3D sensing, which is used to detect the front face 3110 of the trailer 3100 using the known/trained 3D geometric signature of the trailer face…”

Regarding claim 10, Smith in combination with Julesgaard teaches the method according to claim 9, wherein Smith further teaches:
intrinsic parameters of the imaging sensor are used in the use of the known model.
Smith [0116] discloses “The guard/attendant enters the trailer information (trailer number or QR (ID) code scan-imbedded information already in the system, which would typically include: trailer make/model/year/service connection location, etc.) into the facility software system…”
Smith [0119] discloses “As the yard truck backs down to the trailer, it uses one or multiple mounted (e.g. a standard or custom, 2D grayscale or color-pixel, image sensor-based) cameras (and/or other associated (typically 3D/range-determining) sensors, such as GPS receiver(s), radar, LiDAR, stereo vision, time-of-flight cameras, ultrasonic/laser range finders, etc.) to assist in: (i) confirming the identity of the trailer through reading the trailer number or scanning a QR, bar, or other type of coded identifier; (ii) Aligning the truck's connectors with the corresponding trailer receptacles.”

Regarding claim 11, Smith in combination with Julesgaard teaches the method according to claim 7, wherein Smith further teaches:
coordinate transformation from the imaging sensor system to the vehicle system is carried out in determining the at least one of the position or orientation of the docking station in relation to the vehicle based on the determined position of the imaging sensor and a known location of the imaging sensor on the vehicle.
Smith [0213] discloses “The extracted image pixel coordinates can be related to the planar physical dimensions of the tag using a homography (transformation) in accordance with known techniques. This transformation provides the rotation and translation of the tag relative to the sensor's coordinate space. The known transformation between the sensor and delivery coordinate frame and the known transformation between the tag and the glad hand coordinate frame enables an estimate of the glad hand pose for fine positioning.”
Smith [0215] discloses “Visual servoing can be used to achieve proper positioning for a mating operation between the end-effector-carried glad hand/connector and the trailer glad hand.”
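For illustration only, the chaining of known transformations described in Smith [0213] can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Smith, Julesgaard, or the application: the frame names, the helper functions `make_transform`, `compose`, and `apply_transform`, and all numeric values are hypothetical. It shows how a pose expressed in the imaging sensor's coordinate system could be carried into the vehicle's coordinate system when the sensor's mounting location on the vehicle is known.

```python
import math

def make_transform(yaw_rad, tx, ty, tz):
    """Build a 4x4 homogeneous transform: rotation about z by yaw, plus a translation."""
    c, s = math.cos(yaw_rad), math.sin(yaw_rad)
    return [
        [c, -s, 0.0, tx],
        [s, c, 0.0, ty],
        [0.0, 0.0, 1.0, tz],
        [0.0, 0.0, 0.0, 1.0],
    ]

def compose(a, b):
    """Return the matrix product a @ b, i.e. apply b first and then a."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def apply_transform(t, point):
    """Map a 3D point through the homogeneous transform t."""
    v = (point[0], point[1], point[2], 1.0)
    return tuple(sum(t[i][k] * v[k] for k in range(4)) for i in range(3))

# Hypothetical known mounting of the sensor on the vehicle:
# sensor faces rearward (yaw of 180 degrees), 1.0 m behind and 1.5 m above the vehicle origin.
VEHICLE_FROM_SENSOR = make_transform(math.pi, -1.0, 0.0, 1.5)

# Hypothetical pose of the docking station in the sensor frame,
# e.g. as estimated from image keypoints and a known geometry.
SENSOR_FROM_STATION = make_transform(0.0, 4.0, 0.5, -0.5)

# Chain the two transforms to express the docking station in the vehicle frame.
VEHICLE_FROM_STATION = compose(VEHICLE_FROM_SENSOR, SENSOR_FROM_STATION)
station_origin_in_vehicle = apply_transform(VEHICLE_FROM_STATION, (0.0, 0.0, 0.0))
```

With these assumed values, the origin of the docking station lies roughly 5 m behind the vehicle origin, 0.5 m to one side, and 1 m above it. The sketch restricts rotation to a single yaw angle for brevity; a full pose would use a general 3D rotation.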

Regarding claim 12, Smith in combination with Julesgaard teaches the method according to claim 7, wherein Julesgaard further teaches:
running the artificial neural network by an evaluation device;
Julesgaard [0038] discloses “Neural networks can include recurrent neural networks (e.g., long, short-term memory recurrent neural networks), feed-forward neural networks, convolutional neural networks, and/or other forms of neural networks. For instance, supervised training techniques can be performed to train a model, for example, using labeled training data (e.g., ground truth data) to provide for detecting and identifying the position and movement of the autonomous vehicle by receiving, as input, sensor data associated with the portions of an autonomous vehicle, and generating, as output, estimates for one or more angles and one or more distances between the portions of the autonomous vehicle (e.g., between a tractor and a trailer).”
determining, by the evaluation device, the at least one of the position or orientation of the imaging sensor in relation to the keypoints; and
Julesgaard [0033] discloses “The one or more sensors can be configured for detecting edges of the trailer and/or tractor, one or more specific targets located on the trailer and/or tractor, one or more surfaces of the trailer and/or tractor, and/or like methods for providing and/or analyzing frames of reference, and enable determining one or more angles and/or distances between the tractor and trailer…”
determining, by the evaluation device, the at least one of the position or orientation of the docking station in relation to the vehicle
Julesgaard [0033] discloses “The one or more sensors can be configured for detecting edges of the trailer and/or tractor, one or more specific targets located on the trailer and/or tractor, one or more surfaces of the trailer and/or tractor, and/or like methods for providing and/or analyzing frames of reference, and enable determining one or more angles and/or distances between the tractor and trailer…”
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have modified the trained systems of Smith to incorporate an artificial neural network, as taught in Julesgaard, to achieve more accurate and timely motion planning to respond to changes in vehicle dynamics while avoiding latency issues (Julesgaard [0043]).

Regarding claim 13, Smith in combination with Julesgaard teaches a non-transitory computer readable medium having stored thereon a computer program for docking a vehicle at a docking station, wherein Smith further teaches:
the computer program comprises software code segments with which the steps of the method according to claim 7 are executed when the computer program runs on a computer.
Smith [0284] discloses “Likewise, it is expressly contemplated that any function, process and/or processor herein can be implemented using electronic hardware, software consisting of a non-transitory computer-readable medium of program instructions, or a combination of hardware and software.”

Regarding claim 15, Smith in combination with Julesgaard teaches the method according to claim 8, wherein Smith further teaches:
a known model of the docking station is used in determining the at least one of the position or orientation of the imaging sensor in relation to the keypoints based on a known geometry of the keypoints, wherein the model indicates the relative positions of the keypoints to one another.
Smith [0116] discloses “The guard/attendant enters the trailer information (trailer number or QR (ID) code scan-imbedded information already in the system, which would typically include: trailer make/model/year/service connection location, etc.) into the facility software system…”
Smith [0186] discloses “In another exemplary embodiment, the sensor assembly 3210 includes a dense 3D sensing, which is used to detect the front face 3110 of the trailer 3100 using the known/trained 3D geometric signature of the trailer face…”

Regarding claim 16, Smith in combination with Julesgaard teaches the method according to claim 8, wherein Smith further teaches:
coordinate transformation from the imaging sensor system to the vehicle system is carried out in determining the at least one of the position or orientation of the docking station in relation to the vehicle based on the determined position of the imaging sensor and a known location of the imaging sensor on the vehicle.
Smith [0213] discloses “The extracted image pixel coordinates can be related to the planar physical dimensions of the tag using a homography (transformation) in accordance with known techniques. This transformation provides the rotation and translation of the tag relative to the sensor's coordinate space. The known transformation between the sensor and delivery coordinate frame and the known transformation between the tag and the glad hand coordinate frame enables an estimate of the glad hand pose for fine positioning.”

Regarding claim 17, Smith in combination with Julesgaard teaches the method according to claim 9, wherein Smith further teaches:
coordinate transformation from the imaging sensor system to the vehicle system is carried out in determining the at least one of the position or orientation of the docking station in relation to the vehicle based on the determined position of the imaging sensor and a known location of the imaging sensor on the vehicle.
Smith [0213] discloses “The extracted image pixel coordinates can be related to the planar physical dimensions of the tag using a homography (transformation) in accordance with known techniques. This transformation provides the rotation and translation of the tag relative to the sensor's coordinate space. The known transformation between the sensor and delivery coordinate frame and the known transformation between the tag and the glad hand coordinate frame enables an estimate of the glad hand pose for fine positioning.”

Regarding claim 18, Smith in combination with Julesgaard teaches the method according to claim 10, wherein Smith further teaches:
coordinate transformation from the imaging sensor system to the vehicle system is carried out in determining the at least one of the position or orientation of the docking station in relation to the vehicle based on the determined position of the imaging sensor and a known location of the imaging sensor on the vehicle.
Smith [0213] discloses “The extracted image pixel coordinates can be related to the planar physical dimensions of the tag using a homography (transformation) in accordance with known techniques. This transformation provides the rotation and translation of the tag relative to the sensor's coordinate space. The known transformation between the sensor and delivery coordinate frame and the known transformation between the tag and the glad hand coordinate frame enables an estimate of the glad hand pose for fine positioning.”

Regarding claim 21, Smith in combination with Julesgaard teaches the evaluation device according to claim 1, wherein Smith further teaches:
the docking station comprises a trailer.
Smith [0008] discloses “Identification of trailers in a yard and navigation with respect to such trailers is automated, and safety mechanisms and operations when docking and undocking a trailer are automated.”
Smith [0012] discloses “A processor facilitates autonomous movement of the AV yard truck, substantially free of human user control inputs to onboard controls of the truck, and connection to and disconnection from trailers in the yard.”

Regarding claim 22, Smith in combination with Julesgaard teaches the evaluation device according to claim 21, wherein Smith further teaches:
the keypoints comprise at least one corner of the trailer.
Smith [0155] discloses “In an embodiment, the location can be computed in relation to a fixed point, such as the code sticker itself, kingpin, trailer body edge and/or corner, etc.”

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Smith et al. in view of Julesgaard et al., further in view of Jones et al.

Regarding claim 5, Smith in combination with Julesgaard teaches the evaluation device according to claim 4, wherein Smith further teaches:
a first input interface for receiving actual training data, wherein the actual training data comprise training images of the docking station taken from a perspective of the vehicle to which the imaging sensor is attached, wherein the keypoints of the docking station are marked in the training images; and
Smith, Abstract, discloses “A plurality of sensors are interconnected with the processor that sense terrain/objects and assist in automatically connecting/disconnecting trailers.”
Smith [0008] discloses “Identification of trailers in a yard and navigation with respect to such trailers is automated, and safety mechanisms and operations when docking and undocking a trailer are automated.”
Smith [0186] discloses “In another exemplary embodiment, the sensor assembly 3210 includes a dense 3D sensing, which is used to detect the front face 3110 of the trailer 3100 using the known/trained 3D geometric signature of the trailer face…These 2D and/or 3D sensing modalities each return the generalized location and boundaries of the trailer front face, and potentially its range from a reference point on the truck.”
Smith [0119] discloses “As the yard truck backs down to the trailer, it uses one or multiple mounted (e.g. a standard or custom, 2D grayscale or color-pixel, image sensor-based) cameras (and/or other associated (typically 3D/range-determining) sensors, such as GPS receiver(s), radar, LiDAR, stereo vision, time-of-flight cameras, ultrasonic/laser range finders, etc.) to assist in: (i) confirming the identity of the trailer through reading the trailer number or scanning a QR, bar, or other type of coded identifier; (ii) Aligning the truck's connectors with the corresponding trailer receptacles.”
The Examiner notes that the cameras may be the first input interface.
Smith [0241] discloses “Based on feature information identified via step 5324 or step 5326, or (optionally) both, the procedure 5300 then ranks locations on the trailer face from highest to lowest probability of glad hand/panel presence (step 5330). This ranking can be based on a variety of factors including the prevalence of glad hand/panel candidate features, a strong pattern match of specific colors or shapes, or other metrics. Trained pattern recognition software can be employed according to skill in the art. In step 5332, the location with the highest rank is selected as the target for gross position movement of the manipulator and the end effector carrying the truck connection.”
a second input interface for receiving target training data, wherein the target training data comprise target position data of the respective keypoints comprising at least known two-dimensional locations in the training images of the respective keypoints,
Smith [0119] discloses “As the yard truck backs down to the trailer, it uses one or multiple mounted (e.g. a standard or custom, 2D grayscale or color-pixel, image sensor-based) cameras (and/or other associated (typically 3D/range-determining) sensors, such as GPS receiver(s), radar, LiDAR, stereo vision, time-of-flight cameras, ultrasonic/laser range finders, etc.) to assist in: (i) confirming the identity of the trailer through reading the trailer number or scanning a QR, bar, or other type of coded identifier; (ii) Aligning the truck's connectors with the corresponding trailer receptacles.”
The Examiner notes that the LiDAR may be a second input interface.
Smith [0114] discloses “…determining the position of the kingpin within the vehicle/navigation coordinate space…”
an output interface for outputting the actual position data.
Smith [0281] discloses “…the procedure 7700 outputs detection that has the highest priority for use to guide the backing operation of the truck onto the trailer via the navigation coordinate space.”
The combination of Smith and Julesgaard does not expressly teach:
wherein the evaluation device is configured to: forward propagate the artificial neural network with the actual training data and receive actual position data of the respective keypoints determined with the artificial neural network in this forward propagation, wherein the actual position data comprises at least two-dimensional position data of the respective keypoints within the images; and
Jones teaches:
wherein the evaluation device is configured to: forward propagate the artificial neural network with the actual training data and receive actual position data of the respective keypoints determined with the artificial neural network in this forward propagation, wherein the actual position data comprises at least two-dimensional position data of the respective keypoints within the images; and
Jones [0171] discloses “If the robot 102 knows the distances of an object to the markers, the robot 102 can determine the location of the object relative to the three or more markers using 3D triangulation.”
Jones [0190] discloses “The neural network 124 can be trained as follows. Starting at the input layer, the patterns of the training data are forward propagated through the network to generate an output. Based on the network's output, an error is calculated using a cost function, in which the training process attempts to minimize the error...After the neural network 124 has been trained, a new image (e.g., 1602) including one or more objects…is provided as input to the network and forward propagated to calculate the network output, and a threshold function is applied to obtain the predicted class labels…The output image 1604 includes the object(s) bound by bounding box(es) 1606 having the predicted label(s).”
adjust weighting factors for connections between neurons in the artificial neural network through backward propagation of a deviation between the actual position data and the target position data, to minimize the deviation, in order to learn the target position data of the keypoints; and 
Jones [0059] discloses “A machine learning module enables the mobile robot to recognize objects in the environment, as well as recognize the position of the robot in the environment, based on the information provided by the one or more sensors.”
Jones [0190] discloses “The error is back-propagated, the derivative of the error with respect to each weight in the network is calculated, and the network is updated.”
It would have been obvious to one having ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Smith and Julesgaard to incorporate backward propagation, as taught in Jones, such that “the neural network is trained with more images of the objects in the environment and the accuracy of recognizing the objects increases over time” (Jones [0060]).
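For illustration only, the training loop described in Jones [0190] (forward propagation, error calculation, back-propagation, weight update) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the network of Jones, Julesgaard, or the application: it uses a single linear layer that regresses one two-dimensional keypoint position from a hypothetical feature vector, so the backward pass reduces to one application of the chain rule. All names, the learning rate, and the numeric values are hypothetical.

```python
def forward(weights, bias, features):
    """Forward propagation: predicted (x, y) = weights @ features + bias."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def deviation(predicted, target):
    """Squared deviation between predicted (actual) and target position data."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target))

def training_step(weights, bias, features, target, learning_rate):
    """One forward pass, then one gradient update of the weighting factors in place.

    Returns the deviation measured before the update.
    """
    predicted = forward(weights, bias, features)
    for i, (p, t) in enumerate(zip(predicted, target)):
        grad_output = 2.0 * (p - t)  # derivative of the squared deviation w.r.t. output i
        for j, f in enumerate(features):
            weights[i][j] -= learning_rate * grad_output * f  # chain rule through the layer
        bias[i] -= learning_rate * grad_output
    return deviation(predicted, target)

# Hypothetical training example: image-derived features and a marked keypoint
# location in normalized image coordinates.
features = [0.2, -0.4, 0.7]
target_xy = [0.35, 0.60]

weights = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
bias = [0.0, 0.0]

losses = [training_step(weights, bias, features, target_xy, learning_rate=0.1)
          for _ in range(200)]
final_xy = forward(weights, bias, features)
```

In this sketch the deviation shrinks with each step and `final_xy` approaches `target_xy`. A multi-layer network would propagate the same error gradient backward through each layer in turn.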

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Naserian et al. (U.S. Patent Application Publication No. 2018/0056868) discloses a system and method to determine trailer pose that includes imaging telltales affixed to a trailer to provide trailer image data and determining the trailer pose from the trailer image data.

Naserian et al. Figure 7

Any inquiry concerning this communication or earlier communications from the examiner should be directed to STEPHANIE T SU whose telephone number is (571)272-5326. The examiner can normally be reached Monday to Friday, 8:30AM - 5:00PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ANISS CHAD, can be reached at (571)270-3832. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/S.T.S./Patent Examiner, Art Unit 3662                           

/ANISS CHAD/Supervisory Patent Examiner, Art Unit 3662