DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.

The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 8 and 19 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 8 and 19 recite the limitation “extreme.” The limitation is indefinite because the claims do not specify what “extreme” refers to. Under the broadest reasonable interpretation, an extreme may correspond to a maximum value. However, the claims recite that “a set” of extreme values is estimated. Therefore, it is unclear whether “extreme” refers to a maximum possible value, to outliers, or to something else. For the purpose of further examination, the limitation has been interpreted as estimating a set of lateral coordinates, e.g., x-coordinates.

Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.

Claims 1, 7, 9, 10, 12, 18, and 20 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Aalerud et al. (“Automatic Calibration of an Industrial RGB-D Camera Network Using Retroreflective Fiducial Markers,” Sensors 2019, 19, 1561; doi:10.3390/s19071561), hereinafter referred to as Aalerud.
Regarding claim 1, Aalerud teaches a method for determining rotation and clipping parameters for images of unit load devices (ULDs), the method comprising:
capturing a set of image data featuring a ULD (Aalerud pg. 13: “100 images were captured wherein the marker was recognized 21 times”; Aalerud Fig. 1);
locating a fiducial marker proximate to the ULD within the set of image data (Aalerud pg. 13 discussed above; Aalerud Abstract: “we use retroreflective fiducial markers in the RGB-D calibration for improved accuracy and detectability”);
cropping the set of image data, based upon the located fiducial marker, to generate a set of marker point data and a set of floor point data (Aalerud Fig. 1 & pg. 5: “key information is included here for completeness. The previously mapped volume was based on a floor surface”; Aalerud pg. 8: “The data was processed such that all markers are detected. This information is used to crop the depth maps and calculate point clouds such that the resulting point clouds will have the same ROI”; Aalerud Fig. 5: Rectangular and padded mask crops; Aalerud pg. 9: “a padded cropping mask is created to match the average position of the marker in the color image”);
rotating the set of image data based upon the set of marker point data and the set of floor point data (Aalerud Algorithm 3: “calculate transformation from marker origin to each corner”; Aalerud Algorithm 4: “Transformation from sensor to world”; Aalerud Fig. 8 & pg. 14: “Positions are given by the coordinates x, y, and z, while the orientations are shown in Euler angles RotZ, RotY, and RotX”); and
clipping the rotated set of image data based upon the set of marker point data and the set of floor point data (Aalerud Figs. 5 and 8 and Algorithms 3-4 discussed above; the rotated set of image data is extracted).

Regarding claim 7, Aalerud teaches the method of claim 1, further comprising estimating (i) a set of depth clipping coordinates for the rotated set of image data based upon the set of marker point data, (ii) a set of longitudinal clipping coordinates for the rotated set of image data based upon the set of floor point data, and (iii) a set of lateral clipping coordinates for the rotated set of image data based upon the set of marker point data (Aalerud pg. 14: “Positions are given by the coordinates x, y, z, while the orientations are shown in Euler angles RotZ, RotY, and RotX”).

Regarding claim 9, Aalerud teaches the method of claim 1, wherein the set of image data featuring the ULD comprises (i) a three-dimensional (3D) depth image and (ii) a red-green-blue (RGB) image, and wherein the method further comprises: aligning the RGB image with the 3D depth image (Aalerud Abstract: “robust method for calibrating a scalable RGB-D sensor network”; Aalerud Fig. 5 & pg. 9 discussed above).
 
Regarding claim 10, Aalerud teaches the method of claim 1, wherein the fiducial marker further comprises a plurality of fiducial markers proximate to the ULD (Aalerud Figs. 1, 3, & 9).

Regarding claim 12, Aalerud teaches a system for determining rotation and clipping parameters for images of unit load devices (ULDs), comprising a housing, an imaging assembly at least partially within the housing and configured to capture a set of image data featuring the ULD, one or more processors, and a non-transitory computer-readable memory coupled to the imaging assembly and the one or more processors, the memory storing instructions thereon (Aalerud pg. 6: “The sensor network comprised of six nodes in waterproofed cabinets. Each cabinet was equipped with a Kinect V2 RGB-D camera based on an active infrared (IR) sensor using the time-of-flight principle … The PC used in this paper was equipped with a 3.6 GHz Intel Corei7-7820x central processing unit (CPU), 32 GiB system memory and an NVIDA GeForce GTX 1080 Ti graphics processing unit (GPU), which in turn has 3584 CUDA cores and 11 GiB memory”) that, when executed by the one or more processors, cause the one or more processors to perform the method recited in claim 1. Therefore, claim 12 is rejected using the same rationale as applied to claim 1 discussed above.

Claim 18 is rejected using the same rationale as applied to claim 7 discussed above.

Claim 20 is rejected using the same rationale as applied to claim 9 discussed above. Aalerud further teaches that the imaging apparatus comprises a time-of-flight (ToF) camera and a red-green-blue (RGB) camera (Aalerud pg. 6 discussed above).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 2-6, 8, 11, 13-17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Aalerud et al. (Sensors 2019, 19, 1561; doi:10.3390/s19071561), in view of Pugh et al. (US 2021/0142497 A1), hereinafter referred to as Aalerud and Pugh, respectively.
Regarding claims 2 and 13, Aalerud teaches the method and system of claims 1 and 12, wherein the set of image data featuring the ULD comprises (i) a three-dimensional (3D) depth image that includes 3D point data and (ii) an RGB image that includes two-dimensional (2D) point data that is depth-aligned with the 3D depth image (Aalerud Abstract: “calibrating a scalable RGB-D sensor network”; Aalerud pg. 7: “the color image was mapped to align with the depth map”; Aalerud pg. 9: “a padded cropping mask is created to match the average position of the marker in the color image”).
However, Aalerud does not appear to explicitly teach that the 2D image is grayscale.
Pertaining to the same field of endeavor, Pugh teaches that the 2D image is grayscale (Pugh ¶0185: “the color of replacement pixels represents a ‘ghost version’ of the original pixels, by modifying the original replacement color …The ghosting color can be any suitable color that identifies a pixel as being associated with an object … the ghosting color can be a grey color, a black color, a color with less intensity as the original color, a lighter color, a darker color, a color with less contrast ...”).
Aalerud and Pugh are considered to be analogous art because they are directed to image processing. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the RGB-D camera system and method (as taught by Aalerud) to use grayscale pixel values (as taught by Pugh) because grayscale values allow the intensity values alone to be analyzed (Pugh ¶0104 & ¶0185).

Regarding claims 3 and 14, Aalerud, in view of Pugh, teaches the method and system of claims 2 and 13, wherein locating the fiducial marker within the set of image data further comprises locating the fiducial marker within the 2D point data, and the method further comprises:
projecting coordinates of the 2D point data corresponding to the fiducial marker onto the 3D point data (Aalerud pg. 7: “As the color image was mapped to align with the depth map”; Aalerud Fig. 4, Algorithm 3 & pg. 10: “The squared Euclidean distance between a depth based corner point, PD, and a projected corner point, PT is calculated in findBestPose”); and
cropping the 3D point data to generate the set of marker point data and the set of floor point data (Aalerud Fig. 5 & pg. 11: “The point clouds were, at this point, accurately cropped so that all points had been captured from a retroreflective surface”).
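For illustration only, the projecting and cropping steps recited in claims 3 and 14 may be sketched as follows. The sketch is not drawn from the applicant's disclosure, Aalerud, or Pugh. It assumes a pinhole camera with hypothetical intrinsics (fx, fy, cx, cy), a depth image that is pixel-aligned with the 2D image (consistent with Aalerud's statement that the color image was mapped to align with the depth map), a located marker represented as a hypothetical pixel bounding box, and a hypothetical band of image rows assumed to image the floor.

```python
from typing import List, Tuple

Point3D = Tuple[float, float, float]
# Hypothetical pinhole intrinsics: (fx, fy, cx, cy) in pixels.
Intrinsics = Tuple[float, float, float, float]


def deproject(u: int, v: int, depth: float, k: Intrinsics) -> Point3D:
    """Back-project pixel (u, v) with its depth into camera coordinates (pinhole model)."""
    fx, fy, cx, cy = k
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def crop_region(depth: List[List[float]], u_range: Tuple[int, int],
                v_range: Tuple[int, int], k: Intrinsics) -> List[Point3D]:
    """Return 3D points for pixels in an inclusive (u, v) window, skipping pixels with no depth (0)."""
    points: List[Point3D] = []
    for v in range(v_range[0], v_range[1] + 1):
        for u in range(u_range[0], u_range[1] + 1):
            d = depth[v][u]
            if d > 0:
                points.append(deproject(u, v, d, k))
    return points


def crop_marker_and_floor(depth: List[List[float]], marker_box: Tuple[int, int, int, int],
                          floor_rows: Tuple[int, int], k: Intrinsics):
    """marker_box = (u0, v0, u1, v1) of the marker located in the aligned 2D image;
    floor_rows = (v0, v1), a hypothetical band of rows assumed to image the floor."""
    u0, v0, u1, v1 = marker_box
    marker_points = crop_region(depth, (u0, u1), (v0, v1), k)
    floor_points = crop_region(depth, (0, len(depth[0]) - 1), floor_rows, k)
    return marker_points, floor_points


# Usage: a 4x4 depth image at 2.0 m with one missing return inside the marker box.
depth_image = [[2.0] * 4 for _ in range(4)]
depth_image[1][1] = 0.0
marker_pts, floor_pts = crop_marker_and_floor(depth_image, (1, 1, 2, 2), (3, 3),
                                              (100.0, 100.0, 1.5, 1.5))
```

Because the 2D and depth images are assumed to share pixel indices, the marker coordinates located in the 2D point data carry over directly to the 3D point data, and pixels without a depth return are skipped.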

Regarding claims 4 and 15, Aalerud, in view of Pugh, teaches the method and system of claims 3 and 14, wherein projecting coordinates of the 2D point data corresponding to the fiducial marker onto the 3D point data further comprises:
locating, using a first set of edge values corresponding to the set of marker point data, a second set of edge values corresponding to the set of floor point data (Pugh ¶0021: “determining the floor plane(s) (e.g., using a cascade of 3D depthmap(s), surface normals, gravity, AR-detected planes, and semantic segmentation, etc.); determining edges (e.g., using image gradients or frequencies, neural networks trained to identify edges in the image”; Pugh ¶0092: “using metric scale depth estimates from stereo or multi-lens cameras to improve geometric scale; using known object detection to improve geometric scale; using fiducial markers to improve geometric scale”).

Regarding claims 5 and 16, Aalerud, in view of Pugh, teaches the method and system of claims 1 and 12, wherein the set of image data featuring the ULD comprises at least a three-dimensional (3D) depth image that includes 3D point data, and the method further comprises:
fitting a first plane to the set of marker data points and a second plane to the set of floor data points (Pugh ¶0021: “determining the floor plane(s) (e.g., using a cascade of 3D depthmap(s), surface normals, gravity, AR-detected planes, and semantic segmentation, etc.); determining edges (e.g., using image gradients or frequencies, neural networks trained to identify edges in the image”; Pugh ¶0116: “A voting scheme can be applied to refine the floor-labels as follows: using MVS, compute, for each point p within a search window, the distance to the detected floor plane and/or the normals deviation using the floor's estimated normal”); 
calculating a pitch angle of the 3D point data relative to the camera based upon the set of floor point data; and calculating a yaw angle of the 3D point data relative to the camera based upon the set of marker point data (Pugh ¶0063: “Rectifying the images (S320) can include rotational rectification. Rotational rectification can function to correct camera orientation (e.g. pitch, yaw, roll, etc.) for a given image to improve appearance or reduce perspective distortion”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the RGB-D camera system and method (as taught by Aalerud) to fit a plane and calculate pitch/yaw (as taught by Pugh) because the combination refines the detection results (Pugh ¶0116).
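For illustration only, the plane fitting and angle calculations of claims 5 and 16 may be sketched as a least-squares plane fit followed by the arctangent of a fitted slope. This is a minimal sketch under stated assumptions, not the applicant's, Aalerud's, or Pugh's method: it assumes a camera frame with x to the right, y pointing down, and z along the optical axis; it fits the floor as y = a*x + b*z + c and the marker face as z = a*x + b*y + c; and it takes pitch = atan(b) from the floor fit and yaw = atan(a) from the marker fit. All function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


def _det3(m: Sequence[Sequence[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_plane(p: List[float], q: List[float], w: List[float]) -> Tuple[float, float, float]:
    """Least-squares fit of w = a*p + b*q + c by solving the normal equations with Cramer's rule."""
    n = float(len(w))
    spp = sum(x * x for x in p)
    sqq = sum(y * y for y in q)
    spq = sum(x * y for x, y in zip(p, q))
    spw = sum(x * z for x, z in zip(p, w))
    sqw = sum(y * z for y, z in zip(q, w))
    normal = [[spp, spq, sum(p)], [spq, sqq, sum(q)], [sum(p), sum(q), n]]
    rhs = [spw, sqw, sum(w)]
    d = _det3(normal)
    if abs(d) < 1e-12:
        raise ValueError("degenerate point set (too few or collinear points)")
    coeffs = []
    for col in range(3):
        m = [row[:] for row in normal]  # copy, then replace one column with the right-hand side
        for r in range(3):
            m[r][col] = rhs[r]
        coeffs.append(_det3(m) / d)
    return coeffs[0], coeffs[1], coeffs[2]


def pitch_from_floor(floor_pts: List[Point3D]) -> float:
    """Fit the floor as y = a*x + b*z + c; pitch is the slope of y along depth, atan(b)."""
    _, b, _ = fit_plane([p[0] for p in floor_pts], [p[2] for p in floor_pts],
                        [p[1] for p in floor_pts])
    return math.atan(b)


def yaw_from_marker(marker_pts: List[Point3D]) -> float:
    """Fit the marker face as z = a*x + b*y + c; yaw is the slope of depth across x, atan(a)."""
    a, _, _ = fit_plane([p[0] for p in marker_pts], [p[1] for p in marker_pts],
                        [p[2] for p in marker_pts])
    return math.atan(a)
```

A floor that rises 0.2 units in y per unit of depth would, under these assumptions, yield a pitch of atan(0.2); a marker face whose depth increases 0.5 units per unit of x would yield a yaw of atan(0.5).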

Regarding claims 6 and 17, Aalerud, in view of Pugh, teaches the method and system of claims 5 and 16, wherein rotating the set of image data based upon the set of marker point data and the set of floor point data further comprises:
rotating, based upon the set of floor point data, the set of image data on a horizontal axis by the pitch angle; and rotating, based upon the set of marker point data, the set of image data on a vertical axis by the yaw angle (Pugh ¶0063 discussed above; Pugh ¶0065: “rectifying the image includes: adjusting the pitch angle of camera to make vertical lines (which appear to slant in 2D due to converging perspective) closer to parallel (e.g., in the image and/or in the 3D model). In a second example, rectifying the image includes adjusting the roll angle of the camera to make the scene horizon line (or other arbitrary horizontal line) level. In a third example, rectifying the image includes adjusting angles or cropping to optimize field of view. In a fourth example, rectifying the image includes moving the horizontal & vertical components of the principal point of the image”).
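For illustration only, the rotations of claims 6 and 17 may be sketched with elementary rotation matrices about the horizontal (x) and vertical (y) axes. The sketch is not drawn from the applicant's disclosure, Aalerud, or Pugh. It assumes a camera frame with x to the right, y pointing down, and z along the optical axis, a pitch angle of atan(b) for a floor fitted as y = a*x + b*z + c, and a yaw angle of atan(a) for a marker face fitted as z = a*x + b*y + c. Under those assumptions, the pitch rotation makes every floor point share one y value and the yaw rotation makes every marker point share one z value.

```python
import math
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def rotate_x(p: Point3D, angle: float) -> Point3D:
    """Rotate about the horizontal (x) axis: y' = y*cos - z*sin, z' = y*sin + z*cos."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (x, c * y - s * z, s * y + c * z)


def rotate_y(p: Point3D, angle: float) -> Point3D:
    """Rotate about the vertical (y) axis: x' = x*cos + z*sin, z' = -x*sin + z*cos."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x + s * z, y, -s * x + c * z)


def rotate_points(points: List[Point3D], pitch: float, yaw: float) -> List[Point3D]:
    """Apply the pitch rotation (horizontal axis) first, then the yaw rotation (vertical axis)."""
    return [rotate_y(rotate_x(p, pitch), yaw) for p in points]
```

Because rotations about different axes do not commute, this sketch applies pitch before yaw; in practice the yaw angle would ideally be estimated from marker points that have already been corrected for pitch.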

Regarding claims 8 and 19, Aalerud teaches the method and system of claims 7 and 18, wherein estimating the set of depth clipping coordinates further comprises calculating a statistical depth value of the set of marker point data of the fiducial marker within the ULD, estimating the set of longitudinal clipping coordinates further comprises calculating a statistical height value of the set of floor point data, and estimating the set of lateral clipping coordinates further comprises calculating a first set of extreme lateral coordinates corresponding to the ULD based upon a second set of extreme lateral coordinates corresponding to the set of marker point data (Aalerud pg. 14 discussed above; see also Aalerud Section 4.4, describing ICP refinement in which the sum of Euclidean distances is minimized to further refine the coordinates).
However, Aalerud does not appear to explicitly teach adjusting by a depth displacement.
Pertaining to the same field of endeavor, Pugh teaches adjusting by depth displacement (Pugh ¶0075: “neural network based contour detection algorithms using disparity maps and/or depthmaps to identify regions likely to have sudden change in depth (i.e., depth discontinuity), optionally refining the maps/depth edges using RGB image information”).
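For illustration only, and consistent with the interpretation of “extreme” lateral coordinates adopted in the 35 U.S.C. 112(b) rejection above, the clipping estimates of claims 7-8 and 18-19 may be sketched as follows. The sketch is not drawn from the applicant's disclosure, Aalerud, or Pugh. It assumes the points have already been rotated into a frame with x to the right, y pointing down, and z as depth; it uses the median as the statistic; and depth_offset (standing in for a depth displacement), floor_tolerance, and lateral_margin are hypothetical parameters.

```python
from statistics import median
from typing import Dict, List, Tuple

Point3D = Tuple[float, float, float]


def estimate_clip_bounds(marker_pts: List[Point3D], floor_pts: List[Point3D],
                         depth_offset: float, floor_tolerance: float,
                         lateral_margin: float) -> Dict[str, float]:
    """Estimate clipping bounds for rotated points (x right, y down, z depth).

    depth_offset, floor_tolerance and lateral_margin are hypothetical parameters
    expressed in the same units as the points.
    """
    marker_depth = median([p[2] for p in marker_pts])  # statistical depth of the marker
    floor_height = median([p[1] for p in floor_pts])   # statistical height of the floor
    xs = [p[0] for p in marker_pts]                    # lateral coordinates of the marker
    return {
        "z_max": marker_depth + depth_offset,          # depth clip adjusted by a displacement
        "y_max": floor_height - floor_tolerance,       # exclude points at or near floor level
        "x_min": min(xs) - lateral_margin,             # ULD lateral extremes derived from
        "x_max": max(xs) + lateral_margin,             # the marker's own lateral extremes
    }


def clip_points(points: List[Point3D], bounds: Dict[str, float]) -> List[Point3D]:
    """Keep only points inside the lateral, longitudinal and depth bounds."""
    return [p for p in points
            if bounds["x_min"] <= p[0] <= bounds["x_max"]
            and p[1] < bounds["y_max"]
            and p[2] <= bounds["z_max"]]
```

The median is used here only because it is robust to stray returns; a mean or percentile would be an equally plausible choice of statistic under the claim language.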

Regarding claim 11, Aalerud teaches the method of claim 1, but does not appear to explicitly teach training a machine learning model.
Pertaining to the same field of endeavor, Pugh teaches training a machine learning model using (i) a plurality of sets of image data, each set of image data featuring a respective ULD, (ii) a plurality of sets of marker point data, each set of marker point data corresponding to a respective set of image data, (iii) a plurality of sets of floor point data, each set of floor point data corresponding to a respective set of image data, and (iv) a plurality of sets of rotated and clipped image data; and applying the machine learning model to the set of image data featuring the ULD to locate the fiducial marker within the set of image data, crop the set of image data, rotate the set of image data, and clip the rotated set of image data (Pugh ¶0021: “the method includes one or more of … determining a depth map (e.g., depth estimates for a set of image pixels; etc.) for the image (e.g., by using neural networks based on the image … determining edges (e.g., using image gradients or frequencies, neural networks trained to identify edges in the image”; Pugh ¶0048: “Two-dimensional features that can be extracted (at S210) can include pixels, patches, descriptors, keypoints, edgels, edges, line segments, blobs, pyramid features, contours, joint lines, optical flow fields, gradients (e.g., color gradients), learned features, bitplanes, and additionally or alternatively any other suitable feature … can be extracted using one or more: feature detectors (e.g., edge detectors, keypoint detectors, line detectors, convolutional feature detectors, etc.), feature matchers (e.g., descriptor search, template matching, optical flow, direct methods, etc.), neural networks (e.g., convolutional neural networks (CNN), deep neural networks (DNN), recurrent neural networks, generative neural networks, etc.)”; Pugh ¶0103: “S450 preferably identifies horizontal planes (e.g., floors), but can additionally or alternatively identify vertical planes (e.g., walls) and/or any other suitable plane. … The planes can be determined using: trained machine learning models”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the RGB-D camera system and method (as taught by Aalerud) to use a trained machine learning model (as taught by Pugh) because the combination is automated and produces accurate results (Pugh ¶0021).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SOO J SHIN whose telephone number is (571)272-9753. The examiner can normally be reached M-F; 10-6.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Matthew Bella can be reached on (571)272-7778. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/Soo Shin/Primary Examiner, Art Unit 2667