Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections
Claim 1 is objected to because of the following informalities:
Claim 1 recites “learn … by the determination means”; there is insufficient antecedent basis for “the determination means” in the claim.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 – 8, 12 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Xu et al. (“Multi-modal deep feature learning for RGB-D object detection”, Pattern Recognition 72 (2017), pp 300-313), hereinafter referred to as Xu, in view of Matei et al. (US Patent Application Publication 2018/0205963), hereinafter referred to as Matei.

Regarding claim 1, Xu discloses an image processing method (abstract) comprising: 
page 304, section 3.2, section 3.4, “ground-truth label”, “ground-truth box”), each ground truth area including a detection target in each of a plurality of images obtained by capturing a specific detection target (page 304, section 3.2, section 3.4, “ground-truth label”, “ground-truth box”; page 306, Fig. 4, shows detection target area) by a plurality of different modals (page 301, Fig. 1, page 306, Fig. 4, RGB modal and depth modal), with a ground truth label attached to the detection target (page 304, section 3.2, “ground-truth label”; page 305, section 4.3, “ground-truth annotation”), a degree to which each of a plurality of candidate areas that correspond to respective predetermined positions common to the plurality of images includes a corresponding ground truth area for each of the plurality of images (page 306, Fig. 4, (c) RGB-D correlation, correlating each of a plurality of candidate areas (e.g. door, box, etc.) common to the plurality of RGB and depth images); and
learn (page 304, section 3.4, “Training”), based on a plurality of feature maps extracted from each of the plurality of images (pages 308 – 309, section 4.4.5, Figs. 6 and 7, feature maps), a set of the results of the determination made by the determination means for each of the plurality of images (page 306, Fig. 4, (c) RGB-D correlation, determined correlation above), and the ground truth (page 304, section 3.2, “ground-truth label”; page 305, section 4.3, “ground-truth annotation”), a first parameter used (pages 303 - 305, training a model means training/learning a set of parameters/weights) when an amount of disagreement between the detection target included in a first image captured by a first modal and the detection target included in a second image captured by a second modal is predicted (page 302, col. 1, page 308, section 4.4.4, disagreements between RGB-specific and depth-specific object detection results can be rectified);
and store the learned first parameter in a storage means (pages 303 - 312, after model/parameter is trained, the model/parameter is stored for real-time implementation/evaluation).
However, Xu fails to explicitly disclose that the method is implemented by an image processing apparatus, comprising: at least one memory storing instructions, and at least one processor configured to execute the instructions wherein the amount of disagreement is an amount of positional deviation between the position of the detection target included in a first image captured by a first modal and the detection target included in a second image captured by a second modal.  
However, in a similar field of endeavor, Matei discloses a method for processing images captured by an RGB camera and a depth camera ([0197]). In addition, Matei discloses that the method is implemented by an image processing apparatus, comprising: at least one memory storing instructions ([0109]), and at least one processor configured to execute the instructions ([0109]), wherein there is an amount of positional deviation between the position of the detection target included in a first image captured by a first modal (RGB camera) and the detection target included in a second image captured by a second modal (depth camera) (Fig. 3, [0197], parallax).
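Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Xu so that it is implemented by an image processing apparatus, comprising: at least one memory storing instructions, and at least one processor configured to execute the instructions, wherein the amount of disagreement is an amount of positional deviation between the position of the detection target included in a first image captured by a first modal and the detection target included in a second image captured by a second modal.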

Regarding claim 2 (depends on claim 1), Xu discloses the method further configured to execute the instructions to learn the first parameter using the difference between each of the plurality of ground truth areas in a set of the results of the determination in which the degree is equal to or larger than a predetermined value and a predetermined reference area in the detection target as the amount of disagreement (page 304, section 3.4, page 306, Fig. 4, threshold).
However, Xu fails to explicitly disclose that the method is implemented by an image processing apparatus with the at least one processor wherein the amount of disagreement is an amount of positional deviation between the position of the detection target included in a first image captured by a first modal and the detection target included in a second image captured by a second modal.  
However, in a similar field of endeavor, Matei discloses a method for processing images captured by an RGB camera and a depth camera ([0197]). In addition, Matei discloses that the method is implemented by an image processing apparatus, comprising: at least one memory storing instructions ([0109]), and at least one processor configured to execute the instructions ([0109]), wherein there is an amount of positional deviation between the position of the detection target included in a first image captured by a first modal (RGB camera) and the detection target included in a second image captured by a second modal (depth camera) (Fig. 3, [0197], parallax).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Xu so that it is implemented by an image processing apparatus, comprising: at least one memory storing instructions, and at least one processor configured to execute the instructions, wherein the amount of disagreement is an amount of positional deviation between the position of the detection target included in a first image captured by a first modal and the detection target included in a second image captured by a second modal. The motivation for doing this is that the method of Xu can be implemented by a substantial machine and the disagreement between the detection targets described by Xu can be further specialized so that a particular aspect can be focused on.
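For illustration only, the limitation discussed for claim 2 can be sketched as follows. The sketch is not taken from Xu, Matei, or the application. It assumes that the "degree" is measured as intersection-over-union (IoU), that the "predetermined value" is 0.5, that both ground truth areas must meet the threshold, and that the reference area is the midpoint of the two ground truth areas. All names are hypothetical.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout
Offset = Tuple[float, float]             # (dx, dy)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union, used here as an ASSUMED measure of the 'degree'."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def center(box: Box) -> Offset:
    """Center point of a box."""
    return ((box[0] + box[2]) / 2.0, (box[1] + box[3]) / 2.0)


def deviation_targets(
    candidate: Box, gt_first: Box, gt_second: Box, threshold: float = 0.5
) -> Optional[Tuple[Offset, Offset]]:
    """Return each modal's offset from the reference position, or None when the
    candidate does not reach the threshold for both ground truth areas."""
    if iou(candidate, gt_first) < threshold or iou(candidate, gt_second) < threshold:
        return None
    c1, c2 = center(gt_first), center(gt_second)
    # ASSUMPTION: the reference area is the intermediate position of the two areas.
    ref = ((c1[0] + c2[0]) / 2.0, (c1[1] + c2[1]) / 2.0)
    return ((c1[0] - ref[0], c1[1] - ref[1]), (c2[0] - ref[0], c2[1] - ref[1]))
```

Under these assumptions, a candidate area that overlaps both ground truth areas yields one offset per modal, and those offsets would serve as the training targets for the first parameter.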

Regarding claim 3 (depends on claim 2), Xu discloses the apparatus wherein the at least one processor is further configured to execute the instructions to use one of the plurality of ground truth areas or an intermediate position of the plurality of ground truth areas as the reference area (page 304, section 3.2, section 3.4, “ground-truth box”).

Regarding claim 4 (depends on claim 1), Xu discloses the apparatus wherein the at least one processor is further configured to execute the instructions to learn, based on the set of the results of the determination and the feature maps, a second parameter used (pages 303 - 305, training a model means training/learning a set of parameters/weights) to page 306, Fig. 4, page 311, Fig. 10, a score threshold) and store the learned second parameter in the storage means (pages 303 - 312, after model/parameter is trained, the model/parameter is stored for real-time implementation/evaluation); and learn, based on the set of the results of the determination and the feature maps (pages 308 – 309, section 4.4.5, Figs. 6 and 7, feature maps), a third parameter used to perform regression to make the position and the shape of the candidate area close to a ground truth area used for the determination (page 304, section 3.2, bounding box regression deviation) and store the learned third parameter in the storage means (pages 303 - 312, after model/parameter is trained, the model/parameter is stored for real-time implementation/evaluation).
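For illustration only, the regression recited for the third parameter can be sketched as follows. The sketch assumes the center/size offset parameterization commonly used in R-CNN-style detectors; neither this Office action nor the cited passages confirm that Xu or the application uses exactly this form. All names are hypothetical.

```python
import math
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), hypothetical layout


def to_center_size(box: Box) -> Tuple[float, float, float, float]:
    """Convert corner coordinates to (center x, center y, width, height)."""
    w, h = box[2] - box[0], box[3] - box[1]
    return (box[0] + w / 2.0, box[1] + h / 2.0, w, h)


def regression_targets(candidate: Box, ground_truth: Box) -> Tuple[float, float, float, float]:
    """Offsets a regressor would be trained to output (ASSUMED parameterization)."""
    px, py, pw, ph = to_center_size(candidate)
    gx, gy, gw, gh = to_center_size(ground_truth)
    return ((gx - px) / pw, (gy - py) / ph, math.log(gw / pw), math.log(gh / ph))


def apply_regression(candidate: Box, t: Tuple[float, float, float, float]) -> Box:
    """Move and reshape the candidate area using the predicted offsets."""
    px, py, pw, ph = to_center_size(candidate)
    cx, cy = px + t[0] * pw, py + t[1] * ph
    w, h = pw * math.exp(t[2]), ph * math.exp(t[3])
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)
```

Applying the exact targets to the candidate area reproduces the ground truth area, which is the sense in which the regression makes the position and the shape of the candidate area "close to" the ground truth area.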

Regarding claim 5 (depends on claim 1), Xu discloses the apparatus wherein the at least one processor is further configured to execute the instructions to learn, based on the set of the results of the determination, a fourth parameter used to extract the plurality of feature maps from each of the plurality of images (pages 308 – 309, section 4.4.5, Figs. 6 and 7, feature maps; pages 303 - 305, training a model means training/learning a set of parameters/weights) and store the learned fourth parameter in the storage means (pages 303 - 312, after model/parameter is trained, the model/parameter is stored for real-time implementation/evaluation), wherein the first parameter is learned using the plurality of feature maps extracted from each of the plurality of images using the fourth parameter stored in the storage means (pages 308 – 309, section 4.4.5, Figs. 6 and 7, training using feature maps).

Regarding claim 6 (depends on claim 5), Xu discloses the apparatus wherein the at least one processor is further configured to execute the instructions to learn a fifth parameter that fuses the plurality of feature maps (page 303, col. 2, correlated feature maps) and is used to identify the candidate areas (pages 308 – 309, section 4.4.5, Figs. 6 and 7, training using correlated feature maps) and store the learned fifth parameter in the storage means (pages 303 - 312, after model/parameter is trained, the model/parameter is stored for real-time implementation/evaluation).

Regarding claim 7 (depends on claim 5), Xu discloses the apparatus wherein the at least one processor is further configured to execute the instructions to predict, using a plurality of feature maps extracted using the fourth parameter stored in the storage means from a plurality of input images captured by the plurality of modals and the first parameter stored in the storage means, an amount of positional deviation in the detection target between the input images (page 304, section 3.2, predict deviation), and select a set of candidate areas including the detection target from each of the plurality of input images based on the predicted amount of positional deviation (page 304, section 3.2).

Regarding claim 8 (depends on claim 1), Matei discloses the apparatus wherein each of the plurality of images is captured by a plurality of cameras that correspond to the plurality of respective modals ([0197], RGB camera and TOF camera).

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Xu in view of Matei, and in further view of Kato et al. (US Patent Application Publication 2018/0115711), hereinafter referred to as Kato.

Regarding claim 9 (depends on claim 1), Xu in view of Matei fails to explicitly disclose the apparatus wherein each of the plurality of images is captured by one camera which is being moved while switching the plurality of modals at predetermined intervals.
However, in a similar field of endeavor, Kato discloses an apparatus for image processing (Fig. 2). In addition, Kato discloses the apparatus wherein each of the plurality of images is captured by one camera which is being moved while switching the plurality of cameras at predetermined intervals ([0378]).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the invention of Xu so that each of the plurality of images is captured by one camera which is being moved while switching the plurality of modals at predetermined intervals. The motivation for doing this is that the process and calculation for image processing of Xu can be easier to control so that the application of Xu can be strengthened.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to QIAN YANG whose telephone number is (571)270-7239.  The examiner can normally be reached on Monday-Thursday 8am-6pm.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vu Le, can be reached on 571-272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/QIAN YANG/Primary Examiner, Art Unit 2668