DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This communication is responsive to the correspondence filed on 3/31/20.
Claims 1-21 are presented for examination.

IDS Considerations

The information disclosure statement (IDS) submitted on 3/31/20 is being considered by the examiner because the submission is in compliance with the provisions of 37 CFR 1.97.

Examiner’s Note: The specification submitted on 5/3/21 has been entered.

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1, 6-8, 13-15 and 20-21 are rejected under 35 U.S.C. 103 as being unpatentable over Murase (U.S. Pub. No. 20200005480 A1) in view of Lv (U.S. Pub. No. 20190057509 A1).

Regarding claims 1, 8 and 15:

1. Murase teaches a method, comprising: defining a geometric capsule (Murase [0013] acquiring an image of the object at the first viewpoint; extracting an image feature of the image of the object from the acquired image of the object; calculating a first likelihood map indicating a relation between the estimated pose of the object estimated from the image of the object and a likelihood of this estimated pose based on the extracted image feature of the object) that is interpretable by a capsule neural network, (Murase Fig. 1 [0032] The first deep layer learning unit 3 is a specific example of first learning means. The first and second statistical processing units 4 and 5 are specific examples of second and third learning means, respectively. The first deep layer learning unit 3 is composed of a neural network such as CNN (Convolutional Neural Network), RNN (Recurrent Neural Network), etc. This RNN includes an LSTM (Long Short Term Memory) as an intermediate layer. [0068] The first deep layer learning unit 3 learns the image feature of each pose of the object using a plurality of sets of images of the object and learning data of the pose of the object (Step S101)) wherein the geometric capsule (Murase [0013] acquiring an image of the object at the first viewpoint; extracting an image feature of the image of the object from the acquired image of the object; calculating a first likelihood map indicating a relation between the estimated pose of the object estimated from the image of the object and a likelihood of this estimated pose based on the extracted image feature of the object) includes a feature representation and a pose; (Murase Fig. 8 [0072] the image feature extraction unit 7 extracts the image feature of the object from the image of the object output from the image acquisition unit 2 using the filter model stored in the storage unit 6 (Step S105). The image feature extraction unit 7 outputs the extracted image feature of the object to the pose estimation unit 8. [0073] The pose estimation unit 8 compares the image feature of the object extracted by the image feature extraction unit 7 with the image feature of each pose learned in advance by the first deep learning unit 3, and calculates the likelihood of the pose of the object to calculate the estimated pose likelihood map (Step S106). The pose estimation unit 8 outputs the calculated estimated pose likelihood map to the viewpoint estimation unit 9)
determining multiple viewpoints relative to the geometric capsule; (Murase [0014] a viewpoint recommendation apparatus for estimating, from a first viewpoint of an object, a second viewpoint at which the object is to be observed next in order to estimate a pose of the object. The program causes a computer to execute: a process of acquiring an image of the object at the first viewpoint; a process of extracting an image feature of the image of the object from the acquired image of the object; a process of calculating a first likelihood map indicating a relation between the estimated pose of the object estimated from the image of the object and a likelihood of this estimated pose based on the extracted image feature of the object [geometric capsule]; and a process of estimating the second viewpoint so that a value of an evaluation function becomes the maximum or minimum)
determining a first appearance representation of the geometric capsule (Murase [0036] note that the pose estimation unit 8 compares the image feature of the object extracted by the image feature extraction unit 7 with the image feature of each pose of the object stored in advance in the storage unit 6 to calculate the estimated pose likelihood map) for each of the multiple viewpoints; (Murase [0038] FIG. 2 is a view showing an example of the estimated pose likelihood map. In FIG. 2, the horizontal axis represents the estimated pose ξ of the object, and the vertical axis the likelihood p(ξ|I_1) of the estimated pose. The likelihood p(ξ|I_1) indicates a likelihood distribution of the estimated pose ξ of the object estimated from the image I_1 of the object. When the pose estimation unit 8 is configured to perform, for example, a regression analysis, the estimated pose likelihood map has a distribution as shown in FIG. 2 in which there is only one point of 1, and the rest of the points are 0. The pose estimation unit 8 outputs the calculated estimated pose likelihood map to the viewpoint estimation unit 9)
determining second appearance representations that each correspond to one of the transformed viewpoints; (Murase [0047] the viewpoint estimation unit 9 uses the above formula (3) to estimate the second viewpoint δ̂_2 such that the value of the evaluation function g(p(θ|δ_2, I_1)) becomes the minimum and the kurtosis of the pose likelihood map becomes the maximum. [0048] The viewpoint estimation unit 9 may estimate at least one second viewpoint δ̂_2 such that the value of the evaluation function g(p(θ|δ_2, I_1)) becomes less than or equal to a threshold using the above formula (3). Then, at least one second viewpoint δ_2 at which the likelihood of the pose of the object becomes high can be estimated. [0050] Here, latent variables are introduced in p(θ|δ_2, I_1) in the above formula (1), and the formula (1) is transformed by multiplying and integrating p(θ|δ_2, I_1), p(φ_1|ξ), and p(ξ|I_1) like in the following formula (4))
combining the second appearance representations to define an agreed appearance representation; (Murase [0036] the pose estimation unit 8 is a specific example of pose estimation means. The pose estimation unit 8 compares the image feature of the object extracted by the image feature extraction unit 7 with the image feature (the pose template) of each pose learned in advance by the first deep learning unit 3 [the learning process combines previously learned representations into a refined, agreed representation] to calculate a likelihood of the pose of the object. Then, the pose estimation unit 8 calculates an estimated pose likelihood map with three axes (x axis, y axis, and z axis) of the pose of the object. Note that the pose estimation unit 8 compares the image feature of the object extracted by the image feature extraction unit 7 with the image feature of each pose of the object stored in advance in the storage unit 6 to calculate the estimated pose likelihood map)

Murase does not explicitly teach determining a transform for each of the multiple viewpoints that moves each of the multiple viewpoints to a respective transformed viewpoint; and updating the feature representation for the geometric capsule based on the agreed appearance representation.

However, Lv teaches determining a transform for each of the multiple viewpoints that moves each of the multiple viewpoints to a respective transformed viewpoint; (Lv Fig. 1 [0044] to learn the rigid regions of two viewpoints, during training, the rigidity transform neural network model 210 is forced to capture both scene structures and epipolar constraints w.r.t. two viewpoints. First, the rigidity transform neural network model 210 is fully convolutional and the viewpoint pose is regressed at the input to the SAP layer 250 to preserve feature distributions spatially. Importantly, features for rigidity segmentation and pose regression can interact directly with each other spatially across each feature map. [0047] at step 224, the features are processed by the rigidity transform neural network model 210 to predict the rigidity mask. In an embodiment the features may also be processed to generate the viewpoint transform)
and updating the feature representation for the geometric capsule based on the agreed appearance representation. (Lv [0036] the relative viewpoint pose is further refined [agreed appearance representation] by the refinement unit 220 to improve accuracy of the viewpoint transformation based on the 2D optical flow and the rigidity mask [geometric capsule; see Lv para. [0044]], producing a refined relative viewpoint pose. In an embodiment, the 2D optical flow constrains refinement of the relative viewpoint pose. The relative viewpoint pose generated by the rigidity transform neural network model 210 may not always precisely generalize to new scenes)



It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Murase by further incorporating the teachings of Lv in video/camera technology. One would have been motivated to do so in order to incorporate determining a transform for each of the multiple viewpoints that moves each of the multiple viewpoints to a respective transformed viewpoint. This functionality would improve user experience.

Regarding claims 6, 13 and 20:

6. Murase teaches the method of claim 1. Murase does not explicitly teach wherein determining the transform for each of the multiple viewpoints is performed using a trained neural network.

However, Lv teaches wherein determining the transform for each of the multiple viewpoints (Lv [0036] the relative viewpoint pose is further refined by the refinement unit 220 to improve accuracy of the viewpoint transformation based on the 2D optical flow and the rigidity mask, producing a refined relative viewpoint pose) is performed using a trained neural network. (Lv Fig. 1, Fig. 2A, Fig. 2D [0044] to learn the rigid regions of two viewpoints, during training, the rigidity transform neural network model 210 is forced to capture both scene structures and epipolar constraints w.r.t. two viewpoints. First, the rigidity transform neural network model 210 is fully convolutional and the viewpoint pose is regressed at the input to the SAP layer 250 to preserve feature distributions spatially. Importantly, features for rigidity segmentation and pose regression can interact directly with each other spatially across each feature map)

Regarding claims 7, 14 and 21:

7. Murase teaches the method of claim 6. Murase does not explicitly teach wherein the trained neural network is configured to determine the transform for each of the multiple viewpoints such that the second appearance representations are constrained to match.

However, Lv teaches wherein the trained neural network is configured to determine the transform for each of the multiple viewpoints (Lv [0036] the relative viewpoint pose is further refined by the refinement unit 220 to improve accuracy of the viewpoint transformation based on the 2D optical flow and the rigidity mask, producing a refined relative viewpoint pose) such that the second appearance representations are constrained to match. (Lv [0048] the rigidity transform neural network model 210 is deemed to be sufficiently trained when the predicted rigidity masks generated for the sequence of images [second appearance representations] from the training [constrained] dataset match the target rigidity masks or a threshold accuracy is achieved for the training dataset)

Claims 2-5, 9-12 and 16-19 are rejected under 35 U.S.C. 103 as being unpatentable over Murase (U.S. Pub. No. 20200005480 A1) in view of Lv (U.S. Pub. No. 20190057509 A1), and further in view of Sun (U.S. Pub. No. 20190130603 A1).

Regarding claims 2, 9 and 16:

2. Murase teaches the method of claim 1. Murase does not explicitly teach wherein defining the geometric capsule includes: receiving a group of elements that represent a three-dimensional scene as an input, identifying sampled elements from the group of elements, and assigning the sampled elements to the geometric capsule.

However, Sun teaches wherein defining the geometric capsule includes:
receiving a group of elements that represent a three-dimensional scene as an input, (Sun [0011] In certain example embodiments, during an offline training phase, the 3D CAD data may be used to generate 2.5D synthetic image data representative of different pose estimations that simulate viewpoints of an observer of an object represented by the 3D CAD data from different positions and orientations. A mapper may then map the set of pose estimations to corresponding feature representations such as feature vectors)
identifying sampled elements from the group of elements, (Sun [0027] in order to determine positive and negative samples (e.g., p_{i_positive} and p_{i_negative} for a given p_i), a 2D label map may be rendered from the 3D CAD data for each pose estimation/camera pose) and assigning the sampled elements (Sun [0027] in order to determine positive and negative samples (e.g., p_{i_positive} and p_{i_negative} for a given p_i), a 2D label map may be rendered from the 3D CAD data for each pose estimation/camera pose) to the geometric capsule. (Sun [0027] the discriminative nature of a feature representation [geometric capsule] (its ability to uniquely identify a pose estimation/camera pose and distinguish it from other pose estimation/camera poses) may depend on the triplets that are selected for the CNN 204)

The motivation for combining Murase and Lv set forth for claim 1 is equally applicable to claim 2. It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Murase by further incorporating the teachings of Lv and Sun in video/camera technology. One would have been motivated to do so in order to incorporate receiving a group of elements that represent a three-dimensional scene as an input. This functionality would accommodate 3D capability.

Regarding claims 3, 10 and 17:

3. Murase teaches the method of claim 2. Murase does not explicitly teach wherein defining the geometric capsule includes initializing the feature representation and the pose for the geometric capsule based on the sampled elements.

However, Lv teaches wherein defining the geometric capsule includes initializing the feature representation (Lv [0036] the relative viewpoint pose [geometric capsule] is further refined by the refinement unit 220 to improve accuracy of the viewpoint transformation based on the 2D optical flow and the rigidity mask, producing a refined relative viewpoint pose. In an embodiment, the 2D optical flow constrains refinement of the relative viewpoint pose. The relative viewpoint pose generated by the rigidity transform neural network model 210 may not always precisely generalize to new scenes. The refinement unit 220 may be configured to modify the viewpoint pose based on the estimated rigidity B and bidirectional dense optical flow δu_{0→1}^{of} and δu_{1→0}^{of}. Estimation of C_1 may be viewed as a robust least square problem:
[Lv Equation (4): reproduced as an image in the reference]

where x_i = π^{−1}(u_i, z_i) in all background regions B, predicted by the rigidity transform neural network model 210. [I] is an Iverson bracket for all the inlier correspondences. In an embodiment, the following technique is used to refine the viewpoint pose by filtering the inlier correspondences in several steps. … Equation (4) may be solved efficiently via Gauss-Newton with C_1 initialized as the viewpoint pose output by the rigidity transform [feature representation; see Lv para. [0047]] neural network model 210. With accurate filtered correspondences [feature representation], the initialization step trivially helps but can also be replaced by an identity initialization. [0047] at step 224, the features are processed by the rigidity transform neural network model 210 to predict the rigidity mask. In an embodiment the features may also be processed to generate the viewpoint transform)

However, Sun teaches initializing the pose for the geometric capsule based on the sampled elements. (Sun [0027] in order to determine positive and negative samples (e.g., p_{i_positive} and p_{i_negative} for a given p_i), a 2D label map may be rendered from the 3D CAD data for each pose estimation/camera pose. Sun [0027] the discriminative nature of a feature representation [geometric capsule] (its ability to uniquely identify a pose estimation/camera pose and distinguish it from other pose estimation/camera poses) may depend on the triplets that are selected for the CNN 204)

Regarding claims 4, 11 and 18:

4. Murase teaches the method of claim 2. Murase does not explicitly teach wherein the group of elements is a point cloud and the elements from the group of elements are points that are included in the point cloud.

However, Sun teaches wherein the group of elements is a point cloud and the elements from the group of elements are points that are included in the point cloud. (Sun [0016] a 3D point cloud may be reconstructed from a depth image [group of elements] to derive a representation from the point cloud such as a point feature histogram)

Regarding claims 5, 12 and 19:

5. Murase teaches the method of claim 2. Murase does not explicitly teach wherein the group of elements is a group of lower-level geometric capsules.

However, Sun teaches wherein the group of elements is a group of lower-level geometric capsules. (Sun [0030] once the CNN 204 is trained, the set of feature representations obtained from the depth image data [group of elements] representative of the set of pose estimations/camera poses [geometric capsules] 202(1)-202(N) may be stored in one or more datastores 208 at block 408 of the method 400. In particular, the set of pose estimations 202(1)-202(N), or more specifically the 2.5D image data indicative of the set of pose estimations 202(1)-202(N), may be stored in the datastore(s) 208 in association with the corresponding feature representations as pose estimation and feature representation pairings 206(1)-206(N) [lower-level geometric capsules, because they relate to 2.5D image data rather than 3D data])

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NASIM N NIRJHAR whose telephone number is (571) 272-3792.  The examiner can normally be reached on Monday - Friday, 8 am to 5 pm ET.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Christopher Kelley can be reached on (571) 272-7331.  The fax phone number for the organization where this application or proceeding is assigned is (571) 273-8300.

Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/NASIM N NIRJHAR/Primary Examiner, Art Unit 2482