DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Specification
The title of the invention is not descriptive. The title of the application is too broad such that a reader will not obtain any information about the invention from the title. A new title is required that is clearly indicative of the invention to which the claims are directed. 
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-3, 6-7, and 18-20 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Agrawal et al (U.S. 11,423,630 B1).
Regarding claim 1, Agrawal et al teaches a generation apparatus (Figs. 1-3) comprising: 
                 one or more memories storing instructions (Fig. 3, 324, 323); and 
                 one or more processors that, upon executing the instructions (Fig. 3, 326, 323), perform: 
                       obtaining a plurality of images captured by a plurality of image capturing apparatuses from different directions (Fig. 2A, 202-1 – 202-N. column 4, lines 24-28, “In some implementations, multiple 2D body images of a body from different views (e.g., front view, side view, back view, three-quarter view, etc.), such as 2D body images 202-1, 202-2, 202-3, 202-4 through 202-N may be utilized with the disclosed implementations to generate a dimensionally accurate 3D model of the body.”); 
                       specifying an image that is to be used for generating a three-dimensional pose model indicating a plurality of joint positions of an object, from the obtained plurality of images (column 5, lines 8-13, “Returning to FIG. 2A, the first 2D body image 202-1 is processed to segment a plurality of pixels of the first 2D body image 202-1 that represent the human body from a plurality of pixels of the first 2D body image 202-1 that do not represent the human body, to produce a front silhouette 204-1 of the human body.” Column 6, lines 6-13, “For example, a CNN may be trained to receive features generated from different silhouettes 204 to produce predicted body parameters 207. The predicted body parameters 207 may indicate any aspect or information related to the body 203 represented in the images 202. For example, the predicted body parameters 207 may indicate 3D joint locations, body volume, shape of the body, pose angles, etc.”); and 
                       generating a three-dimensional pose model of the object based on the specified image (Fig. 2A, 210, 208; column 6, lines 17-20, “Utilizing the predicted body parameters 207, 3D modeling 210 of the body 203 represented in the 2D body images 202 is performed to generate a 3D model of the body 203 represented in the 2D body images 202.”).
Regarding claim 2, Agrawal et al teaches wherein the one or more processors further execute the instructions to perform obtaining a three-dimensional shape model indicating a three-dimensional shape of the object, based on the plurality of images, wherein, in the specifying, an image that is to be used for generating the three-dimensional pose model of the object is specified based on the obtained shape model (Column 6, lines 6-13, “For example, a CNN may be trained to receive features generated from different silhouettes 204 to produce predicted body parameters 207. The predicted body parameters 207 may indicate any aspect or information related to the body 203 represented in the images 202. For example, the predicted body parameters 207 may indicate 3D joint locations, body volume, shape of the body, pose angles, etc.”).
Regarding claim 3, Agrawal et al teaches wherein, in the specifying, an image that is to be used for generating the three-dimensional pose model of the object is specified based on the three-dimensional shape model of the object and a region of the object in each of the plurality of images (column 11, lines 31-42, “Similar to 3D model refinement, the approximate pose of the body in one of the 2D body images 252 may be determined and the 3D model adjusted accordingly so that the texture obtained from that 2D body image 252 may be aligned and used to augment that portion of the 3D model. In some implementations, alignment of the 3D model with the approximate pose of the body 253 may be performed for each 2D body image 252-1 through 252-N so that texture information or data from the different views of the body 253 represented in the different 2D body images 252 may be used to augment the different poses of the resulting 3D model.”).
Regarding claim 6, Agrawal et al teaches wherein, in the specifying, an image that is to be used for generating the three-dimensional pose model of the object is specified based on a size of a region of the object in each of the plurality of images (column 21, lines 23-28, “If it is determined that there is a difference between the 2D model image and the body represented in one or more of the 2D body images/silhouette, the 3D model and/or the predicted body parameters may be adjusted to correspond to the shape and/or size of body represented in the 2D body image and/or the silhouette, as in 660.”).
Regarding claim 7, Agrawal et al teaches wherein, in the specifying, an image that is to be used for generating the three-dimensional pose model of the object is specified based on positions of the plurality of image capturing apparatuses (column 2, lines 54-57, “Likewise, the user may be instructed to stand a distance from the camera such that the body of the user is completely included in a field of view of the imaging element and represented in the generated image 102.”).
Regarding claim 18, Agrawal et al teaches wherein the plurality of image capturing apparatuses are image capturing apparatuses that are used for generating a virtual viewpoint image, and in the specifying, an image that is to be used for generating the three-dimensional pose model of the object is specified from a plurality of images captured and obtained by the plurality of image capturing apparatuses that are used for generating the virtual viewpoint image (column 2, lines 37-62, image capturing; column 11, lines 19-30, “In some implementations, upon completion of 3D model refinement 258, the 3D model of the body represented in the 2D body images 252 may be augmented with one or more textures, texture augmentation 262, determined from one or more of the 2D body images 252-1 through 252-N. For example, the 3D model may be augmented to have a same or similar color to a skin color of the body 253 represented the 2D body images 252, clothing or clothing colors represented in the 2D body images 252 may be used to augment the 3D model, facial features, hair, hair color, etc., of the body 253 represented in the 2D body image 252 may be determined and used to augment the 3D model.”).
Claim 19 recites a method for generating a three-dimensional pose model, the method comprising the steps performed by the processors as in claim 1. Agrawal et al teaches a method for generating a three-dimensional pose model, the method comprising the steps performed by the processors as in claim 1 (see the rejection of claim 1 above).
Claim 20 recites a non-transitory computer-readable storage medium storing a program for causing a computer to execute a method comprising the steps performed by the processors as in claim 1. Agrawal et al teaches a non-transitory computer-readable storage medium storing a program for causing a computer to execute a method comprising the steps performed by the processors as in claim 1 (see the rejection of claim 1 above).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 4, 8, and 10 are rejected under 35 U.S.C. 103 as being unpatentable over Agrawal et al, as applied to claims 1 and 3 above, and in view of Popa et al (U.S. Pub. 2014/0219550 A1).
Regarding claim 4, Agrawal et al remains as applied to claim 3 above. However, Agrawal et al does not explicitly teach wherein, in the specifying, priority orders for the plurality of images are set based on the three-dimensional shape model of the object and the region of the object in each of the plurality of images, and an image that is to be used for generating the three-dimensional pose model of the object is specified in accordance with the priority orders.
Popa et al, in the same field of endeavor, teaches wherein, in the specifying, priority orders for the plurality of images are set based on the three-dimensional shape model of the object and the region of the object in each of the plurality of images, and an image that is to be used for generating the three-dimensional pose model of the object is specified in accordance with the priority orders (paragraphs [0031], [0035], "selecting, from the set of sequences of reference silhouettes, a predetermined number of sequences which have the smallest sequence matching errors;" "The matching error, in an embodiment, is adapted or weighted according to a confidence value which indicates the quality of the source image segment. A source image segment that is known or likely to comprise overlapping real world objects is assigned a lower confidence value than one that comprises just one object." Note: The sequence matching errors are mapped to the priority orders. Smaller sequence matching errors correspond to higher priority orders.). As Popa et al is combined with Agrawal et al, e.g., selecting/specifying the source images using certain priority orders, one would obtain the claimed features. The implementation of the combination may be done by adding/modifying the relevant software components. The rationale of the combination may be combining prior art elements according to known methods to yield predictable results, see MPEP 2143. Therefore it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the apparatuses as shown in Agrawal et al and Popa et al to obtain the claimed features.
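Note: For illustration only, the mapping of confidence-weighted matching errors to priority orders discussed above may be sketched as follows. The names Candidate, rank_by_priority, and specify_images are hypothetical, and dividing the matching error by the confidence value is an assumed weighting chosen for the sketch; it is not taken from Popa et al or from the claims.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical record for one source image."""
    image_id: str
    matching_error: float  # smaller means a better match
    confidence: float      # in (0, 1]; lower when objects overlap


def rank_by_priority(candidates):
    """Sort candidates so the highest-priority image comes first.

    Assumed weighting: the error is divided by the confidence, so a
    low-confidence image segment is pushed down the priority order.
    """
    return sorted(candidates, key=lambda c: c.matching_error / c.confidence)


def specify_images(candidates, count):
    """Return the ids of the `count` highest-priority images."""
    return [c.image_id for c in rank_by_priority(candidates)[:count]]


candidates = [
    Candidate("cam_a", matching_error=0.20, confidence=1.0),
    Candidate("cam_b", matching_error=0.10, confidence=0.4),
    Candidate("cam_c", matching_error=0.30, confidence=1.0),
]
selected = specify_images(candidates, 2)
```

In this sketch a smaller weighted error corresponds to a higher priority order, consistent with the mapping stated in the rejection of claim 4.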
Regarding claim 8, the combination of Agrawal et al and Popa et al would suggest the apparatus according to claim 1, wherein the one or more processors further execute the instructions to perform determining whether or not at least a portion of a region of the object in each of the plurality of images is occluded by a region of another object from among a plurality of objects, wherein, in the specifying, an image that is to be used for generating the three-dimensional pose model of the object is specified based on a result of the determination made by the determining unit (Popa et al: paragraph [0135], “In more detail, this is done by determining the optical flow between two successive images, and determining therefrom, for one or more bones or joints, their expected position. This can be done for all joints, or only for joints that occlude (or are occluded by) other body parts. For example, given the position of joints (or bone positions and orientations) in one frame, optical flow to an adjacent frame is used to compute expected positions in the adjacent frame.”). The rationale of the combination for claim 4 above is incorporated herein.
Regarding claim 10, the combination of Agrawal et al and Popa et al would suggest the apparatus according to claim 1, wherein the one or more processors further execute the instructions to perform acquiring a reliability indicating accuracy of the object regarding the plurality of images, wherein, in the specifying, an image that is to be used for generating the three-dimensional pose model of the object is specified based on the reliability obtained by the reliability obtaining unit (Popa et al: paragraph [0035], "The matching error, in an embodiment, is adapted or weighted according to a confidence value which indicates the quality of the source image segment. A source image segment that is known or likely to comprise overlapping real world objects is assigned a lower confidence value than one that comprises just one object." Note: the confidence value is mapped to the reliability in the claim.). The rationale of the combination for claim 4 above is incorporated herein.
Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Agrawal et al, as applied to claim 1 above, and in view of Yomdin et al (U.S. Pub. 2012/0218262 A1).
Regarding claim 5, Agrawal et al remains as applied to claim 1 above. However, Agrawal et al does not explicitly teach wherein, in the specifying, an image that is to be used for generating the three-dimensional pose model of the object is specified based on an image resolution of a region of the object in each of the plurality of images.
Yomdin et al, in the same field of endeavor, teaches wherein, in the specifying, an image that is to be used for generating the three-dimensional pose model of the object is specified based on an image resolution of a region of the object in each of the plurality of images (paragraph [0161], “The fitting and transformations of HRF combined models may be performed generally as described hereinabove, with the following distinction: the fitting of the HRF sub-models may be performed with a higher resolution than the rest of the character. For example, the fitting of general character model may be performed with the image of a certain specific resolution. The fitting of a component face HRF sub-model may be performed with the appropriate part of the image taken with two or three times finer resolution.”). As Yomdin et al is combined with Agrawal et al, e.g., selecting/specifying the source images with the consideration of the image resolutions, one would obtain the claimed features. The implementation of the combination may be done by adding/modifying the relevant software components. The rationale of the combination may be combining prior art elements according to known methods to yield predictable results, see MPEP 2143. Therefore it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the apparatuses as shown in Agrawal et al and Yomdin et al to obtain the claimed features.
Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Agrawal et al, and in view of Popa et al, as applied to claim 8 above, and further in view of Park et al (U.S. Pub. 2016/0224856 A1).
Regarding claim 9, the combination of Agrawal et al and Popa et al remains as applied to claim 8 above. However, the combination does not show wherein in the determining, it is determined whether or not at least a portion of the region of the object is occluded by the region of the other object, based on a distance between the object and an image capturing apparatus that captures an image of the object from among the plurality of image capturing apparatuses, and the distance between the image capturing apparatus and the other object.
Park et al, also in the same field of endeavor, teaches wherein in the determining, it is determined whether or not at least a portion of the region of the object is occluded by the region of the other object, based on a distance between the object and an image capturing apparatus that captures an image of the object from among the plurality of image capturing apparatuses, and the distance between the image capturing apparatus and the other object. (paragraphs [0060], [0076], “For example, OHCV may create a depth image in which the 3D map points are represented as small circles with intensity of the circles representing distance from the camera viewpoint to the point within an environment. Occluded map points may be detected by comparing a 3D map point's distance and the depth from the depth map. For example, keyframe points observable from the camera viewpoint projected onto a same 3D map point and a respective 3D map distance closest to the camera viewpoint is stored in the depth map. As a result of comparing distance between two points, a keyframe point associated with a farther distance (e.g., largest distance) from a camera viewpoint is determined as occluded.”). As Park et al is combined with Agrawal et al and Popa et al, one would obtain the claimed features. The implementation of the combination may be done by adding/modifying the relevant software components. The rationale of the combination may be combining prior art elements according to known methods to yield predictable results, see MPEP 2143. Therefore it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the apparatuses as shown in Agrawal et al, Popa et al, and Park et al to obtain the claimed features.
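Note: For illustration only, a distance-based occlusion determination of the kind recited in claim 9 may be sketched as follows. The function names and the use of axis-aligned bounding boxes for the object regions are assumptions made for the sketch; they are not taken from Park et al, which compares map point distances against a depth map.

```python
def boxes_overlap(a, b):
    """Return True if two boxes (x_min, y_min, x_max, y_max) intersect."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def is_occluded(target_box, target_distance, other_box, other_distance):
    """Decide whether the target object is at least partly occluded.

    The target is treated as occluded when its image region overlaps the
    region of the other object and the other object is closer to the
    image capturing apparatus than the target is.
    """
    return boxes_overlap(target_box, other_box) and other_distance < target_distance


# The other object overlaps the target and stands closer to the camera.
occluded = is_occluded((0, 0, 10, 10), 5.0, (5, 5, 15, 15), 3.0)
```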
Claims 11-14 are rejected under 35 U.S.C. 103 as being unpatentable over Agrawal et al, and in view of Popa et al, as applied to claim 10 above, and further in view of Socek et al (U.S. 10,491,895 B2).
Regarding claim 11, the combination of Agrawal et al and Popa et al remains as applied to claim 10 above. However, the combination does not show wherein, in the obtaining, an image indicating a region of the object is obtained for each of the plurality of captured images based on a probability value indicating whether or not each pixel in the captured image is a pixel that constitutes the region of the object.
Socek et al, also in the same field of endeavor, teaches wherein, in the obtaining, an image indicating a region of the object is obtained for each of the plurality of captured images based on a probability value indicating whether or not each pixel in the captured image is a pixel that constitutes the region of the object (column 5, lines 48-57, “Furthermore, in motion skin pixels may be detected by motion detection module 104. For example, in motion pixels of input image 102 may be detected using a motion detection algorithm and detected pixels may be filtered using the global skin detector. Motion detection module 104 may thereby output those pixels that have a higher probability of belonging to skin regions of the frame. Pixels that were considered as moving pixels belonging to the skin area may then be used for retraining the detector.”). As Socek et al is combined with Agrawal et al and Popa et al, one would obtain the claimed features. The implementation of the combination may be done by adding/modifying the relevant software components. The rationale of the combination may be combining prior art elements according to known methods to yield predictable results, see MPEP 2143. Therefore it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the apparatuses as shown in Agrawal et al, Popa et al, and Socek et al to obtain the claimed features.
Regarding claim 12, the combination of Agrawal et al, Popa et al, and Socek et al would suggest the apparatus according to claim 11, wherein, in the obtaining, the probability value of each pixel in the captured images is obtained based on a result obtained through learning (Socek et al: column 5, lines 44-57, training, retraining).
Regarding claim 13, the combination of Agrawal et al, Popa et al, and Socek et al would suggest the apparatus according to claim 11, wherein in the acquiring, the reliability for an image indicating the region of the object is acquired based on the probability value (Socek et al: column 18, lines 64-67, “For example, face validator 1206 (e.g., a current stage) may use initial skin probability map 1214 (e.g., from previous stage) to measure the reliability of each face region being validated.”).
Regarding claim 14, the combination of Agrawal et al, Popa et al, and Socek et al would suggest the apparatus according to claim 13, wherein, in the acquiring, the reliability is set such that, in a distribution of the probability value, the higher a ratio of probability values that are close to a minimum value and a maximum value that a probability value possibly takes is, the higher a value of the reliability becomes (Socek et al: column 18, line 64-column 19, line 2, “For example, face validator 1206 (e.g., a current stage) may use initial skin probability map 1214 (e.g., from previous stage) to measure the reliability of each face region being validated. For example, a face area may be validated only if the average skin percentage on a per pixel basis (e.g., from the region) exceeds 40%.” Also see column 24, line 55 – column 25, line 22.).
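Note: For illustration only, the reliability recited in claim 14 may be sketched as the fraction of per-pixel probability values lying near the minimum (0) or the maximum (1). The function name and the margin of 0.1 are assumptions made for the sketch; neither is taken from Socek et al or from the claims.

```python
def reliability_from_probabilities(probabilities, margin=0.1):
    """Return the ratio of probability values close to 0 or to 1.

    A value is counted as decisive when it lies within `margin` of
    either extreme. The more decisive values the distribution contains,
    the higher the returned reliability (between 0.0 and 1.0).
    """
    if not probabilities:
        return 0.0
    decisive = sum(1 for p in probabilities if p <= margin or p >= 1.0 - margin)
    return decisive / len(probabilities)


sharp = reliability_from_probabilities([0.0, 1.0, 0.95, 0.05])
mixed = reliability_from_probabilities([0.5, 0.5, 0.0, 1.0])
```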
Claims 15-17 are rejected under 35 U.S.C. 103 as being unpatentable over Agrawal et al, and in view of Popa et al, as applied to claim 10 above, and further in view of Nakano et al (U.S. Pub. 2018/0285643 A1).
Regarding claim 15, the combination of Agrawal et al and Popa et al remains as applied to claim 10 above. However, the combination does not show wherein, in the obtaining, an image indicating the object is obtained based on a difference value between a pixel value of the captured image and a pixel value of a background image in which the object is not present in a shooting region corresponding to a captured image, for each of a plurality of captured images.
Nakano et al, also in the same field of endeavor, teaches wherein, in the obtaining, an image indicating the object is obtained based on a difference value between a pixel value of the captured image and a pixel value of a background image in which the object is not present in a shooting region corresponding to a captured image, for each of a plurality of captured images (paragraph [0059], “In the detection of feature points, a point considered as an image feature (a key point) is determined from a difference between smoothed images with different scales. Then, information is described using the gradient information of a surrounding image around each key point. Next, by calculating a difference between the scales, a position of appearance of a change in the image (a boundary between an object and a background or the like) is calculated. A point at which this change is maximized is a candidate for a feature point (a key point) of the SIFT. In order to retrieve this point, differential images are arranged and extreme values are retrieved.”). As Nakano et al is combined with Agrawal et al and Popa et al, one would obtain the claimed features. The implementation of the combination may be done by adding/modifying the relevant software components. The rationale of the combination may be combining prior art elements according to known methods to yield predictable results, see MPEP 2143. Therefore it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to combine the apparatuses as shown in Agrawal et al, Popa et al, and Nakano et al to obtain the claimed features.
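Note: For illustration only, obtaining an image indicating the object from a difference value between a captured image and a background image, as recited in claim 15, may be sketched as follows. The function name, the grayscale nested-list representation, and the threshold of 30 are assumptions made for the sketch; they are not taken from Nakano et al.

```python
def foreground_mask(captured, background, threshold=30):
    """Mark each pixel whose difference from the background exceeds a threshold.

    Both images are equally sized grids of grayscale values. The result
    is a grid of 1 (object pixel) and 0 (background pixel).
    """
    return [
        [1 if abs(c - b) > threshold else 0 for c, b in zip(c_row, b_row)]
        for c_row, b_row in zip(captured, background)
    ]


background = [[10, 10, 10], [10, 10, 10]]
captured = [[12, 200, 10], [10, 180, 11]]
mask = foreground_mask(captured, background)
```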
Regarding claim 16, the combination of Agrawal et al, Popa et al, and Nakano et al would suggest the apparatus according to claim 15, wherein, in the acquiring, the reliability for an image indicating the region of the object is acquired based on the difference value (Nakano et al: paragraph [0048], “Alternatively, the image likelihood calculation unit 112 calculates an image likelihood Lv(v;oi) for each candidate using the calculated image feature quantity and the image models authenticated by the DNN from the image model DB 107, for example, the HMM. Also, the image likelihood Lv(v;oi) is obtained by calculating a posterior probability p(oi|v). Here, v is an image feature quantity, and oi is an image model of an ith object output by the image model generation unit 108. Also, the image likelihood Lv is a value from 0 to 1. It is indicated that a likelihood difference is larger with respect to a contention candidate and the reliability is higher when the image likelihood Lv is closer to 1. Also, it is indicated that the reliability is lower when the image likelihood Lv is closer to 0.”).
Regarding claim 17, the combination of Agrawal et al, Popa et al, and Nakano et al would suggest the apparatus according to claim 16, wherein, in the acquiring, the reliability is acquired such that the larger a value of the difference value in the captured image is, the larger a value of the reliability becomes (Nakano et al: paragraph [0048], “Alternatively, the image likelihood calculation unit 112 calculates an image likelihood Lv(v;oi) for each candidate using the calculated image feature quantity and the image models authenticated by the DNN from the image model DB 107, for example, the HMM. Also, the image likelihood Lv(v;oi) is obtained by calculating a posterior probability p(oi|v). Here, v is an image feature quantity, and oi is an image model of an ith object output by the image model generation unit 108. Also, the image likelihood Lv is a value from 0 to 1. It is indicated that a likelihood difference is larger with respect to a contention candidate and the reliability is higher when the image likelihood Lv is closer to 1. Also, it is indicated that the reliability is lower when the image likelihood Lv is closer to 0.”).
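Note: For illustration only, a reliability that becomes larger as the difference value becomes larger, as recited in claim 17, may be sketched as follows. The function name, the use of the mean absolute difference, and the normalization by 255 are assumptions made for the sketch; they are not taken from Nakano et al, which describes reliability in terms of an image likelihood.

```python
def reliability_from_difference(differences, max_difference=255):
    """Map per-pixel difference values to a reliability in [0.0, 1.0].

    The mean absolute difference is scaled by the largest possible
    difference, so the reliability never decreases as the differences grow.
    """
    if not differences:
        return 0.0
    mean_difference = sum(abs(d) for d in differences) / len(differences)
    return min(mean_difference / max_difference, 1.0)


low = reliability_from_difference([5, 10, 15])
high = reliability_from_difference([200, 220, 240])
```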
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TIZE MA whose telephone number is (571)270-3709. The examiner can normally be reached 9AM-5PM EST M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Xiao Wu, can be reached at 571-272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/TIZE MA/Primary Examiner, Art Unit 2613