Notice of Pre-AIA  or AIA  Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA .
DETAILED ACTION

Status of Claims
Claims 1-17 are currently pending in this application.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on March 4, 2021 is hereby acknowledged.  All references have been considered by the examiner. Initialed copies of the PTO-1449 are included in this correspondence.

Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA 35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are listed in Table 1 below: each claim limitation (column 2), with its generic placeholder (column 3) and functional language (column 4), as recited in the claim(s) given in column 5.

No. | Claim limitation | Generic placeholder | Functional language | Claim number(s)
1 | an input | unit | to obtain … | 12
2 | a point cloud | generator | to receive … and to generate … | 12
3 | a morphing | unit | to adjust one or more parameters … | 12
4 | an image | generator | to generate … | 12-14 and 16
5 | a discount | unit | to identify points … and to discount … | 13
6 | a texture map | generator | to generate a texture map … | 15

Table 1

Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1-16 are rejected under 35 U.S.C. 103 as being unpatentable over Black et al. (2010/0111370) in view of Bogo (“Detailed Full-Body Reconstructions of Moving People from Monocular RGB-D Sequences”, IEEE International Conference on Computer Vision, 9 pages, December 7, 2015; IDS).

Regarding claim 1, Black teaches a method of generating a three-dimensional (3D) reconstruction of a human (e.g., The present invention relates to the estimation of human body shape using a low-dimensional 3D model using sensor data and other forms of input data that may be imprecise, ambiguous or partially obscured.  Black: [0003]. Synthetic body models can be generated using specialized commercial software tools (e.g. 3D Studio Max, BodyBuilder, Maya, Poser). The shape is controlled though a number of parameters while pose is varied by associating the surface mesh with a kinematic skeleton.  Black: [0120] L.3-7), the method comprising: 
obtaining at least one colour image and corresponding depth image of a scene, the at least one colour image and corresponding depth image comprising a human subject that is at least partially occluded by one or more items (e.g., The above fitting can be performed with people wearing minimal clothing (e.g. underwear or tights) or wearing standard street clothing. In either case, multiple body poses may be combined to improve the shape estimate. This exploits the fact that human body shape (e.g. limb lengths, weight, etc.) is constant even though the pose of the body may change. In the case of a clothed subject, we use a clothing-insensitive (that is, robust to the presence of clothing) cost function. This captures the fact that regions corresponding to the body in the frames (images or depth data) are generally larger for people in clothes and makes the shape fitting sensitive to this fact.  Black: [0058] L.1-12.  It is assumed that RGB (red, green, blue) input pixels {ri, gi, bi} ∈ I in the input image I are constrained to the range [0,1] by the sensor. Black: [0086] L.2-5. Combining measurements from multiple poses is particularly useful for clothed people because, in each pose, the clothing fits the body differently, providing different constraints on the underlying shape. Additionally, the optional skin detection component within the calibration and data pre-processing system 104 is used to modify the cost function in non-skin regions. In these regions the body shape does not have to match the image measurements exactly.  Black: [0058] L.12-19.  Therefore, non-skin regions (portions of the body under clothing) are occluded regions); 
identifying in the at least one colour image, regions corresponding to the non-occluded parts of the human subject (e.g., In the calibration and data pre-processing system 104, images and other sensor data is typically segmented into foreground regions and, for estimating shape under clothing, regions corresponding to skin, clothing and hair are detected. Black: [0051] L.1-4.  Regions corresponding to skin are non-occluded regions or parts of body); 
generating a point cloud of the scene based on the at least one colour image and corresponding depth image of the scene (e.g., In contrast to image observations that provide constraints in 2D, there exist sensors that capture depth measurements directly in 3D (e.g. sparse or dense stereo images, laser range scans, structured light scans, time-of-flight sensors). Having 3D measurements simplifies the matching problem with a 3D body model. These measurements may consist of point clouds or polygonal meshes, and optionally contain color information or surface orientation.  Black: [0184]), the point cloud comprising regions corresponding to the regions identified in the at least one colour image as corresponding to the non-occluded parts of the human subject (e.g., Many range scanning devices simultaneously acquire visible imagery, which either provides a texture map or per-vertex coloration for the range data. This allows the classification of sensor data points as either skin or clothing using the skin classifier described in Section 2e (or more generally to classify each as corresponding to one of G classes using user input or skin/hair/clothing classifiers described in the literature (Section 7b)). Black: [0373] L.3-10); 
adjusting one or more parameters of a parametric model based on the regions of the point cloud (e.g., To recover body shape from standard sensors in less constrained environments and under clothing, a parametric 3D model of the human body is employed. The term "body shape" means a pose independent representation that characterizes the fixed skeletal structure (e.g. length of the bones) and the distribution of soft tissue (muscle and fat). The phrase "parametric model" refers any 3D body model where the shape and pose of the body are determined by a few parameters. A graphics model is used that is represented as a triangulated mesh (other types of explicit meshes are possible such as quadrilateral meshes as are implicit surface models such as NURBS). A key property of any parametric model is that it be low dimensional--that is, a wide range of body shapes and sizes can be expressed by a small number of parameters. Black: [0020] L.1-14.  One embodiment fits body pose and shape to this data using an Iterative Closest Point (ICP) strategy. Generic ICP is a well understood algorithm used for aligning two point clouds. Broadly speaking, the algorithm establishes point correspondences between the source shape (body model) and the target shape (3D sensor measurements), defines an error function that encourages established corresponding points to be aligned, computes the optimal parameters that minimize the error, transforms the source shape using the optimal parameters and iterates to establish new point correspondences and refine the alignment. Black: [0185].  Therefore, the variation (adjustment) of the small number of parameters define different 3D body models) corresponding to the non-occluded parts of the human subject (e.g., Estimating body shape and pose is challenging in part due to the high dimensional nature of the problem. Body pose may be described by approximately 40 parameters while shape may be described by 20-100 or more.  
Black: [0132] L.1-4), the adjusted parametric model providing an estimate of points in the point cloud corresponding to the at least partially occluded parts of the human subject (e.g., At a given ICP iteration, let VS be the set of body model vertices whose closest match on the target shape T was classified as skin, and V\VS the non-skin vertices. For the skin regions, the same error function is used as defined in Section 5b, fully enforcing the tightness constraint, while for the non-skin regions, their contribution is down-weighted through C:

[Equation image: media_image1.png]

); 
wherein the parametric model defines a 3D parametrised shape of a human (e.g., With a low-dimensional model, only a few parameters need to be estimated to represent body shape. This simplifies the estimation problem and means that accurate measurements can be obtained even with noisy, limited or ambiguous sensor measurements. Also, because a parametric model is being fitted, the model can cope with missing data. While traditional scanners often produce 3D meshes with holes, the presently disclosed approach cannot generate models with holes and there is no need to densely measure locations on the body to fit the 3D model. Only a relatively small number of fairly weak measurements are needed to fit the model and the recovered shape parameters explain any missing data. Black: [0021]. In one embodiment, a parametric 3D body model called SCAPE (Anguelov et al., 2005) is employed. SCAPE is a deformable, triangulated mesh model of the human body that accounts for different body shapes, different poses, and non-rigid deformations due to articulation. For vision applications, it offers realism while remaining relatively low dimensional. It also factors changes in body shape due to identity and changes due to pose. Black: [0119]) and wherein adjusting the one or more parameters is such that the model is morphed so as to more closely correspond to the human subject (e.g., It has been observed that SCAPE has many desirable properties but other deformable graphics models exist in the literature. Synthetic body models can be generated using specialized commercial software tools (e.g. 3D Studio Max, BodyBuilder, Maya, Poser). The shape is controlled though a number of parameters while pose is varied by associating the surface mesh with a kinematic skeleton. While such models are easy to animate, and allow for pose and shape to be altered independently, the resulting shapes often lack realism. Black: [0120]); and 
generating a 3D reconstruction of the human subject based on the adjusted parametric model (e.g., generating an estimation of one or more of object shape and pose utilizing the intrinsic or extrinsic parameters and the segmented foreground image obtained in the first and second processing steps, respectively; Black: Claim 39 L.1-5. See 1_1 below).
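
For illustration only, the clothing-aware fitting cited above from Black (skin matches fully enforced, non-skin matches down-weighted) can be sketched as a weighted closest-point error. This is a minimal sketch under stated assumptions, not the formula of the cited reference: the function name `weighted_fit_error`, the brute-force nearest-neighbour search, and the scalar `non_skin_weight` (standing in for the down-weighting term C) are all hypothetical.

```python
import math

def weighted_fit_error(model_vertices, target_points, target_is_skin,
                       non_skin_weight=0.1):
    """Sum of squared closest-point distances between a body model and a scan.

    Matches that land on skin-labelled target points count fully; matches on
    non-skin (e.g. clothing) points are scaled by non_skin_weight, an assumed
    stand-in for the down-weighting described in the reference.
    """
    total = 0.0
    for vertex in model_vertices:
        # Brute-force closest target point; adequate for a toy example.
        best = min(range(len(target_points)),
                   key=lambda i: math.dist(vertex, target_points[i]))
        squared = math.dist(vertex, target_points[best]) ** 2
        weight = 1.0 if target_is_skin[best] else non_skin_weight
        total += weight * squared
    return total

model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
scan = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
is_skin = [True, False]  # second scan point is labelled as clothing
error = weighted_fit_error(model, scan, is_skin)
```

In a full ICP loop this error would be re-evaluated after each parameter update; only the error term is shown here.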
While Black does not explicitly teach, Bogo teaches:
(1_1). generating a 3D reconstruction of the human subject based on the adjusted parametric model (e.g., Input data. We use a Kinect One, which provides 512×424 depth images and 1920 × 1080 RGB images, at 30fps. We compute depth and RGB camera calibration parameters using a customized version of [3]. For each frame t, the sensor produces a depth image Zt and a RGB image It. Given the camera calibration, we process Zt to obtain a point cloud, Pt, with one 3D point per depth pixel. For each sequence, we acquire a background shot. We denote the background point cloud and color image by Pbg and Ibg, respectively. Bogo: sec. 4 para. 1.  Stage 1 – Pose and shape estimation in low-dimensional space. Stage 1 subdivides the initial sequence, of length n, into short intervals of n’ = 3 consecutive frames and estimates the body shape and pose in each interval in a coarse-to-fine manner. Given an interval extending from frame t to frame t’ = t + n’ − 1, we solve for the pose parameters for each frame, {θi} for i = t, …, t’, and the shape vector βt minimizing:  

[Equation image: media_image2.png]

where we first set j = 2 and solve for the shape S2(t), which is approximated with 10 principal components.  Bogo: sec. 4 para. 2.  After solving for βt and the poses for the low-resolution model, we use them as initialization and minimize (3) at resolution 1. See Fig. 3 (b) and (c).  Bogo: sec. 4 para. 4.  We minimize (3) for each frame in the sequence, starting from the first frame and proceeding sequentially with overlapping intervals, initializing each interval with the values optimized for the previous one. This gives a body shape βt and three estimates of the pose at nearly every frame. To output a single body shape from stage 1, we average the shape coefficients of the high-resolution models (Fig. 3). We similarly average the three estimated poses at each frame; this works well since the estimates tend to be very similar. Bogo: sec. 4 para. 5 and Fig. 3; reproduced below for reference.

[Figure image: media_image3.png (Bogo: Fig. 3)]

The method then uses geometry and image texture over time to obtain accurate shape, pose, and appearance information despite unconstrained motion, partial views, varying resolution, occlusion, and soft tissue deformation. Our novel body model has variable shape detail, allowing it to capture faces with a high-resolution deformable head model and body shape with lower-resolution. Finally we combine range data from an entire sequence to estimate a high-resolution displacement map that captures fine shape details. Bogo: Abstract L.5-14).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Bogo into the teaching of Black so that high-resolution 3D shape and appearance of the human body can be obtained from monocular RGB-D sequences acquired with a single sensor (Bogo: sec. 6 para. 1 L.1-4).
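
For illustration only, the step quoted from Bogo (processing a depth image with the camera calibration to obtain one 3D point per depth pixel) can be sketched with a standard pinhole back-projection. The pinhole model and the intrinsics `fx`, `fy`, `cx`, `cy` are assumptions made for this sketch; the reference does not state its calibration model in the quoted passage. Skipping pixels with non-positive depth is likewise an assumption about how missing measurements are encoded.

```python
def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (rows of depth values) into 3D points.

    Assumes a pinhole camera with focal lengths (fx, fy) and principal point
    (cx, cy), all in pixels. Pixels with non-positive depth are treated as
    missing and skipped.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z <= 0:
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points

# 2x2 toy depth image; the top-right pixel has no measurement.
toy_depth = [[1.0, 0.0],
             [2.0, 2.0]]
cloud = depth_to_point_cloud(toy_depth, fx=1.0, fy=1.0, cx=0.0, cy=0.0)
```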

Regarding claim 2, the combined teaching of Black and Bogo teaches a method according to claim 1, comprising the steps of: 
identifying in the at least one colour image regions corresponding to one or more items occluding the human subject (e.g., In the calibration and data pre-processing system 104, images and other sensor data is typically segmented into foreground regions and, for estimating shape under clothing, regions corresponding to skin, clothing and hair are detected. Even with many range sensors, there is an associated color image that can be used to detect skin or clothing regions.  Black: [0051] L.1-6.  Relative to the skin, clothing and hair are items occluding the human subject); 
discounting, from the point cloud, points corresponding to the one or more items identified as occluding the human subject (e.g., For manual segmentation, the images are presented to the user on a display device and the user can either drag a rectangle over the region containing the body, or can click on a few points which are used to obtain a rough body model using the method described in Section 4 from which a tri-map is extracted as described in Section 2d. In either case this is used as input to guide an image based segmentation algorithm 1204, for example, based on graph cuts. In the case that the user is clothed, the image is segmented into three regions: skin, clothing/hair regions, and background. If the user is wearing tight-fitting clothing, then the image may be segmented into only foreground and background. Black: [0362] L.3-13.  Therefore, when the skin region is considered, the clothing/hair regions are excluded from further processing).
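
For illustration only, discounting points that belong to occluding items can be sketched as a split of a labelled point cloud. The label strings and the function name `discount_occluders` are hypothetical; the sketch assumes each 3D point already carries the segmentation label of the colour pixel it came from.

```python
def discount_occluders(points, labels, occluder_labels=("clothing", "hair")):
    """Split a labelled point cloud into kept points and discounted points.

    Points whose label is in occluder_labels are set aside rather than
    deleted, so they remain available for later use.
    """
    kept, discounted = [], []
    for point, label in zip(points, labels):
        if label in occluder_labels:
            discounted.append(point)
        else:
            kept.append(point)
    return kept, discounted

pts = [(0.0, 0.0, 1.0), (0.1, 0.0, 1.0), (0.2, 0.0, 1.1)]
lbl = ["skin", "clothing", "hair"]
kept_pts, discounted_pts = discount_occluders(pts, lbl)
```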

Regarding claim 3, the combined teaching of Black and Bogo teaches a method according to claim 2, wherein the one or more items identified in the colour image correspond to at least one of: 
i. clothing that is being worn by the human subject (e.g., when a person is wearing clothing that obscures their underlying body shape.  Black: [0007] L.3-4);
ii. objects being held by the human subject; and
iii. hair on the human subject's face and or body.

Regarding claim 4, the combined teaching of Black and Bogo teaches a method according to claim 2, comprising: 
generating a 3D representation of at least one of the identified items (e.g., Here an observation model is defined that deals with clothing robustly using the concept that silhouettes in 2D, and range data in 3D, represent bounds on the underlying body shape. Consequently the true body should fit "inside" the image measurements. In the case of a clothed person, the observations may only provide loose bounds on body shape. Black: [0213] L.1-6); and 
combining the 3D representation of the at least one identified item with the 3D reconstruction of the human subject (e.g., Case 2: Clothing that obscures the body. Often it is desirable to know the shape of a person without having to have them undress or wear tight fitting clothing. Here any single pose of the body does not reveal the entire shape. This is true whether the sensor data is images or more detailed 3D data (e.g. from a laser range scanner, time of flight sensor, or structured light system). Here it is noted that as a person moves in their clothes, the way the clothes obscure the body changes--they become loose or tight on different parts of the body in different poses. By combining information from all these poses, and by using what is known about the shape of human bodies, one can estimate the most likely shape underneath the clothing. Black: [0209] L.18-30.  Scanning a full body from multiple partial views requires that the subject stands still or that the system precisely registers deforming point clouds captured from a non-rigid and articulated body.  Bogo: sec. 1 para. 1 L.6-10).

Regarding claim 5, the combined teaching of Black and Bogo teaches a method according to claim 4, wherein the 3D representation of the at least one identified item is generated using at least some of the points discounted from the point cloud (e.g., In the calibration and data pre-processing system 104, images and other sensor data is typically segmented into foreground regions and, for estimating shape under clothing, regions corresponding to skin, clothing and hair are detected. Even with many range sensors, there is an associated color image that can be used to detect skin or clothing regions.  Black: [0051] L.1-6.  For manual segmentation, the images are presented to the user on a display device and the user can either drag a rectangle over the region containing the body, or can click on a few points which are used to obtain a rough body model using the method described in Section 4 from which a tri-map is extracted as described in Section 2d. In either case this is used as input to guide an image based segmentation algorithm 1204, for example, based on graph cuts. In the case that the user is clothed, the image is segmented into three regions: skin, clothing/hair regions, and background. If the user is wearing tight-fitting clothing, then the image may be segmented into only foreground and background. Black: [0362] L.3-13.  Therefore, when the skin region is considered, the clothing/hair regions are excluded from further processing).

Regarding claim 6, the combined teaching of Black and Bogo teaches a method according to claim 4, wherein generating a 3D representation of the at least one identified item comprises: 
identifying a pre-determined 3D representation associated with the at least one identified item (e.g., In the calibration and data pre-processing system 104, images and other sensor data is typically segmented into foreground regions and, for estimating shape under clothing, regions corresponding to skin, clothing and hair are detected. Even with many range sensors, there is an associated color image that can be used to detect skin or clothing regions.  Black: [0051] L.1-6.  Clothing and hair are the identified items besides the shape (skin) of the body); and 
selecting the pre-determined 3D representation for combination with the 3D reconstruction of the human subject (e.g., Finally, a means for body shape matching takes a body produced from some measurements (tailoring measures, images, range sensor data) and returns one or more "scores" indicating how similar it is in shape to another body or database of bodies. This matching means is used to rank body shape similarity to, for example, reorder a display of attributes associated with a database of bodies. Such attributes might be items for sale, information about preferred clothing sizes, images, textual information or advertisements. The display of these attributes presented to a user may be ordered so that the presented items are those corresponding to people with bodies most similar to theirs. The matching and ranking means can be used to make selective recommendations based on similar body shapes. The attributes (e.g. clothing size preference) of people with similar body shapes can be aggregated to recommend attributes to a user in a form of body-shape-sensitive collaborative filtering.  Black: [0027]).

Regarding claim 7, the combined teaching of Black and Bogo teaches a method according to claim 4, comprising rendering the combination of the 3D reconstruction of the human subject and the 3D representation associated with the at least one item, for display (e.g., The fitted model 111 is the output of the acquisition and fitting sub-system 100 depicted in FIG. 1. This model may be graphically presented on an output device (e.g. computer monitor, hand-held screen, television, etc.) in either static or animated form via a display and animation subsystem 204. It may be optionally clothed with virtual garments.  Black: [0062]).

Regarding claim 8, the combined teaching of Black and Bogo teaches a method according to claim 2, comprising estimating depth information for at least some of the human subject (e.g., For range data, segmentation is often simpler. If a part of the body is sufficiently far from the background, a simple threshold on depth can be sufficient. More generally the person cannot be assumed to be distant from the background (e.g. the feet touch the floor). In these situations a simple planar model of the background may be assumed and robustly fit to the sensor data. User input or a coarse segmentation can be used to remove much of the person. The remaining depth values are then fit by multiple planes (e.g. for the ground and a wall). Standard robust methods for fitting planes (e.g. RANSAC or M-estimation) can be used. Sensor noise can be modeled by fitting the deviations from the fitted plane(s); this can be done robustly by computing the median absolute deviation (MAD). The foreground then can be identified based on its deviation from the fitted plane(s). Black: [0072].  Depth information is obtained from the range data through segmentation), subsequent to the discounting of the points corresponding to the one or more items identified as occluding the human subject (e.g., In the calibration and data pre-processing system 104, images and other sensor data is typically segmented into foreground regions and, for estimating shape under clothing, regions corresponding to skin, clothing and hair are detected. Even with many range sensors, there is an associated color image that can be used to detect skin or clothing regions.  Black: [0051] L.1-6.  Clothing and hair are the identified items besides the shape (skin) of the body).
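
For illustration only, the MAD-based foreground test quoted from Black at [0072] can be sketched as follows. The sketch assumes the background plane has already been fitted (for example by RANSAC) and is given as a normal and offset; the cut-off `k` and the factor 1.4826, which converts a MAD into a Gaussian-equivalent standard deviation, are assumptions not stated in the reference.

```python
import statistics

def foreground_by_plane_deviation(points, normal, offset, k=3.0):
    """Flag points that deviate strongly from the plane normal . p + offset = 0.

    Sensor noise is estimated robustly from the median absolute deviation
    (MAD) of the signed point-to-plane residuals. Returns one boolean per
    point, True meaning foreground.
    """
    norm = sum(c * c for c in normal) ** 0.5
    residuals = [(sum(n * c for n, c in zip(normal, p)) + offset) / norm
                 for p in points]
    median = statistics.median(residuals)
    mad = statistics.median([abs(r - median) for r in residuals])
    sigma = 1.4826 * mad  # assumed Gaussian consistency factor
    return [abs(r - median) > k * sigma for r in residuals]

# Six noisy floor points near z = 0 and one point well above the floor.
floor_scene = [(0, 0, 0.0), (1, 0, 0.01), (2, 0, -0.01), (3, 0, 0.02),
               (4, 0, -0.02), (5, 0, 0.0), (2, 1, 1.5)]
flags = foreground_by_plane_deviation(floor_scene, normal=(0, 0, 1), offset=0.0)
```

If the MAD is zero (a perfectly flat background sample), every point with a non-zero deviation is flagged, so a small noise floor may be preferable in practice.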

Regarding claim 9, the combined teaching of Black and Bogo teaches a method according to claim 1, wherein the 3D reconstruction of the human subject comprises an untextured mesh representation of the human subject, the method comprising: 
generating a texture map of the human subject, based on the regions in the at least one colour image identified as corresponding to the non-occluded parts of the human subject (e.g., In Section 7 a modification to the standard ICP cost function is described that allows clothing to be taken into account. Many range scanning devices simultaneously acquire visible imagery, which either provides a texture map or per-vertex coloration for the range data. This allows the classification of sensor data points as either skin or clothing using the skin classifier described in Section 2e (or more generally to classify each as corresponding to one of G classes using user input or skin/hair/clothing classifiers described in the literature (Section 7b)).  Black: [0373]); and 
applying the generated texture map to the 3D reconstruction of the human subject (e.g., The shape of the body can also be animated as a movie or displayed so as to show the changes in body shape over time. One method provides a graphical color coding of the body model to illustrate changes in body shape (e.g. due to weight loss). Since all model vertices are in correspondence, it is easy to measure the Euclidean distance between vertices of different models. This distance can be assigned a color from a range of colors that signify the type of change (e.g. increase or decrease in size as measured by vertex displacement along its surface normal). Color can alternatively be mapped to other shape attributes (such as curvature) computed from the mesh. The colors are then used to texture map the body model for display on a graphical device.  Black: [0380] L.8-21).
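
For illustration only, the colour coding quoted from Black at [0380] (vertex displacement along the surface normal mapped to a colour that is then used to texture the model) can be sketched as below. The red/blue colour ramp, the saturation limit `max_abs`, and the function name are hypothetical choices; the sketch assumes unit-length normals and vertices in one-to-one correspondence.

```python
def displacement_colors(vertices_before, vertices_after, normals, max_abs=0.05):
    """Map signed displacement along each vertex normal to an RGB colour.

    Outward displacement shades from white to red, inward displacement from
    white to blue, saturating at +/- max_abs.
    """
    colors = []
    for a, b, n in zip(vertices_before, vertices_after, normals):
        # Signed displacement of the vertex along its normal.
        d = sum((bj - aj) * nj for aj, bj, nj in zip(a, b, n))
        t = max(-1.0, min(1.0, d / max_abs))
        if t >= 0:
            colors.append((1.0, 1.0 - t, 1.0 - t))
        else:
            colors.append((1.0 + t, 1.0 + t, 1.0))
    return colors

before = [(0.0, 0.0, 0.0)] * 3
after = [(0.0, 0.0, 0.05), (0.0, 0.0, -0.1), (0.0, 0.0, 0.0)]
unit_normals = [(0.0, 0.0, 1.0)] * 3
vertex_colors = displacement_colors(before, after, unit_normals)
```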

Regarding claim 10, the combined teaching of Black and Bogo teaches a method according to claim 9, wherein identifying the human subject and one or more items in the colour image comprises at least one of: 
i. inputting the colour image to a neural network trained to perform segmentation of images of human subjects; and
ii. performing cluster analysis on the at least one colour image (e.g., In the calibration and data pre-processing system 104, images and other sensor data is typically segmented into foreground regions and, for estimating shape under clothing, regions corresponding to skin, clothing and hair are detected. Even with many range sensors, there is an associated color image that can be used to detect skin or clothing regions. Previous methods for fitting body shape to images assumed that a static, known, background image is available to aid in segmentation of the foreground region. In general this is not possible with a small number of camera views or a moving sensor. A method is disclosed herein that enables accurate segmentation. Black: [0051]).

Regarding claim 11, the claim is a non-transitory computer readable medium claim of method claim 1.  The claim is similar in scope to claim 1 and it is rejected under similar rationale as claim 1.
Black further teaches that “The functions described herein may be embodied as computer implemented inventions in which software stored in a memory is executed by a processor to implement the respective functions.” (Black: [0492] L.1-4).

Regarding claim 12, the claim is a system claim of method claim 1.  The claim is similar in scope to claim 1 and it is rejected under similar rationale as claim 1.
Black further teaches that “A system for estimating a shape of a body of an individual, comprising: an input device operative to obtain input data including data representing said body in a plurality of poses; and at least one processor operative to execute at least one program out of at least one memory to fit a parametric body model of said body to the data representation of said body contained in the input data in said plurality of poses to generate multiple sets of pose parameters and at least one set of shape parameters, said at least one set of shape parameters being consistent with said plurality of poses.” (Black: Claim 75).

Regarding claim 13, the claim is a system claim of combination of method claims 2 and 5.  The claim is similar in scope to combination of claims 2 and 5 and it is rejected under similar rationale as combination of claims 2 and 5.

Regarding claim 14, the claim is a system claim of combination of method claims 4 and 7.  The claim is similar in scope to combination of claims 4 and 7 and it is rejected under similar rationale as combination of claims 4 and 7.

Regarding claims 15 and 16, the claims are system claims of method claims 9 and 6, respectively.  The claims are similar in scope to claims 9 and 6, respectively, and they are rejected under similar rationale as claims 9 and 6, respectively.

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Black in view of Bogo as applied to claim 12 and further in view of Black137 (10,529,137).

Regarding claim 17, the combined teaching of Black and Bogo teaches a system according to claim 12, wherein the image processor is configured to input the at least one colour image to a neural network trained to segment pixels corresponding to a human subject from other pixels in images of human subjects (e.g., Following image capture as depicted at block 302, image processing is performed as illustrated at block 303. If this is not the case (for example with 8-bit pixels) then the input pixel values are rescaled to the range [0,1].  Black: [0086]. Standard calibration methods assume a black and white checkerboard pattern. While this assumption can be relaxed, it is easy to convert the multi-chromatic grid into a black-white one for processing by standard methods. To do so, the RGB pixel values are projected onto the line in color space between the colors of the grid (i.e. the line between blue and green in RGB).
Black: [0087]. In the case of a blue-green grid, the color at each pixel in the original image I is processed to generate a new gray-scale image [symbol image: media_image4.png]. Pixels [symbol image: media_image5.png] are computed from pixels {ri, gi, bi} ∈ I as follows:

[Equation image: media_image6.png]

This results in a grayscale image which is brighter in areas that have more green than blue, and darker in areas that have more blue than green. This allows the use of standard checkerboard detection algorithms (typically tuned for grayscale images) as described next.  Black: [0088]. Following image processing as illustrated at block 303, grid patch detection is performed as depicted at block 304 and described below. Pattern recognition is applied to this processed image I in order to detect patches of the grid pattern.  Black: [0089] L.1-5. The OpenCV library (Bradski and Kaehler, 2008) may be employed for the checkerboard detection function ("cvFindChessboardCorners"). This function returns an unordered set of grid points in image space where these points correspond to corners of adjacent quadrilaterals found in the image. Black: [0090] L.1-6. These image points on the patch must be put in correspondence with positions on the checkerboard in order to find a useful homography. Black: [0091] L.1-3. Once the homography for a patch is found, the image area corresponding to the patch is "erased" so that it will no longer be considered: specifically the convex hull of the points in the image space is computed, and all pixels lying inside that space are set to 0.5 (gray). Black: [0092]. Given the image regions defined by the convex hull of each patch, a model of the colors of the grids is computed 310 for image segmentation 311. Black: [0097] L.1-3. Segmentation is performed as depicted at block 311 to produce a segmented image 314 by thresholding Tmax. The threshold may be adjusted manually. This separates the image into a foreground region (below the threshold) and a background region (above the threshold). Black: [0100].  In the calibration and data pre-processing system 104, images and other sensor data is typically segmented into foreground regions and, for estimating shape under clothing, regions corresponding to skin, clothing and hair are detected. 
Even with many range sensors, there is an associated color image that can be used to detect skin or clothing regions.  Black: [0051] L.1-6.  Therefore, pixels in the image are segmented into skin, clothing and hair.  See 17_1 below).
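For illustration only, and not as part of the cited disclosure: Black [0086]-[0088] describes rescaling pixel values to [0,1] and projecting each RGB pixel onto the line in color space between the two grid colors. The sketch below shows one way such a projection could be computed. It assumes pure blue (0,0,1) and pure green (0,1,0) as the grid colors and clamps the result to [0,1]. The function names `rescale_8bit` and `project_to_gray` are hypothetical, and the formula is a generic orthogonal projection, not necessarily the equation shown in Black.

```python
def rescale_8bit(pixel):
    """Rescale an 8-bit RGB pixel (0..255 per channel) to the range [0, 1]."""
    return tuple(channel / 255.0 for channel in pixel)


def project_to_gray(pixel, color_a=(0.0, 0.0, 1.0), color_b=(0.0, 1.0, 0.0)):
    """Project an RGB pixel onto the line from color_a to color_b.

    Returns the position along that line, clamped to [0, 1]:
    0.0 at color_a (assumed blue) and 1.0 at color_b (assumed green).
    """
    # Direction of the line between the two assumed grid colors.
    direction = [b - a for a, b in zip(color_a, color_b)]
    # Pixel position relative to the start of the line.
    offset = [p - a for p, a in zip(pixel, color_a)]
    length_sq = sum(d * d for d in direction)
    # Scalar projection of the offset onto the line direction.
    t = sum(o * d for o, d in zip(offset, direction)) / length_sq
    return min(1.0, max(0.0, t))


if __name__ == "__main__":
    for name, rgb in [("green", (0, 255, 0)), ("blue", (0, 0, 255))]:
        print(name, project_to_gray(rescale_8bit(rgb)))
```

Under these assumptions a pure green pixel maps to 1.0 and a pure blue pixel to 0.0, which is consistent with the quoted description that the resulting image is brighter where there is more green than blue.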
While the combined teaching of Black and Bogo does not explicitly teach the following limitation, Black137 teaches:
(17_1). the image processor is configured to input the at least one colour image to a neural network trained to segment pixels corresponding to a human subject from other pixels in images of human subjects (e.g., a CNN used for the pose detection model learns, for example, what pixel values in input image data correspond to particular human body poses (e.g., specific arrangements of limbs and torso) and encodes this information in the values of its convolutional filters such that it can provide an automated evaluation of pose shown in new input images. This may be accomplished in some embodiments by providing pixel values of an input image to the input nodes of a CNN and providing a labeled output image to the output nodes, with the labels indicating which pixels depict a human and which pixels depict non-human scenery. The labeled output can thus represent a segmented image, and the same CNN or another machine learning model can learn a particular pose associated with that segmented image. In other embodiments, the output nodes of the CNN can each correspond to a different pose in a multi-pose dataset, and training can include providing the pixel values of the input image to the input nodes of a CNN and indicating which output node corresponds to the depicted pose. The CNN can thus learn which pixel value patterns are associated with which poses. Black137: c.8 L.44-64).
It would have been obvious to a person of ordinary skill in the art before the effective filing date of the claimed invention to incorporate the teaching of Black137 into the combined teaching of Black and Bogo so as to achieve more accurate detection of patterns with a neural network based on the specific training of the candidate patterns.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
a).	Black630 (11,461,630) teaches that “Disclosed are systems and techniques for extracting user body shape (e.g., a representation of the three-dimensional body surface) from user behavioral data. The behavioral data may not be explicitly body-shape-related, and can include shopping history, social media likes, or other recorded behaviors of the user within (or outside of) a networked content delivery environment. The determined body shape can be used, for example, to generate a virtual fitting room user interface.” (Black630: Abstract) and “FIG. 1B depicts an example user interface 140 for displaying a body-shape-based recommendation according to the present disclosure. The user interface 140 includes information 160 regarding two clothing items 145, 150 that have been recommended for, or otherwise selected by or for, the user.” (Black630: c.5 L.1-6)
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SING-WAI WU whose telephone number is (571)270-5850. The examiner can normally be reached 9:00am - 5:30pm (Central Time).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SING-WAI WU/Primary Examiner, Art Unit 2611