DETAILED ACTIONS
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
The amendment filed June 29th, 2022 has been entered. Claims 1-15 and 17-21 remain pending in the application. Applicant’s amendments to the claims have overcome each and every 112(b) rejection previously set forth in the Non-Final Office Action mailed March 31st, 2022.

Response to Arguments
Applicant’s arguments with respect to claims 1 and 11 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1-7, 9, and 11-18 are rejected under 35 U.S.C. 103 as being unpatentable over Gu et al. (US 20140056471 A1), hereinafter referred to as Gu (previously cited in IDS), in view of Lee et al. (US 20130329011 A1), hereinafter referred to as Lee.

Regarding claim 1, Gu discloses a computer-implemented method for pose determination of a subject (see Fig. 16, “determine a gesture performed by the person”), comprising: 
identifying a plurality of candidate regions (see Fig. 16, step 1610, “perform depth segmentation to create one or more pixel groups”, para. 0116, FIG. 10B shows different pixel groups in the depth segmented image) from current depth data (see Fig. 16, step 1605, “receive an image having at least some pixels designated as background”, para. 0164, “image of the scene received at step 1605 may be the image output from method 800 of FIG. 8 at step 830”, Fig. 8, step 810, acquire image having depth information) representing an environment (see para. 0099, “the image capture module may be pointed at the contents of a room”) based, at least in part, on a depth connectivity criterion (see para. 0058, “If the feature vector of the pixel has remained unchanged (within a predefined threshold range for intensity and depth to account for measurement errors), the pixel may be determined by background modeling module 230 to correspond to a background object. A background model may be created using the feature vector (D.sub.1, I.sub.1) of the pixel (pixel 1) that has remained unchanged for at least the threshold period of time”, the predefined threshold range for depth is the depth connectivity criterion); 
determining a first region (see Fig. 10B, para. 0116, pixel group 1010B-1) comprising a first subset (see Fig. 10B, pixel group 1010B-1) of the plurality of candidate regions based, at least in part, on an estimation regarding a first pose component of the subject (see Fig. 10B, para. 0118, pixel group 1010B-1, which corresponds to the person's shoulders, head, and torso, which is the same as the first pose component defined in the Specification of the application in para. 0010, “the first pose component is a torso of the subject”, Fig. 16 and para. 0170, step 1635, “At step 1635, for each group of pixels that was determined to correspond to at least one person, a plane may be defined. For each group of pixels, a plane may be positioned and oriented to minimize the fitting error between some or all of the pixels of the group of pixels and the plane. Ideally, this plane may be aligned with the torso, shoulders, and head of the pixels corresponding to the person”); 
determining a second region (see Fig. 10B, para. 0116, pixel group 1010B-2) comprising a second subset (see Fig. 10 B, pixel group 1010B-2) of the plurality of candidate regions (see Fig. 16, step 1610, “perform depth segmentation to create one or more pixel groups”, para. 0116, FIG. 10B shows different pixel groups in the depth segmented image) based, at least in part, on relative locations of the first region and the second region (see Fig. 10B, pixel group 1010B-1 and pixel group 1010B-2, para. 0118, “Distance may also be used to determine if two or more pixel groups should be treated as a compound pixel group. For example, a second pixel group close to a first pixel group of a user may be likely to be part of the user”, so based on distance, two pixel groups or regions can be determined to be related to each other or both be a part of the user or the person in the image); 
generating a collective region (see para. 0118, “a history of pixel groups from previous images may be used to determine if separate pixel groups should be treated as part of a single pixel group (referred to as a compound pixel group) because the pixels groups likely correspond to the same object”) by associating the first region with the second region (see para. 0118, “it may be determined that both pixel group 1010B-2 and pixel group 1010B-1 should be treated as a compound pixel group corresponding to the same pixel group because these pixel groups were previously determined to be part of a single pixel group (e.g., pixel group 1010A of FIG. 10A)”), wherein the first subset (see Fig. 10 B, pixel group 1010B-1) of the plurality of candidate regions (see Fig. 16, step 1610, “perform depth segmentation to create one or more pixel groups”, para. 0116, FIG. 10B shows different pixel groups in the depth segmented image) and the second subset (see Fig. 10B, pixel group 1010B-2) of the plurality of candidate regions (see Fig. 16, step 1610, “perform depth segmentation to create one or more pixel groups”, para. 0116, FIG. 10B shows different pixel groups in the depth segmented image) are disconnected from one another (see Fig. 10B, pixel group 1010B-1 and pixel group 1010B-2 are disconnected from each other); 
identifying the first pose component (see Fig. 10B, para. 0118, pixel group 1010B-1, which corresponds to the person's shoulders, head, and torso, which is the same as the first pose component defined in the Specification of the application in para. 0010, “the first pose component is a torso of the subject”, Fig. 16 and para. 0170, step 1635, “At step 1635, for each group of pixels that was determined to correspond to at least one person, a plane may be defined. For each group of pixels, a plane may be positioned and oriented to minimize the fitting error between some or all of the pixels of the group of pixels and the plane. Ideally, this plane may be aligned with the torso, shoulders, and head of the pixels corresponding to the person”) and a second pose component (see para. 0118, “pixel group 1010B-2 corresponds to a person's hand”, which is the same as the second pose component defined in the Specification of the application in para. 0010, “the at least one second pose component is a hand of the subject”, Fig. 16 and para. 0170, step 1635, “At step 1635, for each group of pixels that was determined to correspond to at least one person, a plane may be defined. For each group of pixels, a plane may be positioned and oriented to minimize the fitting error between some or all of the pixels of the group of pixels and the plane. Ideally, this plane may be aligned with the torso, shoulders, and head of the pixels corresponding to the person”) of the subject from the collective region (see para. 0119, “Following the size threshold analysis, only pixel group 1010A or pixel groups 1010B-1 and 1010B-2, which may be treated as a compound pixel group, may remain for analysis”, so pixel groups 1010B-1 and 1010B-2 are considered as one compound pixel group or collective region, para. 0120, “a principal component analysis (PCA) may be conducted. A PCA may involve the use of a set of training observations to determine if a pixel group likely corresponds to a person”, para. 0169 – 0170, “At step 1630, an indication of each pixel determined to correspond to a person may be output. Each pixel that is part of a pixel group that was determined to have a head and shoulders at step 1625 may be output at step 1630.”); 
determining a spatial relationship (see para. 0118, “distance may also be used to determine if two or more pixel groups should be treated as a compound pixel group. For example, a second pixel group close to a first pixel group of a user may be likely to be part of the user. A pixel group directly in front of a pixel group associated with a user may be considered likely to represent part of the user” ) between the identified first pose component (see para. 0118, “pixel group 1010B-1, which corresponds to the person's shoulder's head, and torso”) and the identified second pose component (see para. 0118, “pixel group 1010B-2 corresponds to a person's hand”); and 
generating a controlling command (see para. 0002, “A hand movement or movement of another part of the person's body can be detected by an electronic device and used to determine a command to be executed by the device (e.g., provided to an interface being executed by the device) or to be output to an external device”, para. 0003, “Gestures may be useful to control devices”) based, at least in part, on the determined spatial relationship (see para. 0118, the spatial relationship is used to determine the compound pixel group, which is used to detect the gesture or hand movement performed as shown in Fig. 16, so it is used in part to generate a controlling command).

Gu does not expressly disclose the estimation regarding the first pose component of the subject including: determining a centroid point of a pose component corresponding to the first pose component and detected in prior depth data; and mapping the centroid point to the current depth data to estimate the first pose component in the current depth data and determine one of the plurality of candidate regions in the current depth data as the first region.
However, Lee teaches the estimation regarding the first pose component of the subject (para. 0093, “FIG. 6 depicts a method for detecting a pose of articulated body portions”, body portion reads on the first pose component, para. 0004, “the ability of the imaging system to accurately identify articulated body portions, a model of the articulated body portions is provided”, para. 0061, “at least one processor to perform a method for modeling a pose of a hand or other articulated body portion of a user as described herein”) including: determining a centroid point of a pose component (para. 0086, “A representative attract point can be a point that represents a body portion of the model. The term "attract point" indicates that in a matching process, the attract point is moved toward, or attracted to, depth sensor data. In one approach, the representative attract point is at a central point, or centroid, of the body portion. The centroid can be within the body portion or at a surface of the body portion. The surface can face the depth camera, along the depth axis or along a line of sight to the depth camera. The centroid can be considered to be a central point of the body portion”) corresponding to the first pose component and detected in prior depth data (para. 0005, “The method further includes accessing a model. The model includes articulated body portions which correspond to the articulated body portions of the object, and which each have at least one representative attract point.”, the model is the prior depth data and the articulated body portions read on the first pose component); and mapping the centroid point to the current depth data to estimate the first pose component in the current depth data (para. 0006, “The method further includes matching the representative attract points to the centroids and performing a rigid transform of the model, e.g., without changing relative orientations of the articulated portions of the model, to match the model to the depth pixels of the depth sensor.”) and determine one of the plurality of candidate regions in the current depth data as the first region (para. 0007, “Different pixels of the sensor data can be associated with different body portions using an exemplar machine learning process. In this approach, each depth pixel of the sensor data is assigned a probability for each body portion, indicating a probability that the depth pixel is part of the body portion. A depth pixel can be associated with a body portion for which the probability is the highest among all body portions.”).
Gu and Lee are both considered to be analogous to the claimed invention because they are in the same field of gesture or pose determination. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by Gu to incorporate the teachings of Lee of the estimation regarding the first pose component of the subject including: determining a centroid point of a pose component corresponding to the first pose component and detected in prior depth data; and mapping the centroid point to the current depth data to estimate the first pose component in the current depth data and determine one of the plurality of candidate regions in the current depth data as the first region. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to enhance the ability of the imaging system to accurately identify articulated body portions (Lee, para. 0004).
Regarding claim 2, the combination of Gu in view of Lee discloses the method of claim 1 (Gu, see Fig. 16, “determine a gesture performed by the person”), wherein the current depth data is obtained from images (Gu, see Fig. 16, step 1605, “receive an image having at least some pixels designated as background”, para. 0164, “image of the scene received at step 1605 may be the image output from method 800 of FIG. 8 at step 830”, Fig. 8, step 810, acquire image having depth information, para. 0099, “each image may be acquired by image acquisition module 210 from image capture module 110”) captured by a stereo camera (Gu, see para. 0048, “the image capture module 110 may be stereoscopic”), wherein the current depth data (Gu, see Fig. 16, step 1605, “receive an image having at least some pixels designated as background”, para. 0164, “image of the scene received at step 1605 may be the image output from method 800 of FIG. 8 at step 830”, Fig. 8, step 810, acquire image having depth information, para. 0099, “each image may be acquired by image acquisition module 210 from image capture module 110”) includes a depth map (Gu, see Fig. 4, a point cloud of a scene captured by an image capture module) calculated based on a disparity map or intrinsic parameters (Gu, para. 0074, “FIG. 4 illustrates an embodiment of a point cloud 400 of the scene captured by the image capture module. Point cloud 400 illustrates each pixel of image 300 based on each pixel's depth value. As such, point cloud 400 is a three-dimensional representation of the pixels of image 300”) of the stereo camera (Gu, see para. 0048, “the image capture module 110 may be stereoscopic”).

Regarding claim 3, the combination of Gu in view of Lee discloses the method of claim 1 (Gu, see Fig. 16, “determine a gesture performed by the person”), further comprising determining a depth range (Gu, see para. 0064 and Eq. 1, R represents the maximum depth range of depth values acquired by image acquisition module 210), where the subject is likely to appear in the current depth data (Gu, see Fig. 16, step 1605, “receive an image having at least some pixels designated as background”, para. 0164, “image of the scene received at step 1605 may be the image output from method 800 of FIG. 8 at step 830”, Fig. 8, step 810, acquire image having depth information, para. 0099, “each image may be acquired by image acquisition module 210 from image capture module 110”), wherein the plurality of candidate regions (Gu, see Fig. 10B for the plurality of candidate regions or pixel groups) are identified based on the depth range (Gu, see para. 0064, “When a pixel is determined to be occupied by a person at a particular depth, the depth may receive a "vote" in the pixel's array at the element corresponding to the depth.”).

Regarding claim 4, the combination of Gu in view of Lee discloses the method of claim 1 (Gu, see Fig. 16, “determine a gesture performed by the person”), wherein the current depth data (Gu, see Fig. 16, step 1605, “receive an image having at least some pixels designated as background”, para. 0164, “image of the scene received at step 1605 may be the image output from method 800 of FIG. 8 at step 830”, Fig. 8, step 810, acquire image having depth information, para. 0099, “each image may be acquired by image acquisition module 210 from image capture module 110”) includes at least one of unknown, invalid, or inaccurate depth information (Gu, see Fig. 10B, the space between pixel group 1010B-1 and pixel group 1010B-2 is unknown depth information, “FIG. 10B illustrates an embodiment of a depth segmented image wherein a person's hand occludes at least a portion of the person's arm”, so the occluded portion of the person’s arm in the depth data has unknown depth information).
Regarding claim 5, the combination of Gu in view of Lee discloses the method of claim 1 (Gu, see Fig. 16, “determine a gesture performed by the person”), wherein the depth connectivity criterion includes at least one of a depth threshold or a change-of-depth threshold (Gu, see para. 0058, “If the feature vector of the pixel has remained unchanged (within a predefined threshold range for intensity and depth to account for measurement errors), the pixel may be determined by background modeling module 230 to correspond to a background object. A background model may be created using the feature vector (D.sub.1, I.sub.1) of the pixel (pixel 1) that has remained unchanged for at least the threshold period of time”, the predefined threshold range for depth is the depth connectivity criterion).
Regarding claim 6, the combination of Gu in view of Lee discloses the method of claim 1 (Gu, see Fig. 16, “determine a gesture performed by the person”), wherein the estimation regarding the first pose component of the subject further includes obtaining (Gu, see Fig. 10B, para. 0118, pixel group 1010B-1, which corresponds to the person's shoulders, head, and torso, which is the same as the first pose component defined in the Specification of the application in para. 0010, “the first pose component is a torso of the subject”, para. 0120, “a principal component analysis (PCA) may be conducted. A PCA may involve the use of a set of training observations to determine if a pixel group likely corresponds to a person”, para. 0169 – 0170, “At step 1630, an indication of each pixel determined to correspond to a person may be output. Each pixel that is part of a pixel group that was determined to have a head and shoulders at step 1625 may be output at step 1630.”, Fig. 16 and para. 0170, step 1635, “At step 1635, for each group of pixels that was determined to correspond to at least one person, a plane may be defined. For each group of pixels, a plane may be positioned and oriented to minimize the fitting error between some or all of the pixels of the group of pixels and the plane. Ideally, this plane may be aligned with the torso, shoulders, and head of the pixels corresponding to the person”) for determining the first region (Gu, see Fig. 10B, para. 0116, pixel group 1010B-1) is based on baseline information from the prior depth data (Gu, see para. 0120, “A PCA may involve the use of a set of training observations to determine if a pixel group likely corresponds to a person. Previously, a large number (e.g., tens, hundreds, thousands, etc.) of images of people's upper bodies may be captured. Each such sample may be converted into a binary silhouette, and normalized in a fixed direction.”, the sample images were the baseline information, Lee also teaches matching the current depth data to a model depth data which corresponds to the prior depth data to estimate the pose of the body portion or the region), wherein the baseline information includes at least one of an estimated size or an estimated location of the first or second pose component (Gu, see para. 0120, “These samples may include samples in which the upper body (e.g., head and shoulders) of the persons are rotated along the x-axis, y-axis, and/or z-axis”, Lee teaches a model depth data which includes the articulated body portions).

Regarding claim 7, the combination of Gu in view of Lee discloses the method of claim 1 (Gu, see Fig. 16, “determine a gesture performed by the person”), wherein determining the second region (Gu, see Fig. 10B, para. 0116, pixel group 1010B-2) is based, at least in part, on non-depth information representing the environment (Gu, see para. 0071, the image 300 of a scene captured by an image capture module may include depth and intensity data, Fig. 3 shows the two-dimensional representation of image 300 which only shows the intensity data, since it is the input data to the method, it is used in part to determine the region), wherein the non-depth information includes two-dimensional image data that corresponds to the current depth data (Gu, see para. 0071 and Fig. 3, two-dimensional representation of image 300 (as illustrated) only the intensity data is illustrated).

Regarding claim 9, the combination of Gu in view of Lee discloses the method of claim 1 (Gu, see Fig. 16, “determine a gesture performed by the person”), wherein determining the spatial relationship (Gu, see para. 0118, “distance may also be used to determine if two or more pixel groups should be treated as a compound pixel group. For example, a second pixel group close to a first pixel group of a user may be likely to be part of the user. A pixel group directly in front of a pixel group associated with a user may be considered likely to represent part of the user” ) between the identified first pose component (Gu, see para. 0118, “pixel group 1010B-1, which corresponds to the person's shoulder's head, and torso”) and the identified second pose component (Gu, see para. 0118, “pixel group 1010B-2 corresponds to a person's hand”) comprises: 
determining one or more geometric attributes (Gu, see para. 0118, “Determining two or more pixel groups should be treated as a compound pixel group may be based on location, size, shape and/or movement of the pixel groups. Distance may also be used to determine if two or more pixel groups should be treated as a compound pixel group.”) of at least one of the first pose component (Gu, see para. 0118, “pixel group 1010B-1, which corresponds to the person's shoulder's head, and torso”) or the second pose component (Gu, see para. 0118, “pixel group 1010B-2 corresponds to a person's hand”), wherein the one or more geometric attributes include at least one of a centroid location, contour, or shape (Gu, see para. 0118, “Determining two or more pixel groups should be treated as a compound pixel group may be based on location, size, shape and/or movement of the pixel groups. Distance may also be used to determine if two or more pixel groups should be treated as a compound pixel group.”, shape is mentioned as one of the bases for determining the compound pixel group) of the at least one of the first pose component (Gu, see para. 0118, “pixel group 1010B-1, which corresponds to the person's shoulder's head, and torso”) or the second pose component (Gu, see para. 0118, “pixel group 1010B-2 corresponds to a person's hand”); or 
determining one or more vectors (Gu, see para. 0118, “Distance may also be used to determine if two or more pixel groups should be treated as a compound pixel group.”, distance corresponds to the vector between the first pixel group and second pixel group which defines the first pose component and second pose component) pointing between portions of the first pose component (Gu, see para. 0118, “pixel group 1010B-1, which corresponds to the person's shoulder's head, and torso”) and the second pose component (Gu, see para. 0118, “pixel group 1010B-2 corresponds to a person's hand”).

Regarding claim 11, Gu discloses a movable object (see Fig. 19, “FIG. 19 provides a schematic illustration of one embodiment of a computer system 1900 that can perform the methods provided by various other embodiments, as described herein, and/or can function as components of system 100, system 200, and/or system 1400”), comprising: 
a controller programmed to control the movable object (see para. 0199, “device driver”, a device driver is a computer program that operates or controls a particular type of device that is attached to a computer or automation), wherein the controller (see para. 0199, “device driver”) includes one or more processors (see Fig. 19, processors) configured to: 
identify a plurality of candidate regions (see Fig. 16, step 1610, “perform depth segmentation to create one or more pixel groups”, para. 0116, FIG. 10B shows different pixel groups in the depth segmented image) from current depth data (see Fig. 16, step 1605, “receive an image having at least some pixels designated as background”, para. 0164, “image of the scene received at step 1605 may be the image output from method 800 of FIG. 8 at step 830”, Fig. 8, step 810, acquire image having depth information) representing an environment (see para. 0099, “the image capture module may be pointed at the contents of a room”) based, at least in part, on a depth connectivity criterion (see para. 0058, “If the feature vector of the pixel has remained unchanged (within a predefined threshold range for intensity and depth to account for measurement errors), the pixel may be determined by background modeling module 230 to correspond to a background object. A background model may be created using the feature vector (D.sub.1, I.sub.1) of the pixel (pixel 1) that has remained unchanged for at least the threshold period of time”, the predefined threshold range for depth is the depth connectivity criterion); 
determine a first region (see Fig. 10B, para. 0116, pixel group 1010B-1) comprising a first subset (see Fig. 10B, pixel group 1010B-1) of the plurality of candidate regions based, at least in part, on an estimation regarding a first pose component of the subject (see Fig. 10B, para. 0118, pixel group 1010B-1, which corresponds to the person's shoulders, head, and torso, which is the same as the first pose component defined in the Specification of the application in para. 0010, “the first pose component is a torso of the subject”, Fig. 16 and para. 0170, step 1635, “At step 1635, for each group of pixels that was determined to correspond to at least one person, a plane may be defined. For each group of pixels, a plane may be positioned and oriented to minimize the fitting error between some or all of the pixels of the group of pixels and the plane. Ideally, this plane may be aligned with the torso, shoulders, and head of the pixels corresponding to the person”);
 determine a second region (see Fig. 10B, para. 0116, pixel group 1010B-2) comprising a second subset (see Fig. 10 B, pixel group 1010B-2) of the plurality of candidate regions (see Fig. 16, step 1610, “perform depth segmentation to create one or more pixel groups”, para. 0116, FIG. 10B shows different pixel groups in the depth segmented image) based, at least in part, on relative locations of the first region and the second region (see Fig. 10B, pixel group 1010B-1 and pixel group 1010B-2, para. 0118, “Distance may also be used to determine if two or more pixel groups should be treated as a compound pixel group. For example, a second pixel group close to a first pixel group of a user may be likely to be part of the user”, so based on distance, two pixel groups or regions can be determined to be related to each other or both be a part of the user or the person in the image); 
generate a collective region (see para. 0118, “a history of pixel groups from previous images may be used to determine if separate pixel groups should be treated as part of a single pixel group (referred to as a compound pixel group) because the pixels groups likely correspond to the same object”) by associating the first region with the second region (see para. 0118, “it may be determined that both pixel group 1010B-2 and pixel group 1010B-1 should be treated as a compound pixel group corresponding to the same pixel group because these pixel groups were previously determined to be part of a single pixel group (e.g., pixel group 1010A of FIG. 10A)”), wherein the first subset (see Fig. 10 B, pixel group 1010B-1) of the plurality of candidate regions (see Fig. 16, step 1610, “perform depth segmentation to create one or more pixel groups”, para. 0116, FIG. 10B shows different pixel groups in the depth segmented image) and the second subset (see Fig. 10B, pixel group 1010B-2) of the plurality of candidate regions (see Fig. 16, step 1610, “perform depth segmentation to create one or more pixel groups”, para. 0116, FIG. 10B shows different pixel groups in the depth segmented image) are disconnected from one another (see Fig. 10B, pixel group 1010B-1 and pixel group 1010B-2 are disconnected from each other); 
identify the first pose component (see Fig. 10B, para. 0118, pixel group 1010B-1, which corresponds to the person's shoulders, head, and torso, which is the same as the first pose component defined in the Specification of the application in para. 0010, “the first pose component is a torso of the subject”, Fig. 16 and para. 0170, step 1635, “At step 1635, for each group of pixels that was determined to correspond to at least one person, a plane may be defined. For each group of pixels, a plane may be positioned and oriented to minimize the fitting error between some or all of the pixels of the group of pixels and the plane. Ideally, this plane may be aligned with the torso, shoulders, and head of the pixels corresponding to the person”) and a second pose component (see para. 0118, “pixel group 1010B-2 corresponds to a person's hand”, which is the same as the second pose component defined in the Specification of the application in para. 0010, “the at least one second pose component is a hand of the subject”, Fig. 16 and para. 0170, step 1635, “At step 1635, for each group of pixels that was determined to correspond to at least one person, a plane may be defined. For each group of pixels, a plane may be positioned and oriented to minimize the fitting error between some or all of the pixels of the group of pixels and the plane. Ideally, this plane may be aligned with the torso, shoulders, and head of the pixels corresponding to the person”) of the subject from the collective region (see para. 0119, “Following the size threshold analysis, only pixel group 1010A or pixel groups 1010B-1 and 1010B-2, which may be treated as a compound pixel group, may remain for analysis”, so pixel groups 1010B-1 and 1010B-2 are considered as one compound pixel group or collective region, para. 0120, “a principal component analysis (PCA) may be conducted. A PCA may involve the use of a set of training observations to determine if a pixel group likely corresponds to a person”, para. 0169 – 0170, “At step 1630, an indication of each pixel determined to correspond to a person may be output. Each pixel that is part of a pixel group that was determined to have a head and shoulders at step 1625 may be output at step 1630.”); 
determine a spatial relationship (see para. 0118, “distance may also be used to determine if two or more pixel groups should be treated as a compound pixel group. For example, a second pixel group close to a first pixel group of a user may be likely to be part of the user. A pixel group directly in front of a pixel group associated with a user may be considered likely to represent part of the user” ) between the identified first pose component (see para. 0118, “pixel group 1010B-1, which corresponds to the person's shoulder's head, and torso”) and the identified second pose component (see para. 0118, “pixel group 1010B-2 corresponds to a person's hand”); and 
generate a controlling command (see para. 0002, “A hand movement or movement of another part of the person's body can be detected by an electronic device and used to determine a command to be executed by the device (e.g., provided to an interface being executed by the device) or to be output to an external device”, para. 0003, “Gestures may be useful to control devices”) based, at least in part, on the determined spatial relationship (see para. 0118, the spatial relationship is used to determine the compound pixel group which is used to detect the gesture performed or hand movement performed as shown in Fig. 16, so it is used in part to generate a controlling command).

Gu does not expressly disclose the estimation regarding the first pose component of the subject including: determining a centroid point of a pose component corresponding to the first pose component and detected in prior depth data; and mapping the centroid point to the current depth data to estimate the first pose component in the current depth data and determine one of the plurality of candidate regions in the current depth data as the first region.
	However, Lee teaches the estimation regarding the first pose component of the subject (para. 0093, “FIG. 6 depicts a method for detecting a pose of articulated body portions”, the body portion reads on the first pose component, para. 0004, “the ability of the imaging system to accurately identify articulated body portions, a model of the articulated body portions is provided”, para. 0061, “at least one processor to perform a method for modeling a pose of a hand or other articulated body portion of a user as described herein”) including: determining a centroid point of a pose component (para. 0086, “A representative attract point can be a point that represents a body portion of the model. The term "attract point" indicates that in a matching process, the attract point is moved toward, or attracted to, depth sensor data. In one approach, the representative attract point is at a central point, or centroid, of the body portion. The centroid can be within the body portion or at a surface of the body portion. The surface can face the depth camera, along the depth axis or along a line of sight to the depth camera. The centroid can be considered to be a central point of the body portion”) corresponding to the first pose component and detected in prior depth data (para. 0005, “The method further includes accessing a model. The model includes articulated body portions which correspond to the articulated body portions of the object, and which each have at least one representative attract point.”, the model is the prior depth data and the articulated body portions read on the first pose component); and mapping the centroid point to the current depth data to estimate the first pose component in the current depth data (para. 0006, “The method further includes matching the representative attract points to the centroids and performing a rigid transform of the model, e.g., without changing relative orientations of the articulated portions of the model, to match the model to the depth pixels of the depth sensor.”) and determine one of the plurality of candidate regions in the current depth data as the first region (para. 0007, “Different pixels of the sensor data can be associated with different body portions using an exemplar machine learning process. In this approach, each depth pixel of the sensor data is assigned a probability for each body portion, indicating a probability that the depth pixel is part of the body portion. A depth pixel can be associated with a body portion for which the probability is the highest among all body portions.”).
Gu and Lee are both considered to be analogous to the claimed invention because they are in the same field of gesture or pose determination. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the movable object as taught by Gu to incorporate the teachings of Lee of the estimation regarding the first pose component of the subject including: determining a centroid point of a pose component corresponding to the first pose component and detected in prior depth data; and mapping the centroid point to the current depth data to estimate the first pose component in the current depth data and determine one of the plurality of candidate regions in the current depth data as the first region. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to enhance the ability of the imaging system to accurately identify articulated body portions (Lee, para. 0004).

Regarding claim 12, the combination of Gu in view of Lee discloses the movable object of claim 11 (Gu, see Fig. 19, “FIG. 19 provides a schematic illustration of one embodiment of a computer system 1900 that can perform the methods provided by various other embodiments, as described herein, and/or can function as components of system 100, system 200, and/or system 1400”), further comprising: 
a stereo camera (Gu, see para. 0048, “the image capture module 110 may be stereoscopic”), wherein the current depth data is obtained from images (Gu, see Fig. 16, step 1605, “receive an image having at least some pixels designated as background”, para. 0164, “image of the scene received at step 1605 may be the image output from method 800 of FIG. 8 at step 830, Fig. 8, step 810, acquire image having depth information, para. 0099, “each image may be acquired by image acquisition module 210 from image capture module 110”) captured by the stereo camera (Gu, see para. 0048, “the image capture module 110 may be stereoscopic”), wherein the current depth data (Gu, see Fig. 16, step 1605, “receive an image having at least some pixels designated as background”, para. 0164, “image of the scene received at step 1605 may be the image output from method 800 of FIG. 8 at step 830, Fig. 8, step 810, acquire image having depth information, para. 0099, “each image may be acquired by image acquisition module 210 from image capture module 110”)  includes a depth map (see Fig. 4, a point cloud of a scene captured by an image capture module) calculated based on a disparity map or intrinsic parameters (Gu, para. 0074, “FIG. 4 illustrates an embodiment of a point cloud 400 of the scene captured by the image capture module. Point cloud 400 illustrates each pixel of image 300 based on each pixel's depth value. As such, point cloud 400 is a three-dimensional representation of the pixels of image 300”) of the stereo camera (Gu, see para. 0048, “the image capture module 110 may be stereoscopic”).

Regarding claim 13, the combination of Gu in view of Lee discloses the movable object of claim 11 (Gu, see Fig. 19, “FIG. 19 provides a schematic illustration of one embodiment of a computer system 1900 that can perform the methods provided by various other embodiments, as described herein, and/or can function as components of system 100, system 200, and/or system 1400”), wherein the one or more processors (Gu, see Fig. 19, processors) are further configured to determine a depth range (Gu, see para. 0064 and Eq. 1, R represents the maximum depth range of depth values acquired by image acquisition module 210), where the subject is likely to appear in the current depth data (Gu, see Fig. 16, step 1605, “receive an image having at least some pixels designated as background”, para. 0164, “image of the scene received at step 1605 may be the image output from method 800 of FIG. 8 at step 830, Fig. 8, step 810, acquire image having depth information, para. 0099, “each image may be acquired by image acquisition module 210 from image capture module 110”), wherein the plurality of candidate regions (Gu, see Fig. 10B for the plurality of candidate regions or pixel groups) are identified based on the depth range (Gu, see para. 0064, “When a pixel is determined to be occupied by a person at a particular depth, the depth may receive a "vote" in the pixel's array at the element corresponding to the depth.”).

Regarding claim 14, the combination of Gu in view of Lee discloses the movable object of claim 11 (Gu, see Fig. 19, “FIG. 19 provides a schematic illustration of one embodiment of a computer system 1900 that can perform the methods provided by various other embodiments, as described herein, and/or can function as components of system 100, system 200, and/or system 1400”), wherein the current depth data (Gu, see Fig. 16, step 1605, “receive an image having at least some pixels designated as background”, para. 0164, “image of the scene received at step 1605 may be the image output from method 800 of FIG. 8 at step 830, Fig. 8, step 810, acquire image having depth information, para. 0099, “each image may be acquired by image acquisition module 210 from image capture module 110”) includes at least one of unknown, invalid, or inaccurate depth information (Gu, see Fig. 10B, the space between pixel group 1010B-1 and pixel group 1010B-2 is unknown depth information, “FIG. 10B illustrates an embodiment of a depth segmented image wherein a person's hand occludes at least a portion of the person's arm”, so the occluded portion of the person’s arm in the depth has unknown depth information).

Regarding claim 15, the combination of Gu in view of Lee discloses the movable object of claim 11 (Gu, see Fig. 19, “FIG. 19 provides a schematic illustration of one embodiment of a computer system 1900 that can perform the methods provided by various other embodiments, as described herein, and/or can function as components of system 100, system 200, and/or system 1400”), wherein the depth connectivity criterion includes at least one of a depth threshold or a change-of-depth threshold (Gu, see para. 0058, “If the feature vector of the pixel has remained unchanged (within a predefined threshold range for intensity and depth to account for measurement errors), the pixel may be determined by background modeling module 230 to correspond to a background object. A background model may be created using the feature vector (D.sub.1, I.sub.1) of the pixel (pixel 1) that has remained unchanged for at least the threshold period of time”, the predefined threshold range for depth is the depth connectivity criterion).

Regarding claim 17, the combination of Gu in view of Lee discloses the movable object of claim 11 (Gu, see Fig. 19, “FIG. 19 provides a schematic illustration of one embodiment of a computer system 1900 that can perform the methods provided by various other embodiments, as described herein, and/or can function as components of system 100, system 200, and/or system 1400”), wherein determining the second region (Gu, see Fig. 10B, para. 0116, pixel group 1010B-2) is based, at least in part, on non-depth information representing the environment or estimated location of a plurality of joints of the subject (Gu, see para. 0071, the image 300 of a scene captured by an image capture module may include depth and intensity data, Fig. 3 shows the two-dimensional representation of image 300 which only shows the intensity data which is a non-depth information of the scene, since it is the input data to the method, it is used in part to determine the region), wherein the non-depth information includes two-dimensional image data that corresponds to the current depth data (Gu, see para. 0071 and Fig. 3, two-dimensional representation of image 300 (as illustrated) only the intensity data is illustrated).

Regarding claim 18, the combination of Gu in view of Lee discloses the movable object of claim 11 (Gu, see Fig. 19, “FIG. 19 provides a schematic illustration of one embodiment of a computer system 1900 that can perform the methods provided by various other embodiments, as described herein, and/or can function as components of system 100, system 200, and/or system 1400”), wherein to determine the spatial relationship between the identified first pose component and the identified second pose component, the one or more processors are further configured to: 
determine one or more geometric attributes (Gu, see para. 0118, “Determining two or more pixel groups should be treated as a compound pixel group may be based on location, size, shape and/or movement of the pixel groups. Distance may also be used to determine if two or more pixel groups should be treated as a compound pixel group.”) of at least one of the first pose component (Gu, see para. 0118, “pixel group 1010B-1, which corresponds to the person's shoulder's head, and torso”) or the second pose component (Gu, see para. 0118, “pixel group 1010B-2 corresponds to a person's hand”), wherein the one or more geometric attributes include at least one of a centroid location, contour, or shape (Gu, see para. 0118, “Determining two or more pixel groups should be treated as a compound pixel group may be based on location, size, shape and/or movement of the pixel groups. Distance may also be used to determine if two or more pixel groups should be treated as a compound pixel group.”, shape is mentioned as one of the basing methods to determine the compound pixel group) of the at least one of the first pose component (Gu, see para. 0118, “pixel group 1010B-1, which corresponds to the person's shoulder's head, and torso”) or the second pose component (see para. 0118, “pixel group 1010B-2 corresponds to a person's hand”); or 
determine one or more vectors (Gu, see para. 0118, “Distance may also be used to determine if two or more pixel groups should be treated as a compound pixel group.”, distance corresponds to the vector between the first pixel group and second pixel group which defines the first pose component and second pose component) pointing between portions of the first pose component (Gu, see para. 0118, “pixel group 1010B-1, which corresponds to the person's shoulder's head, and torso”) and the second pose component (Gu, see para. 0118, “pixel group 1010B-2 corresponds to a person's hand”).

Claim 8 is rejected under 35 U.S.C. 103 as being unpatentable over Gu in view of Lee and in further view of Reville et al. (US 20110289456 A1), hereinafter referred to as Reville (previously cited in IDS).

Regarding claim 8, the combination of Gu in view of Lee discloses the method of claim 1 (Gu, see Fig. 16, “determine a gesture performed by the person”).
The combination of Gu in view of Lee does not explicitly disclose wherein determining the second region is based on estimated locations of a plurality of joints of the subject.
	However, Reville discloses wherein determining the second region (Reville, see para. 0005, “The method determines from the skeletal mapping whether movement including a first hand of the human target satisfies one or more filters for a first mid-air gesture and whether a second hand of the human target satisfies one or more filters for a modifier of the first mid-air gesture”, the skeletal mapping is used to determine the gesture of first hand of a human, both the application and Gu defines using the second region to determine the second pose component which is the person’s hand) is based on estimated locations of a plurality of joints of the subject (Reville, see Fig. 5 and para. 0081, “FIG. 5 illustrates an example of a skeletal model or mapping 530 representing a scanned human target that may be generated at step 510 of FIG. 4. According to one embodiment, the skeletal model 530 may include one or more data structures that may represent a human target as a three-dimensional model. Each body part may be characterized as a mathematical vector defining joints and bones of the skeletal model 530”).
Gu and Reville are both considered to be analogous to the claimed invention because they are in the same field of gesture or pose determination. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by the combination of Gu in view of Lee to incorporate the teachings of Reville wherein determining the second region is based on estimated locations of a plurality of joints of the subject. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been because the joints in the skeletal mapping may enable one or more body parts defined there between to move relative to one or more other body parts (Reville, para. 0082).

Claims 10 and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Gu in view of Lee and in further view of Yang et al. (US 20170351253 A1), hereinafter referred to as Yang (previously cited in IDS).

Regarding claim 10, the combination of Gu in view of Lee discloses the method of claim 1 (Gu, see Fig. 16, “determine a gesture performed by the person”).

The combination of Gu in view of Lee does not explicitly disclose further comprising controlling a mobile platform based on the controlling command, wherein the mobile platform includes at least one of an unmanned aerial vehicle (UAV), a manned aircraft, an autonomous car, a self-balancing vehicle, a robot, a smart wearable device, a virtual reality (VR) head-mounted display, or an augmented reality (AR) head-mounted display.
	However, Yang discloses further comprising controlling (Yang, Abstract, “determining control data of unmanned aerial vehicle based on the change in the body portion”) a mobile platform (Yang, see Abstract, “unmanned aerial vehicle”) based on the controlling command (Yang, see Abstract and Fig. 4A, the method determines a change in a body portion of a user which corresponds to the pose determination of the application and gesture determination of Gu, and that is used to determine the control data of the unmanned aerial vehicle which corresponds to the controlling command), wherein the mobile platform includes at least one of an unmanned aerial vehicle (UAV), a manned aircraft, an autonomous car, a self-balancing vehicle, a robot, a smart wearable device, a virtual reality (VR) head-mounted display, or an augmented reality (AR) head-mounted display (Yang, see Abstract, the mobile platform being controlled is an unmanned aerial vehicle or UAV).
Gu and Yang are both considered to be analogous to the claimed invention because they are in the same field of controlling a device. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by the combination of Gu in view of Lee to incorporate the teachings of Yang further comprising controlling a mobile platform based on the controlling command, wherein the mobile platform includes at least one of an unmanned aerial vehicle (UAV), a manned aircraft, an autonomous car, a self-balancing vehicle, a robot, a smart wearable device, a virtual reality (VR) head-mounted display, or an augmented reality (AR) head-mounted display. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been because by using the change in body posture or position into UAV control, a good control effect is obtained (Yang, para. 0040).

Regarding claim 19, the combination of Gu in view of Lee discloses the movable object of claim 11 (Gu, see Fig. 19, “FIG. 19 provides a schematic illustration of one embodiment of a computer system 1900 that can perform the methods provided by various other embodiments, as described herein, and/or can function as components of system 100, system 200, and/or system 1400”).
The combination of Gu in view of Lee does not explicitly disclose wherein the one or more processors are further configured to control the mobile platform based on the controlling command, wherein the mobile platform includes at least one of an unmanned aerial vehicle (UAV), a manned aircraft, an autonomous car, a self-balancing vehicle, a robot, a smart wearable device, a virtual reality (VR) head-mounted display, or an augmented reality (AR) head-mounted display.
	However, Yang discloses wherein the one or more processors (Yang, see Fig. 2, processor 240) are further configured to control (Yang, Abstract, “determining control data of unmanned aerial vehicle based on the change in the body portion”) the mobile platform (Yang, see Abstract, “unmanned aerial vehicle”) based on the controlling command (Yang, see Abstract and Fig. 4A, the method determines a change in a body portion of a user which corresponds to the pose determination of the application and gesture determination of Gu, and that is used to determine the control data of the unmanned aerial vehicle which corresponds to the controlling command), wherein the mobile platform includes at least one of an unmanned aerial vehicle (UAV), a manned aircraft, an autonomous car, a self-balancing vehicle, a robot, a smart wearable device, a virtual reality (VR) head-mounted display, or an augmented reality (AR) head-mounted display (Yang, see Abstract, the mobile platform being controlled is an unmanned aerial vehicle or UAV).
Gu and Yang are both considered to be analogous to the claimed invention because they are in the same field of controlling a device. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the movable object as taught by the combination of Gu in view of Lee to incorporate the teachings of Yang wherein the one or more processors are further configured to control the mobile platform based on the controlling command, wherein the mobile platform includes at least one of an unmanned aerial vehicle (UAV), a manned aircraft, an autonomous car, a self-balancing vehicle, a robot, a smart wearable device, a virtual reality (VR) head-mounted display, or an augmented reality (AR) head-mounted display. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been because by using the change in body posture or position into UAV control, a good control effect is obtained (Yang, para. 0040).

Regarding claim 20, the combination of Gu in view of Lee discloses a non-transitory computer-readable medium storing computer-executable instructions (see para. 0197, one or more non-transitory storage devices 1925) that, when executed, cause one or more processors (see Fig. 19, processors) to perform actions, the actions comprising: 
identifying a plurality of candidate regions (see Fig. 16, step 1610, “perform depth segmentation to create one or more pixel groups”, para. 0116, FIG. 10B shows different pixel groups in the depth segmented image) from depth data (see Fig. 16, step 1605, “receive an image having at least some pixels designated as background”, para. 0164, “image of the scene received at step 1605 may be the image output from method 800 of FIG. 8 at step 830, Fig. 8, step 810, acquire image having depth information) representing an environment (see para. 0099, “the image capture module may be pointed at the contents of a room”) based, at least in part, on a depth connectivity criterion (see para. 0058, “If the feature vector of the pixel has remained unchanged (within a predefined threshold range for intensity and depth to account for measurement errors), the pixel may be determined by background modeling module 230 to correspond to a background object. A background model may be created using the feature vector (D.sub.1, I.sub.1) of the pixel (pixel 1) that has remained unchanged for at least the threshold period of time”, the predefined threshold range for depth is the depth connectivity criterion); 
determining a first region (see Fig. 10B, para. 0116, pixel group 1010B-1) comprising a first subset (see Fig. 10 B, pixel group 1010B-1) of the plurality of candidate regions  based, at least in part, on an estimation regarding a first pose component of the subject (see Fig. 10 B, para. 0118, pixel group 1010B-1, which corresponds to the person's shoulder's head, and torso which is the same as the first pose component defined in the Specification of the application in para. 0010, “the first pose component is a torso of the subject”, Fig. 16 and para. 0170, step 1635, “At step 1635, for each group of pixels that was determined to correspond to at least one person, a plane may be defined. For each group of pixels, a plane may be positioned and oriented to minimize the fitting error between some or all of the pixels of the group of pixels and the plane. Ideally, this plane may be aligned with the torso, shoulders, and head of the pixels corresponding to the person”); 
determining a second region (see Fig. 10B, para. 0116, pixel group 1010B-2) comprising a second subset (see Fig. 10 B, pixel group 1010B-2) of the plurality of candidate regions (see Fig. 16, step 1610, “perform depth segmentation to create one or more pixel groups”, para. 0116, FIG. 10B shows different pixel groups in the depth segmented image) based, at least in part, on relative locations of the first region and the second region (see Fig. 10B, pixel group 1010B-1 and pixel group 1010B-2, para. 0118, “Distance may also be used to determine if two or more pixel groups should be treated as a compound pixel group. For example, a second pixel group close to a first pixel group of a user may be likely to be part of the user”, so based on distance, two pixel groups or regions can be determined to be related to each other or both be a part of the user or the person in the image); 
generating a collective region (see para. 0118, “a history of pixel groups from previous images may be used to determine if separate pixel groups should be treated as part of a single pixel group (referred to as a compound pixel group) because the pixels groups likely correspond to the same object”) by associating the first region with the second region (see para. 0118, “it may be determined that both pixel group 1010B-2 and pixel group 1010B-1 should be treated as a compound pixel group corresponding to the same pixel group because these pixel groups were previously determined to be part of a single pixel group (e.g., pixel group 1010A of FIG. 10A)”), wherein the first subset (see Fig. 10 B, pixel group 1010B-1) of the plurality of candidate regions (see Fig. 16, step 1610, “perform depth segmentation to create one or more pixel groups”, para. 0116, FIG. 10B shows different pixel groups in the depth segmented image) and the second subset (see Fig. 10B, pixel group 1010B-2) of the plurality of candidate regions (see Fig. 16, step 1610, “perform depth segmentation to create one or more pixel groups”, para. 0116, FIG. 10B shows different pixel groups in the depth segmented image) are disconnected from one another (see Fig. 10B, pixel group 1010B-1 and pixel group 1010B-2 are disconnected from each other); 
identifying the first pose component (see Fig. 10B, para. 0118, pixel group 1010B-1, which corresponds to the person's shoulders, head, and torso, which is the same as the first pose component defined in the Specification of the application in para. 0010, “the first pose component is a torso of the subject”, Fig. 16 and para. 0170, step 1635, “At step 1635, for each group of pixels that was determined to correspond to at least one person, a plane may be defined. For each group of pixels, a plane may be positioned and oriented to minimize the fitting error between some or all of the pixels of the group of pixels and the plane. Ideally, this plane may be aligned with the torso, shoulders, and head of the pixels corresponding to the person”) and a second pose component (see para. 0118, “pixel group 1010B-2 corresponds to a person's hand”, which is the same as the second pose component defined in the Specification of the application in para. 0010, “the at least one second pose component is a hand of the subject”, Fig. 16 and para. 0170, step 1635, “At step 1635, for each group of pixels that was determined to correspond to at least one person, a plane may be defined. For each group of pixels, a plane may be positioned and oriented to minimize the fitting error between some or all of the pixels of the group of pixels and the plane. Ideally, this plane may be aligned with the torso, shoulders, and head of the pixels corresponding to the person”) of the subject from the collective region (see para. 0119, “Following the size threshold analysis, only pixel group 1010A or pixel groups 1010B-1 and 1010B-2, which may be treated as a compound pixel group, may remain for analysis”, so pixel groups 1010B-1 and 1010B-2 are considered as one compound pixel group or collective region, para. 0120, “a principal component analysis (PCA) may be conducted. A PCA may involve the use of a set of training observations to determine if a pixel group likely corresponds to a person”, para. 0169 – 0170, “At step 1630, an indication of each pixel determined to correspond to a person may be output. Each pixel that is part of a pixel group that was determined to have a head and shoulders at step 1625 may be output at step 1630.”);
determining a spatial relationship (see para. 0118, “distance may also be used to determine if two or more pixel groups should be treated as a compound pixel group. For example, a second pixel group close to a first pixel group of a user may be likely to be part of the user. A pixel group directly in front of a pixel group associated with a user may be considered likely to represent part of the user” ) between the identified first pose component (see para. 0118, “pixel group 1010B-1, which corresponds to the person's shoulder's head, and torso”) and the identified second pose component (see para. 0118, “pixel group 1010B-2 corresponds to a person's hand”); and 
generating a controlling command (see para. 0002, “A hand movement or movement of another part of the person's body can be detected by an electronic device and used to determine a command to be executed by the device (e.g., provided to an interface being executed by the device) or to be output to an external device”, para. 0003, “Gestures may be useful to control devices”) based, at least in part, on the determined spatial relationship (see para. 0118, the spatial relationship is used to determine the compound pixel group which is used to detect the gesture performed or hand movement performed as shown in Fig. 16, so it is used in part to generate a controlling command).

Gu does not explicitly disclose one or more processors associated with a mobile platform.
However, Yang discloses one or more processors (Yang, see Fig. 2, processor 240) associated with a mobile platform (Yang, Abstract, “determining control data of unmanned aerial vehicle based on the change in the body portion”, the mobile platform is the unmanned aerial vehicle).
Gu and Yang are both considered to be analogous to the claimed invention because they are in the same field of controlling a device. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the non-transitory computer-readable medium as taught by Gu to incorporate the teachings of Yang of one or more processors associated with a mobile platform. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been that incorporating a change in body posture or position into UAV control yields a good control effect (Yang, para. 0040).
Claim 21 is rejected under 35 U.S.C. 103 as being unpatentable over Gu in view of Lee and further in view of Leyvand et al. (US 20110081045 A1), hereinafter referred to as Leyvand.

Regarding claim 21, the combination of Gu in view of Lee discloses the method of claim 1 (Gu, see Fig. 16, “determine a gesture performed by the person”), wherein determining the second region (Gu, see Fig. 10B, para. 0116, pixel group 1010B-2) comprising the second subset (Gu, see Fig. 10B, pixel group 1010B-2) of the plurality of candidate regions (Gu, see Fig. 16, step 1610, “perform depth segmentation to create one or more pixel groups”, para. 0116, FIG. 10B shows different pixel groups in the depth segmented image) based, at least in part, on relative locations of the first region and the second region (Gu, see Fig. 10B, pixel group 1010B-1 and pixel group 1010B-2, para. 0118, “Distance may also be used to determine if two or more pixel groups should be treated as a compound pixel group. For example, a second pixel group close to a first pixel group of a user may be likely to be part of the user”, so based on distance, two pixel groups or regions can be determined to be related to each other or both be a part of the user or the person in the image).

The combination of Gu in view of Lee does not expressly disclose generating a grid system for the current depth data, one or more gridlines of the grid system being defined based on baseline information regarding the first pose component of the subject, and the baseline information regarding the first pose component including at least one of the centroid point or a size of the pose component corresponding to the first pose component and detected in the prior depth data; and determining the second region from the plurality of candidate regions based on the grid system.
However, Leyvand teaches generating a grid system for the current depth data (para. 0069, “the grid of one or more voxels may be generated at 310 by projecting, for example, information such as the depth values, X-values, Y-values, or the like into three-dimensional (3-D) space. For example, depth values may be mapped to 3-D points in the 3-D space using a transformation such as a camera, image, or perspective transform such that the information may be transformed as trapezoidal or pyramidal shapes in the 3-D space. In one embodiment, the 3-D space having the trapezoidal or pyramidal shapes may be divided into blocks such as cubes that may create a grid of voxels such that each of the blocks or cubes may represent a voxel in the grid. For example, the target recognition, analysis, and tracking system may superimpose a 3-D grid over the 3-D points that correspond to the object in the depth image. The target recognition, analysis, and tracking system may then divide or chop up the grid into the blocks representing voxels to downsample the depth image into a lower resolution. According to an example embodiment, each of the voxels in the grid may include an average depth value of the valid or non-zero depth values for the pixels associated with the 3-D space in the grid. This may allow the voxel to represent a minimum and/or maximum depth value of the pixels associated with the 3-D space in the grid; an average of the X-values and Y-values for pixels having a valid depth value associated with the 3-D space; or any other suitable information provided by the depth image”), one or more gridlines of the grid system being defined based on baseline information regarding the first pose component of the subject (para. 0069, “information such as the depth values, X-values, Y-values, or the like into three-dimensional (3-D) space”, para. 0073, “calculate the average position of the voxels associated with the human target based on X-values, Y-values, and depth values associated with the voxels. For example, as described above, the target recognition, analysis, and tracking system may calculate an X-value for a voxel by averaging the X-values of the pixels associated with the voxel, a Y-value for the voxel by averaging the Y-values of the pixels associated with the voxel, and a depth value for the voxel by averaging the depth values of the pixels associated with the voxel. At 320, the target recognition, analysis, and tracking system may average the X-values, the Y-values, and the depth values of the voxels included in the human target to calculate the average position that may provide the estimate of the centroid or center of the human target”, the X-values, Y-values, and depth values define the centroid which are used to generate the grid), and the baseline information regarding the first pose component including at least one of the centroid point or a size of the pose component (para. 0069, “information such as the depth values, X-values, Y-values, or the like into three-dimensional (3-D) space”, para. 0073, “At 320, the target recognition, analysis, and tracking system may average the X-values, the Y-values, and the depth values of the voxels included in the human target to calculate the average position that may provide the estimate of the centroid or center of the human target”, the X-values, Y-values, and depth values define the centroid which are used to generate the grid) corresponding to the first pose component and detected in the prior depth data (Lee, para. 0086, teaches detecting the centroid point in the model, which is the prior depth data, and matching it with the current depth data to identify the body portions); and determining the second region from the plurality of candidate regions based on the grid system (Fig. 5, para. 0074, “tracking system may then determine a head of the human target at 320. For example, in one embodiment, the target recognition, analysis, and tracking system may determine a position or location of the head by searching for various candidates at positions or locations suitable for the head”, the second region is determined by using the grid voxels, which are determined in step 310 explained in para. 0065).
Gu and Leyvand are both considered to be analogous to the claimed invention because they are in the same field of gesture or pose determination. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method as taught by the combination of Gu in view of Lee to incorporate the teachings of Leyvand of generating a grid system for the current depth data, one or more gridlines of the grid system being defined based on baseline information regarding the first pose component of the subject, and the baseline information regarding the first pose component including at least one of the centroid point or a size of the pose component corresponding to the first pose component and detected in the prior depth data; and determining the second region from the plurality of candidate regions based on the grid system. Such a modification is the result of combining prior art elements according to known methods to yield predictable results. The motivation for the proposed modification would have been to help divide the space to determine the foreground object and background object (Leyvand, para. 0071).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DENISE G ALFONSO whose telephone number is (571)272-1360. The examiner can normally be reached Monday - Friday 7:30 - 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Claire Wang, can be reached at 571-270-1051. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/DENISE G ALFONSO/Examiner, Art Unit 2663                                                                                                                                                                                                        
/CLAIRE X WANG/Supervisory Patent Examiner, Art Unit 2663