Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claims 1, 3, 19, and 20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 18, and 20 of copending Application No. 17249241 (reference application). Although the claims at issue are not identical, they are not patentably distinct from each other.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.
The claim comparison below helps illustrate the double patenting rejection:
Claims 1 & 3 of Instant Application 17249239:
1) A method comprising:
receiving, by one or more processors, a monocular image that includes a depiction of a whole body of a user;
generating, by the one or more processors, a segmentation of the whole body of the user based on the monocular image;
accessing a video feed comprising a plurality of monocular images received prior to the monocular image;
smoothing, using the video feed, the segmentation of the whole body generated based on the monocular image to provide a smoothed segmentation;
and applying one or more visual effects to the monocular image based on the smoothed segmentation.
3) The method of claim 1, wherein the monocular image is a first frame of a video, further comprising: generating the segmentation using a first machine learning technique, wherein smoothing the segmentation of the whole body comprises comparing the generated segmentation with a previous segmentation generated by the first machine learning technique from the plurality of monocular images.
Claim 1 of Co-pending App. 17249241:
1) A method comprising:
receiving, by one or more processors, a monocular image that includes a depiction of a whole body of a user;
generating, by the one or more processors, a segmentation of the whole body of the user based on the monocular image by applying one or more machine learning techniques;
receiving input that selects a visualization mode;
and applying one or more visual effects corresponding to the visualization mode to the monocular image based on the segmentation.
As seen from above, the only differences between claim 1 of the instant application and claim 1 of the co-pending application are that the co-pending claim recites generating the segmentation by applying machine learning techniques and receiving input that selects a visualization mode.  First, smoothing of the segmentation is interpreted to be a type of visualization applied to the segmentation, and is taught by claim 1 of the instant application.  Furthermore, it would have been obvious to one of ordinary skill in the art to apply segmentation using machine learning, as explicitly recited in claim 3 of the instant application, machine learning being a very common implementation in the current state of the art.  Thus claim 1 of the co-pending application is obvious in view of claims 1 and 3 of the instant application.  Claims 19 and 20 of the instant application recite limitations similar in scope to claim 1 as a system and medium and are similarly rejected as being obvious in view of the co-pending application.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claim(s) 1-4, 10, 13-16, 18-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Holzer et al. (US 20190116322).
Re claim 1, Holzer teaches a method comprising: 
receiving, by one or more processors, an image that includes a depiction of a whole body of a user (see abstract, live images of a person captured from a hand-held camera, using the image data of the live images, a skeleton of the person and a boundary between the person and the background), (see Fig. 10, in reference to [0149], image data including a person),  and (see [0195], detection of skeleton locations and body segmentation locations in a single frame can be performed upon information received from only the single frame).
generating, by the one or more processors, a segmentation of the whole body of the user based on the image (see Fig. 10, outline 1016) and (see [0152-0153], wherein input image from a single camera can be used with a deep neural network for segmentation) and (see [0261], segmentation performed per frame basis 1608).
accessing a video feed comprising a plurality of images received prior to the monocular image and smoothing, using the video feed, the segmentation of the whole body generated based on the image to provide a smoothed segmentation (see [0148], video stream including multiple frames using segmentation), (see [0154], wherein segmentation from one frame can be propagated to a neighboring frame for smoothing algorithms that can be applied to reduce differences between frames) and (see Fig. 16, block 1610, in reference to [0261], refinement of segmentation involving propagating the segmentations and merging propagations from multiple view-points) and (see [0262], skeleton batch-smoothing for a sequence of images).
and applying one or more visual effects to the image based on the smoothed segmentation (see [0035], effects, such as wings) and (see [0143-0144], smoothing to reduce variation between frames, varying less frame by frame). 
	Holzer does not explicitly teach monocular images.  However, Holzer obviously teaches monocular images (see [0153], single camera used in conjunction with a deep neural network for segmentation) and (see [0195], detection of skeleton locations and body segmentation locations in a single frame can be performed based upon information received from only the single frame) and (see [0044], 2D images and image frames, separate from other sources such as stereo cameras and 3D cameras).  Thus, Holzer obviously teaches monocular images such as 2D images/frames, taken from a single camera and separate from stereo cameras and 3D cameras.
	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Holzer’s image processing system to explicitly use “monocular” images, as the invention is pertinent to the problem of using a single image frame for segmentation.  An advantage of the modification is that it achieves the result of explicitly using monocular two-dimensional image frames for whole body segmentation such as those from single cameras, which are more common/less expensive than other types of camera systems.
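For illustration only, the claim 1 sequence discussed above (segment the current frame, smooth the segmentation using earlier frames of the video feed, then apply a visual effect) can be sketched as follows. This is a minimal sketch and is not taken from Holzer or from the application: the exponential blending, the `alpha` weight, the 0.5 threshold, and the function names are all assumptions chosen for the example.

```python
from typing import List

Mask = List[List[float]]  # per-pixel foreground probability in [0, 1]


def smooth_segmentation(current: Mask, previous: List[Mask], alpha: float = 0.4) -> Mask:
    """Blend the current mask with masks from earlier frames (oldest first).

    alpha is a hypothetical weight for the newer mask at each blending step.
    """
    history = list(previous) + [current]
    smoothed = history[0]
    for mask in history[1:]:
        smoothed = [
            [alpha * new + (1.0 - alpha) * old for new, old in zip(new_row, old_row)]
            for new_row, old_row in zip(mask, smoothed)
        ]
    return smoothed


def apply_effect(image, mask: Mask, effect_value=0, threshold: float = 0.5):
    """Replace pixels whose smoothed mask value is below threshold (background)."""
    return [
        [pix if m >= threshold else effect_value for pix, m in zip(img_row, mask_row)]
        for img_row, mask_row in zip(image, mask)
    ]


# A body pixel that flickers off in the current frame is retained after smoothing.
prior_masks = [[[1.0, 0.0]], [[1.0, 0.0]]]
current_mask = [[0.0, 0.0]]
smoothed_mask = smooth_segmentation(current_mask, prior_masks)
result = apply_effect([[7, 9]], smoothed_mask)
```

In this sketch the first pixel stays in the foreground because the two earlier frames outweigh the single dropped detection, which is the frame-to-frame variation that smoothing is meant to reduce.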
Re claim 2, Holzer teaches claim 1.  Furthermore, Holzer teaches wherein the whole body of the user comprises a specified combination of skeletal joints of the user (see abstract, Fig. 10, item 1022, and [0002-0005], skeleton detection in image/video of a person’s body).
Holzer further teaches wherein the monocular image includes a depiction of a plurality of bodies of a plurality of users that include the user, further comprising a plurality of segmentations of the bodies of the plurality of users (see [0146], background segmentation to determine outline of multiple people so that they can be distinguished from the background and each other).

Re claim 3, Holzer teaches claim 1.  Furthermore, Holzer teaches wherein the monocular image is a first frame of a video, further comprising: (see [0034], wherein image data can be from a video stream of a plurality of frames)
generating the segmentation using a first machine learning technique, wherein smoothing the segmentation of the whole body comprises comparing the generated segmentation with a previous segmentation generated by the first machine learning technique from the plurality of monocular images (see [0153], deep neural network for segmentation of bodies in images), (see [0182], one or more frames of the image data can be analyzed to determine whether a person or some other object is present), and (see [0224], propagating between frames in multi-frame analysis) and (see [0261], propagating the segmentations and merging propagations from multiple viewpoints).
Re claim 4, Holzer teaches claim 3.  Furthermore, Holzer teaches wherein the first machine learning technique comprises a first deep neural network (see [0153], deep neural network).
Re claim 10, Holzer teaches claim 1.  Furthermore, Holzer teaches wherein smoothing the segmentation comprises applying the video feed to a “second” machine learning technique to predict one or more segmentations based on depictions of whole bodies in the plurality of monocular images ([0153] In one embodiment, raw input image from a single camera can be used in conjunction with a deep neural network for segmentation. The neural network can be trained to recognize bodies in images. Neural networks trained to recognize other types of objects, such as cars or animals, can also be utilized and the example of a person is provided for the purposes of illustration only. Weighting factors for a plurality of different neural nets trained to recognize a plurality of different objects can be stored on a mobile device and/or a remote device. For example, first weighting factors for a first neural net trained to recognize people, second weighting factors for a second neural net trained to recognize dogs, third weighting factors for a third neural net trained to recognize horses and a fourth neural net for a third neural net trained to recognize cars can be stored on a mobile device) and (see [0163], neural network segmentation using a set of neural networks).  Holzer teaches wherein smoothing the segmentation comprises applying the video feed to a second machine learning technique to predict one or more segmentations based on depictions of whole bodies in the plurality of monocular images (deep neural network to recognize bodies in images, and using a plurality of neural networks/nets to recognize and segment the whole body of people/objects).
Re claim 13, Holzer teaches claim 1.  Furthermore, Holzer teaches wherein the plurality of monocular images was received a threshold number of seconds prior to receiving the monocular image (see [0198], wherein multi-frame analysis can be performed where information between frames is propagated and used to determine a result for a particular frame. For example, the body segmentation from the background can be performed on each frame in a series of frames where information about the body segmentation determined for a first frame affects the body segmentation determined for a second frame), (see [0261], wherein in segmentation 1606, the segmentation can be further computed, refined and smoothed. This refinement of the segmentation can be done on a per frame basis 1608. The smoothing can involve enforcing inter-frame consistency 1610. The inter-frame consistency 1610 can involve propagating the segmentations and merging propagations from multiple view-points. Some of this methodology is described above with respect to FIG. 10. For example, key point tracking can be used to generate a triangular mesh of super-pixels which are used to define transformations between images that are used in the segmentation smoothing process), (see [0134], pose/skeleton detection performed in delay or real time), and ([0263], The images, which are generated after stabilization is applied, can be fed into the skeleton batch-smoothing 1602 and/or the segmentation 1606. The skeleton detection 1602 and segmentation 1606 can then be applied to the new images. In addition, as described with respect to FIG. 10, the skeleton detection output can be used as an input for the segmentation 1606. Thus, the output from the skeleton detection 1602 can be received as an input at the segmentation 1606 prior to beginning the segmentation).  
Holzer teaches wherein the plurality of monocular images was received a threshold number of seconds prior to receiving the monocular image (receiving frames of images and propagating/merging the frame images for segmentation, pose estimation and smoothing.  Hence, there is a threshold number of seconds within which the images are captured and processed in the system).
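As a hedged illustration of the claim 13 limitation, the sketch below keeps only frames received within a threshold number of seconds before the current frame. The `FrameBuffer` class, its method names, and the use of plain float timestamps are hypothetical and are not drawn from Holzer or from the application.

```python
from collections import deque


class FrameBuffer:
    """Holds (timestamp, frame) pairs no older than window_seconds."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._frames = deque()

    def add(self, timestamp: float, frame) -> None:
        # Frames are assumed to arrive in increasing timestamp order.
        self._frames.append((timestamp, frame))

    def frames_before(self, now: float) -> list:
        # Discard frames older than the threshold, then return the rest
        # that were received strictly before the current frame.
        while self._frames and now - self._frames[0][0] > self.window_seconds:
            self._frames.popleft()
        return [frame for timestamp, frame in self._frames if timestamp < now]


buffer = FrameBuffer(window_seconds=2.0)
buffer.add(0.0, "frame_a")
buffer.add(1.5, "frame_b")
buffer.add(2.5, "frame_c")
recent = buffer.frames_before(now=3.0)
```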
Re claim 14, Holzer teaches claim 1.  Furthermore, Holzer teaches computing a segmentation border of the whole body of the user (see Fig. 10, wherein a whole body of a user is segmented).
Re claim 15, Holzer teaches claim 1.  Furthermore, Holzer teaches determining one or more device capabilities of a client device used to capture the monocular image; and selecting a segmentation model to generate the segmentation based on the one or more device capabilities (see [0112-0113], wherein the capabilities of a client device can allow processing to be performed solely on the client device or with the aid of a server, as well as capturing image data with capable mobile devices).
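The claim 15 limitation (determine device capabilities, then select a segmentation model) can be pictured with the short sketch below. It is an assumption-laden example only: the capability fields, the memory cutoffs, and the model names are hypothetical and do not appear in Holzer or in the application.

```python
from dataclasses import dataclass


@dataclass
class DeviceCapabilities:
    # Hypothetical capability fields chosen for illustration.
    ram_mb: int
    has_gpu: bool


def select_segmentation_model(caps: DeviceCapabilities) -> str:
    """Return the name of a (hypothetical) model sized for the device."""
    if caps.has_gpu and caps.ram_mb >= 4096:
        return "large"
    if caps.ram_mb >= 2048:
        return "medium"
    return "small"


model_name = select_segmentation_model(DeviceCapabilities(ram_mb=3072, has_gpu=False))
```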
Re claim 16, Holzer teaches claim 1.  Furthermore, Holzer teaches applying the one or more visual effects to the monocular image based on a segmentation border associated with the smoothed segmentation (see Fig. 10, wherein a segmentation border is shown) and (see [0146-0148], augmenting people with effects and augmented background).
Re claim 18, Holzer teaches claim 1.  Furthermore, Holzer teaches wherein applying the one or more visual effects to the monocular image comprises replacing a background of the monocular image with a different background or replacing portions of the user depicted in the monocular image with different visual elements (see Fig. 10, wherein a segmentation border is shown) and (see [0146-0148], augmenting people with effects and augmented background).
Claims 19 and 20 recite limitations similar in scope to claim 1 and are rejected for at least the reasons above.
Claim(s) 5 is/are rejected under 35 U.S.C. 103 as being unpatentable over Holzer et al. (US 20190116322) in view of Fei-Fei (“One-Shot Learning of Object Categories”).
Re claim 5, Holzer teaches claim 4.
Holzer teaches further comprising training the first deep neural network (see [0153], training neural network to recognize bodies in images).
Holzer does not teach performing operations comprising:
receiving training data comprising a plurality of training monocular images and ground truth segmentations for each of the plurality of training monocular images;
applying the first deep neural network to a first training monocular image of the plurality of training monocular images to estimate a segmentation of a given body depicted in the first training monocular image;
computing a deviation between the estimated segmentation and the ground truth segmentation associated with the first training monocular image;
updating parameters of the first deep neural network based on the computed deviation; and repeating the applying, computing and updating steps for each of the plurality of training monocular images.
However, Fei-Fei teaches receiving training data comprising a plurality of training monocular images and ground truth segmentations for each of the plurality of training monocular images (see p. 594, 1. Introduction: “It is common knowledge in statistics that estimating a given number of parameters requires a many-fold larger number of training examples…learning one object category requires a batch process involving thousands or tens of thousands of training examples,” “Given a training set, no matter how small, we update this knowledge and produces a posterior density, which is then used for detection/recognition”), (see p. 595, 3.1, Overall Bayesian Framework: “To decide whether there is a flamingo bird or not, we compare the probability of a flamingo being present in the image with the probability of only background clutter being present in the image.  The decision is simple: If the probability of a flamingo present is higher, we decide this image contains an instance of a flamingo.  If it is the other way around, we decide there is no flamingo.  To compute the probability of a flamingo being present in an image, we need a model of a flamingo, which we learn from a set of training images containing examples of flamingos.  Then we could compare this probability with the background model and, in turn, make our final decision”), and (see p. 605, “Two human subjects annotated the whole data set, giving ground truth information of the location of the contours of the objects within each image.  Given this information, we are able to compute the proportion of features detected within the object boundary as a fraction of the total number in the image”).
Fei-Fei teaches training using a plurality of monocular images (such as examples of flamingo images) and ground truth segmentations for each of the plurality of training monocular images (model of a flamingo learned, to create a background model to make comparison, and annotating data set giving ground truth information of the location of contours of objects within each training image).
applying the first deep neural network to a first training monocular image of the plurality of training monocular images to estimate a segmentation of a given body depicted in the first training monocular image (see p. 594, abstract, wherein learning is from thousands, one, or a handful of images), (see p. 595, 2. Literature Review: “Given a new image, how do we detect the presence of a known object/category amongst clutter…learn models from training examples”), and (see p. 595, 3.1, Overall Bayesian Framework: “To decide whether there is a flamingo bird or not, we compare the probability of a flamingo being present in the image with the probability of only background clutter being present in the image.  The decision is simple: If the probability of a flamingo present is higher, we decide this image contains an instance of a flamingo.  If it is the other way around, we decide there is no flamingo.  To compute the probability of a flamingo being present in an image, we need a model of a flamingo, which we learn from a set of training images containing examples of flamingos.  Then we could compare this probability with the background model and, in turn, make our final decision”).  Fei-Fei teaches applying the first deep neural network to a first training monocular image (learning from at least a first training image) of the plurality of training monocular images to estimate a segmentation of a given body (a flamingo, for example) depicted in the first training monocular image.
computing a deviation between the estimated segmentation and the ground truth segmentation associated with the first training monocular image (see p. 595, 3.1, Overall Bayesian Framework: “To decide whether there is a flamingo bird or not, we compare the probability of a flamingo being present in the image with the probability of only background clutter being present in the image.  The decision is simple: If the probability of a flamingo present is higher, we decide this image contains an instance of a flamingo.  If it is the other way around, we decide there is no flamingo.  To compute the probability of a flamingo being present in an image, we need a model of a flamingo, which we learn from a set of training images containing examples of flamingos.  Then we could compare this probability with the background model and, in turn, make our final decision”), (see p. 605, “Two human subjects annotated the whole data set, giving ground truth information of the location of the contours of the objects within each image.  Given this information, we are able to compute the proportion of features detected within the object boundary as a fraction of the total number in the image”), and (see Figure 15, wherein the y-axis is the ground truth category for the query image).  Fei-Fei teaches computing a deviation between the estimated segmentation and the ground truth segmentation associated with the first training monocular image (compare probability that image contains a flamingo, for example).
updating parameters of the first deep neural network based on the computed deviation; and repeating the applying, computing and updating steps for each of the plurality of training monocular images (see p. 595, 3.1, Overall Bayesian Framework: “To decide whether there is a flamingo bird or not, we compare the probability of a flamingo being present in the image with the probability of only background clutter being present in the image.  The decision is simple: If the probability of a flamingo present is higher, we decide this image contains an instance of a flamingo.  If it is the other way around, we decide there is no flamingo.  To compute the probability of a flamingo being present in an image, we need a model of a flamingo, which we learn from a set of training images containing examples of flamingos.  Then we could compare this probability with the background model and, in turn, make our final decision”) and (see p. 602-603, 6.4 & 6.5, wherein, with the 4 data set and the 101 data set, the detection rate improves as training examples increase), and (see p. 594, 1. Introduction, wherein once a few categories have been learned the hard way, some information may be abstracted from that process to make learning further categories more efficient).  Fei-Fei teaches updating parameters of the first deep neural network based on the computed deviation (training image of a plurality of training images), and repeating the applying, computing, and updating steps for each of the plurality of training monocular images (parameters with previously trained knowledge to improve the training model, wherein more training images increase prediction success).
Holzer and Fei-Fei teach claim 5.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Holzer’s deep learning system, which is trained on images, to explicitly include using a plurality of images to update the learning model, as taught by Fei-Fei, as the references are in the analogous art of machine learning.  An advantage of the modification is that it achieves the result of explicitly using prior knowledge such as from a first training image to improve the learning system, as more and more images are used to train the system in a particular classification of image shapes/objects for object recognition.
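For reference, the four training operations recited in claim 5 (apply the model to a training image, compute a deviation from the ground truth segmentation, update parameters, and repeat for each training image) follow the general shape of the loop sketched below. This is a minimal sketch under stated assumptions: a two-parameter per-pixel logistic model stands in for a deep neural network, the learning rate and epoch count are arbitrary, and the code reflects neither Holzer's network nor Fei-Fei's Bayesian method.

```python
import math


def predict(pixels, w, b):
    """Estimate a foreground probability for each pixel intensity."""
    return [1.0 / (1.0 + math.exp(-(w * p + b))) for p in pixels]


def deviation(estimated, truth):
    """Per-pixel difference between estimated and ground-truth masks.

    For a logistic model these differences are also the gradient of the
    binary cross-entropy loss with respect to the pre-sigmoid score.
    """
    return [e - t for e, t in zip(estimated, truth)]


def train(training_set, epochs=200, lr=0.5):
    w, b = 0.0, 0.0  # model parameters, updated after every training image
    for _ in range(epochs):
        for pixels, truth in training_set:                # repeat per image
            estimated = predict(pixels, w, b)             # apply the model
            diffs = deviation(estimated, truth)           # compute deviation
            grad_w = sum(d * p for d, p in zip(diffs, pixels)) / len(pixels)
            grad_b = sum(diffs) / len(pixels)
            w -= lr * grad_w                              # update parameters
            b -= lr * grad_b
    return w, b


# Toy training data: (pixel intensities, ground-truth mask) pairs.
training_set = [
    ([0.9, 0.8, 0.1, 0.2], [1, 1, 0, 0]),
    ([0.7, 0.3, 0.95, 0.05], [1, 0, 1, 0]),
]
w, b = train(training_set)
```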

Claim(s) 6, 8 is/are rejected under 35 U.S.C. 103 as being unpatentable over Holzer et al. (US 20190116322) in view of Fei-Fei (“One-Shot Learning of Object Categories”) and Chen et al. (“Real-time Human Segmentation using Pose Skeleton Map”).
Re claim 6, Holzer and Fei-Fei teach claim 5.  Holzer and Fei-Fei do not explicitly teach wherein the plurality of training monocular images comprise ground truth skeletal key points of one or more bodies depicted in the respective training monocular images, wherein the first deep neural network estimates skeletal key points of the given body depicted in the first training monocular image, further comprising updating parameters of the first deep neural network based on a deviation between the estimated skeletal key points and the ground truth skeletal key points.
However, Chen teaches wherein the plurality of training monocular images comprise ground truth skeletal key points of one or more bodies depicted in the respective training monocular images, wherein the first deep neural network estimates skeletal key points of the given body depicted in the first training monocular image, further comprising updating parameters of the first deep neural network based on a deviation between the estimated skeletal key points and the ground truth skeletal key points (see sections 2.2 & 2.3, wherein a skeletal map is passed for image segmentation), (see section 3, wherein the dataset includes 4881 images chosen as the training set, 506 images chosen for validation, and 541 unlabeled images left as test sets) and (see Figs. 3, 7 and 11, showing the application of the pose estimation network using skeletal map key points of one or more bodies for segmentation).
Holzer, Fei-Fei, and Chen teach claim 6.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Holzer and Fei-Fei’s training system of a plurality of training monocular images with ground truths to explicitly include skeletal key points of one or more bodies in the training monocular images and estimating skeletal key points using the training monocular images, as taught by Chen, as the references are in the analogous art of object segmentation using machine learning systems.  An advantage of the modification is that it achieves the result of using skeleton key point maps to train the system for pose estimation in segmentation.
Re claim 8, Holzer and Fei-Fei teach claim 5.  Holzer and Fei-Fei do not explicitly teach wherein the plurality of training monocular images comprise a plurality of labeled and unlabeled image and video data.
However, Chen teaches wherein the plurality of training monocular images comprise a plurality of labeled and unlabeled image and video data (see p. 8474, Dataset: wherein labeled and unlabeled images are used for training and testing the image segmentation system).
Holzer, Fei-Fei, and Chen teach claim 8.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Holzer and Fei-Fei’s segmentation system using input monocular images to explicitly include both labeled and unlabeled input monocular images, as taught by Chen, as the references are in the analogous art of machine learning for image segmentation.  An advantage of the modification is that it achieves the result of using labeled and unlabeled images as part of the system to help train and validate the machine learning system.

Claim(s) 7 is/are rejected under 35 U.S.C. 103 as being unpatentable over Holzer et al. (US 20190116322) in view of Fei-Fei (“One-Shot Learning of Object Categories”) and Zhao et al. (“ICNet for Real-Time Semantic Segmentation on High-Resolution Images”).
Re claim 7, Holzer and Fei-Fei teach claim 5.  Holzer and Fei-Fei do not explicitly teach wherein the plurality of training monocular images comprises a plurality of image resolutions, further comprising generating a plurality of segmentation models based on the first deep neural network, a first of the plurality of segmentation models being trained based on training monocular images having a first of the plurality of image resolutions, a second of the plurality of segmentation models being trained based on training monocular images having a second of the plurality of image resolutions.
However, Zhao teaches wherein the plurality of training monocular images comprises a plurality of image resolutions (see Fig. 2, wherein the Cascade Image input are feature maps with different size ratios to the full-resolution input) and (see p. 5, wherein, “…it takes cascade image inputs (i.e., low-, medium- and high resolution images), adopts cascade feature fusion unit (Sec. 3.3) and is trained with cascade label guidance (Sec. 3.4)”),
further comprising generating a plurality of segmentation models based on the first deep neural network, a first of the plurality of segmentation models being trained based on training monocular images having a first of the plurality of image resolutions, a second of the plurality of segmentation models being trained based on training monocular images having a second of the plurality of image resolutions (see Fig. 2, wherein the Cascade Image input are feature maps with different size ratios to the full-resolution input) and (see p. 5, wherein, “…it takes cascade image inputs (i.e., low-, medium- and high resolution images), adopts cascade feature fusion unit (Sec. 3.3) and is trained with cascade label guidance (Sec. 3.4)”) and (see p. 3, novel and unique image cascade network for real-time semantic segmentation, it utilizes semantic information in low resolution along with details from high-resolution images efficiently).
Holzer, Fei-Fei, and Zhao teach claim 7.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Holzer and Fei-Fei’s segmentation system using training to explicitly include the use of monocular images having a plurality of image resolutions, as taught by Zhao, as the references are in the analogous art of image segmentation systems using machine learning methods.  An advantage of the modification is that it achieves the result of using input training images of different resolutions to improve speed with lower resolution images and improve details with higher resolution images.
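To make the "plurality of image resolutions" concrete, the sketch below derives low-, medium-, and high-resolution versions of one image, the kind of cascade input that the quoted passage of Zhao describes. The 2x2 average pooling and the helper names are assumptions made for this example and are not represented to be Zhao's implementation.

```python
def downsample_2x(image):
    """Halve each dimension by averaging non-overlapping 2x2 pixel blocks."""
    rows, cols = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * r][2 * c] + image[2 * r][2 * c + 1]
             + image[2 * r + 1][2 * c] + image[2 * r + 1][2 * c + 1]) / 4.0
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def build_cascade(image, levels=3):
    """Return [high, medium, low] resolution versions of the same image."""
    cascade = [image]
    for _ in range(levels - 1):
        cascade.append(downsample_2x(cascade[-1]))
    return cascade


image = [
    [1, 1, 3, 3],
    [1, 1, 3, 3],
    [5, 5, 7, 7],
    [5, 5, 7, 7],
]
high, medium, low = build_cascade(image)
```

Each level could then serve as training input for a model associated with that resolution, trading detail at the high end for speed at the low end.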
	Claim(s) 9 is/are rejected under 35 U.S.C. 103 as being unpatentable over Holzer et al. (US 20190116322) in view of Fei-Fei (“One-Shot Learning of Object Categories”) and Wang et al. (“Crowd Counting and Segmentation in Visual Surveillance”).
Re claim 9, Holzer and Fei-Fei teach claim 5.  Furthermore, Holzer teaches wherein the plurality of training monocular images comprise a depiction of a whole body of a particular user (see Fig. 10, wherein a whole body image of a user is inputted into the system) and a depiction of a plurality of users (see [0146], distinguishing different people from each other using background segmentation and augmenting them).
Furthermore, Fei-Fei teaches an image that lacks a depiction of any user (see Fig. 1, wherein non-human sample images are used) and (see Fig. 2 including human and non-human category input images).  For motivation, see claim 5.
Holzer and Fei-Fei do not explicitly teach a depiction of users at different distances from an image capture device.
However, Wang teaches a depiction of a plurality of users and the depiction of users at different distances from an image capture device (see Figs. 1 and 3, wherein a plurality of “users” are captured at different distances from an image capture device that captures the images).
Holzer, Fei-Fei, and Wang teach claim 9.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Holzer and Fei-Fei’s image segmentation processing system to explicitly include segmentation of a plurality of human “users,” as taught by Wang, as the references are in the analogous art of body segmentation from an input image.  An advantage of the modification is that it achieves the result of explicitly detecting a plurality of human bodies and segmentation of the human bodies for further processing.
Claim(s) 17 is/are rejected under 35 U.S.C. 103 as being unpatentable over Holzer et al. (US 20190116322) in view of Mathy et al. (US 20170272651).
Re claim 17, Holzer teaches claim 1.  Holzer does not explicitly teach applying a guided filter to improve segmentation quality of portions of the smoothed segmentations that are within a specified number of pixels of edges of the smoothed segmentation.
However, Mathy teaches applying a guided filter to improve segmentation quality of portions of the smoothed segmentations that are within a specified number of pixels of edges of the smoothed segmentation (see [0117], [0125-0132], applying a guided filter for edge preservation, such as during post-processing).
Holzer and Mathy teach claim 17.  It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Holzer’s image segmentation system to explicitly include a guided filter for edge preservation, as taught by Mathy, as the references are in the analogous art of machine learning for feature extraction such as segmentation of body parts.  An advantage of the modification is that it achieves the result of edge-preserving filtering, such as when smoothing images.
Allowable Subject Matter
Claims 11 and 12 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Peter Hoang whose telephone number is (571)270-1346. The examiner can normally be reached Monday-Friday 8:00 am - 5:00 pm PST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at (571)272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/PETER HOANG/Primary Examiner, Art Unit 2616