Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This Office action is in response to applicant's amendments entered on 1/18/2022.
Claims 1-3, 6, 12 and 19-20 are amended. Claims 1-20 remain pending. Claims 1, 3 and 20 are independent.

Response to Arguments 
The 35 USC 101 rejection of claim 19 is withdrawn in view of the claim amendments to claim 19.
Claim amendments entered 1/18/2022 create new 112(b) issues for dependent claims 2, 4, and 13-14, as raised in the current Office action.
Applicant’s arguments filed 1/18/2022 with respect to claims 1-20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.

Claim Interpretation
In claims 1-20, the language “the RGB image as the only image” is interpreted in light of the specification and applicant remarks filed 4/21/2021 (pages 7-9), and in light of paragraphs 56-58 of the specification, which disclose “single-shot” feature point prediction models.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 2, 4, and 13-14 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or, for applications subject to pre-AIA 35 U.S.C. 112, the applicant) regards as the invention.
Claim 2 recites the limitation "the machine learning model" in line 2.  There is insufficient antecedent basis for this limitation in the claim.
Claim 4 recites “the predicted second set of data points” in line 2. There is insufficient antecedent basis for this limitation in the claim.
Claim 13 recites “the third-dimensional sample” in line 1. There is insufficient antecedent basis for this limitation in the claim.
Claim 14 recites “the third-dimensional sample” in line 1. There is insufficient antecedent basis for this limitation in the claim.
Claim 18 recites “the machine learning component” in line 1.  There is insufficient antecedent basis for this limitation in the claim.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-2 are rejected under 35 U.S.C. 103 as being unpatentable over Bulat, et al., "Human Pose Estimation via Convolutional Part Heatmap Regression", In Computer Vision and Pattern Recognition, September 06, 2016, 15 Pages, in view of Martens; Harald Aagaard et al. US 6252974 B1, and further in view of Igsum US 20190318476.

Regarding Claim 1, Bulat discloses A computer-implemented method for predicting a location of a feature point of an articulated object (See Bulat the part detection of occluded body parts as explained in the caption of fig. 2)
comprising:
receiving, at a processor, a plurality of data points comprising a first set of data points and a second set of one or more data points (See Bulat Fig. 2 first row, caption: the first set corresponds to the visible parts, the second set to the occluded parts; Pg. 6 Regression subnetwork “..The input of this subnetwork is a multi-channel representation produced by stacking the N heatmaps produced by the part detection subnetwork, along with the input image...”), wherein each data point of the first set comprises a two-dimensional location corresponding to a feature point of the articulated object (See Bulat fig. 2 first row, see locations of visible parts in fig. 2, e.g. “visible knee” Fig. 2 caption; Pg. 6 Regression subnetwork input), and
each data point of the second set corresponds to a feature point of the articulated object (See Bulat fig. 2 first row, see heatmaps of occluded parts; Pg. 6 Regression subnetwork input “..the part detection heatmaps for the occluded parts provide low confidence scores..”);
inputting into a first machine learning model (See Bulat Pg. 3 “deep regression subnetwork” Pg. 6 Regression subnetwork; Fig. 6 Pg. 9-10 Regression subnetwork) the first set and the second set (See Bulat fig. 2 first row, caption Pg. 6 “..the N heatmaps, along with the input image...”), wherein the first machine learning model is trained to:
receive a plurality of two-dimensional location data points each corresponding to a feature point location of an articulated object (See Bulat Fig. 2 caption; Pg. 6 “..The input of this subnetwork is a multi-channel representation produced by stacking the N heatmaps produced by the part detection subnetwork, along with the input image”) from a red, green, and blue (RGB) image (See Bulat Fig. 1 and 2, input image is clearly an RGB image; see Bulat Fig. 2 caption; Pg. 6 “the input image”).
where one or more of the received two-dimensional location data of the articulated object are missing (See Bulat fig. 2 first row, see heatmaps of occluded parts, Pg. 6 Regression subnetwork input “..the part detection heatmaps for the occluded parts provide low confidence scores..” thus identified as missing),
receive predicted two-dimensional location data for each data point of the second set of data points (See Bulat fig. 2 second row, caption “..the output of our regression subnetwork… provide high confidence for the correct location of the occluded parts..”; Fig. 7 poses obtained using our method).
Bulat teaches each data point of the second set corresponds to a feature point of the articulated object that is missing (See Bulat fig. 2 first row, see heatmaps of occluded parts; Pg. 6 Regression subnetwork input “..the part detection heatmaps for the occluded parts provide low confidence scores..” emphasis added);
Bulat does not explicitly disclose the first machine learning model is a first conditional variational autoencoder; 
each data point of the second set corresponds to a feature point of the articulated object without associated two-dimensional location data or wherein the two-dimensional location data is identified as missing; and
where one or more of the received two-dimensional location data of the articulated object are identified as missing.
receive, by a second conditional variational autoencoder, predicted two-dimensional location data for each data point of the second set of data points; 
based at least on the received predicted two-dimensional location data, predict third-dimension data for each of the predicted two-dimensional location data; and 
combine the predicted two-dimensional location data with the third-dimension data to predict three-dimensional feature point location.
Martens teaches each data point of the second set corresponds to a feature point of the articulated object without associated two-dimensional location data or wherein the two-dimensional location data is identified as missing; and where one or more of the received two-dimensional location data of the articulated object are identified as missing (See Martens Fig. 1 Col. 4 Line 12-25  “The Local Occlusion Detector 110 receives video input 150 and analyzes the frames of the video input for local occlusion patterns” [equivalent of second set of one or more data points]  “Global Depth Model Generator 120 combines information about the local occlusion patterns into a depth model 130. Based on the depth model 130, estimated depths 160 are output”  [equivalent to machine learning model to predict location data]; Col. 5 Lines 1-30 “…The output from the Local Occlusion Detector 110 will thus be a list of feature points indices, together with a Found or NotFound label for each feature point ... feature points that are not found are regarded as being occluded..”; Fig. 3, 4 and 7 Col. 6 Lines 50-60).
inputting into a second machine learning model (See Martens C.4 L10-35, Fig. 1 Depth model generator, occlusion forecaster) predicted two-dimensional data for each data point of the second set of data points (See Martens C.4 L10-35, Fig. 1 Video Input and local occlusion patterns) from the first machine learning model (See Martens C.4 L10-35, Fig. 1 occlusion detector), wherein the second machine learning model is trained to:
based at least on the received predicted two-dimensional location data, predict third dimension data for each of the predicted two-dimensional location data (See Martens C.4 L10-35, Fig. 1 depth forecast, estimated depths, C. 10 L. 1-15 Fig. 7 predicted depth).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art, to modify the data points of Bulat to include the noted teachings of Martens, in order to model the depth dimension of occluded objects (Martens Col 1 lines 5-12).
Bulat in view of Martens does not explicitly disclose the first machine learning model is a first conditional variational autoencoder; the second machine learning model is a second conditional variational autoencoder; and 
combine the predicted two-dimensional location data with the third-dimension data to predict three-dimensional feature point location.
Igsum teaches the first machine learning model is a first conditional variational autoencoder and the second machine learning model is a second conditional variational autoencoder (See Igsum Figs. 17-18 [0117] 3D-VCAE 1D-VCAE 1703 1704; [0125]-[0129] FIG. 20-21); and
combine the predicted two-dimensional location data with the third-dimension data (See Igsum [0016] the volumetric image dataset and the axial trajectory of the VOI; Figs. 17-18) to predict three-dimensional feature point location (See Igsum [0016] creating a three-dimensional (3D) multi-planar reformatted (MPR) image; Fig. 8B).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art, to modify the combination of Bulat and Martens, to include the noted teachings of Igsum, in order to perform medical organ assessment in images (Igsum [Abstract]).

Regarding Claim 2, the combination teaches wherein the machine learning model is further trained to: 
using the RGB image as the only image (See Bulat Fig. 1 and 2, input image is clearly a single RGB image) used to predict two-dimensional location data for each feature point location that was missing (See Bulat fig. 2 second row, Pg. 6 Regression subnetwork “..the regression part of our network to rely on contextual information (provided by the remaining parts) in order to predict the location of these parts..”), and wherein the received plurality of two-dimensional location data points are the only two-dimensional location data points used to predict the two-dimensional location data for each feature point location that was missing (See Bulat Fig. 2 caption; Pg. 6 “..The input of this subnetwork is a multi-channel representation produced by stacking the N heatmaps produced by the part detection subnetwork, along with the input image” where heatmaps correspond to two-dimensional location data and are the only two-dimensional location data used);
and further teaches each data point of the second set corresponds to a feature point of the articulated object without associated two-dimensional location data or wherein the two-dimensional location data is identified as missing; and where one or more of the received two-dimensional location data of the articulated object are identified as missing (See Martens Fig. 1 Col. 4 Line 12-25  “The Local Occlusion Detector 110 receives video input 150 and analyzes the frames of the video input for local occlusion patterns” [equivalent of second set of one or more data points]  “Global Depth Model Generator 120 combines information about the local occlusion patterns into a depth model 130. Based on the depth model 130, estimated depths 160 are output”  [equivalent to machine learning model to predict location data]; Col. 5 Lines 1-30 “…The output from the Local Occlusion Detector 110 will thus be a list of feature points indices, together with a Found or NotFound label for each feature point ... feature points that are not found are regarded as being occluded..”; Fig. 3, 4 and 7 Col. 6 Lines 50-60).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art, to modify the combination to include the noted teachings of Martens, in order to model the depth dimension of occluded objects (Martens Col 1 lines 5-12).
 
Claims 3-5, 7-16, and 19-20 are rejected under 35 U.S.C. 103 as being unpatentable over Bulat, et al., "Human Pose Estimation via Convolutional Part Heatmap Regression", In Computer Vision and Pattern Recognition, September 06, 2016, 15 Pages, in view of Martens; Harald Aagaard et al. US 6252974 B1, and further in view of Shotton, et al., "Real-Time Human Pose Recognition in Parts from Single Depth Images", In Journal of Communications of the ACM, Volume 56, Issue 1, January 01, 2013, pp. 116-124.

Regarding Claim 3, Bulat discloses A computer-implemented method for predicting a location of a feature point of an articulated object (See Bulat the part detection of occluded body parts as explained in the caption of fig. 2) comprising:
receiving, at a processor, a plurality of data points comprising a first set of data points and a second set of one or more data points (See Bulat Fig. 2 first row, caption: the first set corresponds to the visible parts, the second set to the occluded parts; Pg. 6 Regression subnetwork “..The input of this subnetwork is a multi-channel representation produced by stacking the N heatmaps produced by the part detection subnetwork, along with the input image...”), wherein each data point of the first set comprises a two-dimensional location corresponding to a feature point of the articulated object (See Bulat fig. 2 first row, see locations of visible parts in fig. 2, e.g. “visible knee” Fig. 2 caption; Pg. 6 Regression subnetwork input), and
each data point of the second set corresponds to a feature point of the articulated object (See Bulat fig. 2 first row, see heatmaps of occluded parts; Pg. 6 Regression subnetwork input “..the part detection heatmaps for the occluded parts provide low confidence scores..”);
inputting into a first machine learning model (See Bulat Pg. 3 “deep regression subnetwork” Pg. 6 Regression subnetwork; Fig. 6 Pg. 9-10 Regression subnetwork) the first set and the second set (See Bulat fig. 2 first row, caption Pg. 6 “..the N heatmaps, along with the input image...”), wherein the first machine learning model is trained to:
receive a plurality of two-dimensional location data points each corresponding to a feature point location of an articulated object (See Bulat Fig. 2 caption; Pg. 6 “..The input of this subnetwork is a multi-channel representation produced by stacking the N heatmaps produced by the part detection subnetwork, along with the input image”) from a red, green, and blue (RGB) image (See Bulat Fig. 1 and 2, input image is clearly an RGB image; see Bulat Fig. 2 caption; Pg. 6 “the input image”).
where one or more of the received two-dimensional location data of the articulated object are missing (See Bulat fig. 2 first row, see heatmaps of occluded parts, Pg. 6 Regression subnetwork input “..the part detection heatmaps for the occluded parts provide low confidence scores..” thus identified as missing), 
wherein the received plurality of two-dimensional location data points are the only two-dimensional location data points used to predict the two-dimensional location data for each feature point location that was missing (See Bulat Fig. 2 caption; Pg. 6 “..The input of this subnetwork is a multi-channel representation produced by stacking the N heatmaps produced by the part detection subnetwork, along with the input image” where heatmaps correspond to two-dimensional location data and are the only two-dimensional location data used); 
predicted two-dimensional location data for each data point of the second set of data points from the first machine learning model (See Bulat fig. 2 second row, caption “..the output of our regression subnetwork… provide high confidence for the correct location of the occluded parts..”; Fig. 7 poses obtained using our method).
Bulat teaches each data point of the second set corresponds to a feature point of the articulated object that is missing (See Bulat fig. 2 first row, see heatmaps of occluded parts; Pg. 6 Regression subnetwork input “..the part detection heatmaps for the occluded parts provide low confidence scores..” emphasis added);
Bulat does not explicitly disclose each data point of the second set corresponds to a feature point of the articulated object without associated two-dimensional location data or wherein the two-dimensional location data is identified as missing; and
where one or more of the received two-dimensional location data of the articulated object are identified as missing; 
inputting into a second machine learning model predicted two-dimensional data for each data point of the second set of data from the first machine learning model, wherein the second machine learning model is trained to:
based at least on the received predicted two-dimensional location data, predict third dimension data for each of the predicted two-dimensional location data; and
combine the predicted two-dimensional location data with the third-dimension data to predict three-dimensional feature point location.
Martens teaches each data point of the second set corresponds to a feature point of the articulated object without associated two-dimensional location data or wherein the two-dimensional location data is identified as missing; and where one or more of the received two-dimensional location data of the articulated object are identified as missing (See Martens Fig. 1 Col. 4 Line 12-25  “The Local Occlusion Detector 110 receives video input 150 and analyzes the frames of the video input for local occlusion patterns” [equivalent of second set of one or more data points]  “Global Depth Model Generator 120 combines information about the local occlusion patterns into a depth model 130. Based on the depth model 130, estimated depths 160 are output”  [equivalent to machine learning model to predict location data]; Col. 5 Lines 1-30 “…The output from the Local Occlusion Detector 110 will thus be a list of feature points indices, together with a Found or NotFound label for each feature point ... feature points that are not found are regarded as being occluded..”; Fig. 3, 4 and 7 Col. 6 Lines 50-60).
inputting into a second machine learning model (See Martens C.4 L10-35, Fig. 1 Depth model generator, occlusion forecaster) predicted two-dimensional data for each data point of the second set of data points (See Martens C.4 L10-35, Fig. 1 Video Input and local occlusion patterns) from the first machine learning model (See Martens C.4 L10-35, Fig. 1 occlusion detector), wherein the second machine learning model is trained to:
based at least on the received predicted two-dimensional location data, predict third dimension data for each of the predicted two-dimensional location data (See Martens C.4 L10-35, Fig. 1 depth forecast, estimated depths, C. 10 L. 1-15 Fig. 7 predicted depth).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art, to modify the data points of Bulat to include the noted teachings of Martens, in order to model the depth dimension of occluded objects (Martens Col 1 lines 5-12).
Bulat in view of Martens does not explicitly disclose combine the predicted two-dimensional location data with the third-dimension data to predict three-dimensional feature point location.
Shotton teaches combine the predicted two-dimensional location data with the third-dimension data to predict three-dimensional feature point location (See Shotton Fig. 1, Fig. 5 inferred body parts eq. two-dimensional data, joint proposals eq. third dimension data 3.4 Joint position proposal see density estimator per body part equations (7) and (8) where P(c|I,xi) equiv of three dimensional feature point location as shown Fig. 5 and Fig. 1).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art, to modify the combination of Bulat in view of Martens, to include the noted teachings of Shotton, in order to spatially localize joints of interest (Shotton Pg. 2 Col. 1 Paragraph 2).
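
For illustration only, the regression-subnetwork input cited from Bulat above (N part heatmaps stacked with the input image into a multi-channel representation) might be sketched as follows. The function name `stack_regression_input`, the nested-list plane layout, the channel order, and the toy values are assumptions made for this sketch; they are not taken from Bulat or from the application, and the sketch does not characterize the scope of any claim.

```python
def stack_regression_input(image_channels, heatmaps):
    """Stack RGB planes and N part heatmaps into one multi-channel input.

    image_channels: list of 3 planes (R, G, B), each H x W (hypothetical layout).
    heatmaps: list of N planes, each H x W, one per body part.
    """
    height = len(image_channels[0])
    width = len(image_channels[0][0])
    for plane in image_channels + heatmaps:
        if len(plane) != height or any(len(row) != width for row in plane):
            raise ValueError("all planes must share the same height and width")
    # Channel order (heatmaps first, then image) is an arbitrary choice here.
    return heatmaps + image_channels


# Toy 2x2 example with N = 2 heatmaps (values are made up).
rgb = [[[0.1, 0.2], [0.3, 0.4]] for _ in range(3)]
parts = [[[0.9, 0.0], [0.0, 0.0]], [[0.0, 0.0], [0.0, 0.05]]]
stacked = stack_regression_input(rgb, parts)
print(len(stacked))  # number of channels: N + 3
```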

Regarding Claim 4, the combination teaches combining the first set of the data points with the predicted second set of data points (See Bulat fig. 2 second row, caption “..the output of our regression subnetwork..” output contains visible and occluded parts).
Regarding Claim 5, the combination teaches wherein the first machine learning model is a probabilistic machine learning model, and the predicted two-dimensional location data comprises one of multiple samples of a distribution, a single sample of a distribution or a mean of a distribution as a single sample (See Bulat fig. 2 second row, caption “..second row shows the output of our regression subnetwork..” output heatmap is a distribution).
Regarding Claim 7, the combination teaches wherein each of the received data points of the first set and second set is a labelled feature of an articulated object (See Bulat fig. 2, see the name of the parts; Pg. 5 “..encode part label information as a set of N binary maps, one for each part..”).
Regarding Claim 8, the combination teaches the plurality of two-dimensional location data points received at the processor correspond to a labeled image of the articulated object, and each label identifies a feature point of the articulated object (See Bulat fig. 2, see the name of the parts; Pg. 5 “..encode part label information as a set of N binary maps, one for each part..”).

Regarding Claim 9, the combination teaches wherein at least one of the feature points corresponds to a joint location of the articulated object (See Bulat fig. 2, see the name of the part; Pg. 5 “..encode part label information as a set of N binary maps, one for each part..” e.g. wrist, knee or ankle).

Regarding Claim 10, the combination teaches a Boolean value input into the first machine learning model for a single data point identifies whether the data point belongs to the first set or the second set (See Martens Fig. 1 Col. 5 Lines 1-30 “…The output from the Local Occlusion Detector 110 will thus be a list of feature points indices, together with a Found or NotFound label for each feature point ... feature points that are not found are regarded as being occluded..” Found/Notfound equivalent of boolean value; Fig. 3, 4 and 7 Col. 6 Lines 50-60).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art, to modify the combination, to include the noted teachings of Martens, in order to model the depth dimension of occluded objects (Martens Col 1 lines 5-12).
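
As a hedged illustration of the Found/NotFound mapping discussed for Claim 10, a per-point Boolean flag can partition feature points into a first (found) set and a second (not found) set. The names `split_by_found_flag`, `points`, and `found` are hypothetical and chosen for this sketch only; they do not appear in Martens or in the application.

```python
from typing import List, Optional, Tuple

Point2D = Tuple[float, float]


def split_by_found_flag(
    points: List[Optional[Point2D]], found: List[bool]
) -> Tuple[List[int], List[int]]:
    """Return (first_set_indices, second_set_indices) from per-point flags."""
    if len(points) != len(found):
        raise ValueError("one flag is required per feature point")
    # True plays the role of "Found", False the role of "NotFound".
    first_set = [i for i, flag in enumerate(found) if flag]
    second_set = [i for i, flag in enumerate(found) if not flag]
    return first_set, second_set


# Made-up example: points 1 and 3 have no 2D location and are flagged False.
points = [(10.0, 20.0), None, (32.5, 40.0), None]
found = [True, False, True, False]
first_set, second_set = split_by_found_flag(points, found)
print(first_set, second_set)
```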

Regarding Claim 11, the combination teaches wherein a value of a received data point either being of a specific value or belonging within a specific range of values identifies whether the data point belongs to the first set or the second set (See Bulat fig. 2 first row, see heatmaps of occluded parts; Pg. 6 Regression subnetwork input “..the part detection heatmaps for the occluded parts provide low confidence scores..” thus identified as missing).
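
For illustration only, the mapping above (a value within a specific range identifying set membership) can be sketched as a threshold on a per-part confidence score. The function name `split_by_confidence` and the 0.2 cut-off are assumptions made for this sketch; Bulat is cited only for low confidence scores on occluded parts, not for any particular threshold.

```python
def split_by_confidence(peak_scores, threshold=0.2):
    """Assign feature points to the first (visible) or second (missing) set.

    peak_scores: peak heatmap confidence per feature point (hypothetical values).
    threshold: assumed cut-off; scores below it are treated as missing.
    """
    first_set = [i for i, score in enumerate(peak_scores) if score >= threshold]
    second_set = [i for i, score in enumerate(peak_scores) if score < threshold]
    return first_set, second_set


# Made-up scores: only the second part falls in the "missing" range.
visible, missing = split_by_confidence([0.9, 0.05, 0.6, 0.2])
print(visible, missing)
```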

Regarding Claim 12, the combination teaches wherein the second machine learning model is a probabilistic machine learning model (See Martens C.14L0-10 probability matrix; See Shotton Fig. 1, Fig. 5 inferred body parts eq. of predicted two-dimensional data, joint proposals eq. distribution in third dimension is probabilistic model 3.4 Joint position proposal see density estimator per body part equations (7) and (8) is probabilistic model).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art, to modify the combination, to include the noted teachings of Shotton, in order to spatially localize joints of interest (Shotton Pg. 2 Col. 1 Paragraph 2).

Regarding Claim 13, the combination teaches the third-dimensional sample comprises one of multiple samples of the distribution, a single sample of the distribution or a mean of the distribution (See Shotton Fig. 1, Fig. 5 inferred body parts eq. of combined set of two-dimensional data, joint proposals eq. distribution in third dimension 3.4 Joint position proposal see density estimator per body part equations (7) and (8) where P(c|I,xi) equiv of combined set two-dimensional data and detected joint position equiv of third dimensional sample as shown Fig. 5 and Fig. 1).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art, to modify the combination, to include the noted teachings of Shotton, in order to spatially localize joints of interest (Shotton Pg. 2 Col. 1 Paragraph 2).
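
As a non-authoritative sketch of the three alternatives recited in Claim 13 (multiple samples, a single sample, or the mean of a distribution), the example below assumes a Gaussian distribution over the third dimension. The Gaussian form, the function name `third_dimension_sample`, and its parameters are assumptions made for this sketch; neither Shotton nor the application is being characterized as using this particular form.

```python
import random


def third_dimension_sample(mean, std, mode="mean", n=5, rng=None):
    """Return a list of third-dimension values drawn from an assumed Gaussian.

    mode: "mean" (the mean as a single value), "single" (one random sample),
          or "multiple" (n random samples).
    """
    if rng is None:
        rng = random.Random(0)  # fixed seed so the sketch is repeatable
    if mode == "mean":
        return [mean]
    if mode == "single":
        return [rng.gauss(mean, std)]
    if mode == "multiple":
        return [rng.gauss(mean, std) for _ in range(n)]
    raise ValueError("mode must be 'mean', 'single', or 'multiple'")


print(third_dimension_sample(1.5, 0.1, mode="mean"))
print(len(third_dimension_sample(1.5, 0.1, mode="multiple", n=3)))
```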

Regarding Claim 14, the combination teaches adding the third-dimensional sample for each two-dimensional data point to the respective two-dimensional data point to create a plurality of three-dimensional data points (See Shotton Fig. 1, Fig. 5 inferred body parts eq. of combined set of two-dimensional data, joint proposals eq. distribution in third dimension 3.4 Joint position proposal see density estimator per body part equations (7) and (8) where P(c|I,xi) equiv of combined set two-dimensional data and detected joint position equiv of third dimensional sample as shown Fig. 5 and Fig. 1).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art, to modify the combination, to include the noted teachings of Shotton, in order to spatially localize joints of interest (Shotton Pg. 2 Col. 1 Paragraph 2).
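
For illustration only, the step recited in Claim 14 (adding a third-dimensional sample to each two-dimensional data point to create three-dimensional data points) can be sketched as a simple pairing of coordinates. The function name `combine_to_3d` and the tuple representation are assumptions made for this sketch and are not drawn from Shotton or from the application.

```python
def combine_to_3d(points_2d, depths):
    """Pair each (x, y) point with its third-dimension value to form (x, y, z)."""
    if len(points_2d) != len(depths):
        raise ValueError("one third-dimension value is required per 2D point")
    return [(x, y, z) for (x, y), z in zip(points_2d, depths)]


# Made-up values for two feature points.
points_3d = combine_to_3d([(1.0, 2.0), (3.0, 4.0)], [0.5, 0.7])
print(points_3d)
```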

Regarding Claim 15, the combination teaches wherein there is no feedback of location data from a previous output of the second machine learning model as an input into either the first machine learning model or the second machine learning model (See rejections of Claims 3 and 12; neither Bulat nor Shotton teaches a requirement of feedback of location data from a previous output, thus it would have been obvious that there is no feedback of location data).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art, to modify the combination, to include the noted teachings of Shotton, in order to spatially localize joints of interest (Shotton Pg. 2 Col. 1 Paragraph 2).

Regarding Claim 16, the combination teaches the combined set of two-dimensional data inputted comprises a plurality of samples for each two-dimensional location data point (See Bulat Fig. 2 second row, caption “..the output of our regression subnetwork..” output heatmap is a distribution).

Regarding Claim 19, the combination teaches one or more computer storage media with device-executable instructions that, when executed by a computing system, direct the computing system to perform operations comprising the method steps of claim 3 (See Bulat the part detection of occluded body parts as explained in the caption of fig. 2, which is inherently performed on a computing device).

Regarding Claim 20, Bulat discloses A system to predict a location of a feature point of an articulated object, the system comprising a computing-based device (See Bulat the part detection of occluded body parts as explained in the caption of fig. 2) configured to:
receive a plurality of data points comprising a first set of data points and a second set of one or more data points (See Bulat Fig. 2 first row, caption: the first set corresponds to the visible parts, the second set to the occluded parts; Pg. 6 Regression subnetwork “..The input of this subnetwork is a multi-channel representation produced by stacking the N heatmaps produced by the part detection subnetwork, along with the input image...”), wherein each data point of the first set comprises a two-dimensional location corresponding to a feature point of the articulated object (See Bulat fig. 2 first row, see locations of visible parts in fig. 2, e.g. “visible knee” Fig. 2 caption; Pg. 6 Regression subnetwork input), and
each data point of the second set corresponds to a feature point of the articulated object (See Bulat fig. 2 first row, see heatmaps of occluded parts; Pg. 6 Regression subnetwork input “..the part detection heatmaps for the occluded parts provide low confidence scores..”);
input into a first machine learning model (See Bulat Pg. 3 “deep regression subnetwork” Pg. 6 Regression subnetwork; Fig. 6 Pg. 9-10 Regression subnetwork) the first set and the second set (See Bulat fig. 2 first row, caption Pg. 6 “..the N heatmaps, along with the input image...”), wherein the machine learning model is trained to:
receive a plurality of two-dimensional location data points each corresponding to a feature point location of an articulated object (See Bulat Fig. 2 caption; Pg. 6 “..The input of this subnetwork is a multi-channel representation produced by stacking the N heatmaps produced by the part detection subnetwork, along with the input image”) from a red, green, and blue (RGB) image (See Bulat Fig. 1 and 2, input image is clearly an RGB image; see Bulat Fig. 2 caption; Pg. 6 “the input image”).
where one or more of the received two-dimensional location data of the articulated object are missing (See Bulat fig. 2 first row, see heatmaps of occluded parts, Pg. 6 Regression subnetwork input “..the part detection heatmaps for the occluded parts provide low confidence scores..” thus identified as missing), and
using the RGB image, as the only image to (See Bulat Fig. 1 and 2, input image is clearly a single RGB image), predict two-dimensional location data for each feature point location that was missing (See Bulat fig. 2 second row, Pg. 6 Regression subnetwork “..the regression part of our network to rely on contextual information (provided by the remaining parts) in order to predict the location of these parts..”), and wherein the received plurality of two-dimensional location data points are the only two-dimensional location data points used to predict the two-dimensional location data for each feature point location that was missing (See Bulat Fig. 2 caption; Pg. 6 “..The input of this subnetwork is a multi-channel representation produced by stacking the N heatmaps produced by the part detection subnetwork, along with the input image” where heatmaps correspond to two-dimensional location data thus the only two-dimensional location data input); and
receive from the first machine learning model predicted two-dimensional location data for each data point of the second set of data points (See Bulat fig. 2 second row, caption “..the output of our regression subnetwork… provide high confidence for the correct location of the occluded parts..”; Fig. 7 poses obtained using our method).
Bulat teaches each data point of the second set corresponds to a feature point of the articulated object that is missing (See Bulat fig. 2 first row, see heatmaps of occluded parts; Pg. 6 Regression subnetwork input “..the part detection heatmaps for the occluded parts provide low confidence scores..” emphasis added).
Bulat does not explicitly disclose each data point of the second set corresponds to a feature point of the articulated object without associated two-dimensional location data or wherein the two-dimensional location data is identified as missing; and
where one or more of the received two-dimensional location data of the articulated object are identified as missing;
input into a second machine learning model the predicted two-dimensional data, wherein the second machine learning model is a probabilistic machine learning model trained to receive a plurality of two-dimensional location data points and the second machine learning model is trained to predict a distribution in a third dimension for each received two-dimensional location data point; sample a third-dimension value from each distribution; and output the third-dimensional sample.
Martens teaches each data point of the second set corresponds to a feature point of the articulated object without associated two-dimensional location data or wherein the two-dimensional location data is identified as missing; and where one or more of the received two-dimensional location data of the articulated object are identified as missing (See Martens Fig. 1; Col. 4 Lines 12-25 “The Local Occlusion Detector 110 receives video input 150 and analyzes the frames of the video input for local occlusion patterns” [equivalent of second set of one or more data points] “Global Depth Model Generator 120 combines information about the local occlusion patterns into a depth model 130. Based on the depth model 130, estimated depths 160 are output” [equivalent to machine learning model to predict location data]; Col. 5 Lines 1-30 “…The output from the Local Occlusion Detector 110 will thus be a list of feature points indices, together with a Found or NotFound label for each feature point ... feature points that are not found are regarded as being occluded..”; Figs. 3, 4 and 7; Col. 6 Lines 50-60).
input into a second machine learning model (See Martens C.4 L10-35, Fig. 1 Depth model generator, occlusion forecaster) predicted two-dimensional data for each data point of the second set of data points (See Martens C.4 L10-35, Fig. 1 Video Input and local occlusion patterns) from the first machine learning model (See Martens C.4 L10-35, Fig. 1 occlusion detector), wherein the second machine learning model is a probabilistic machine learning model trained to predict third dimension data for each of the predicted two-dimensional location data (See Martens C.4 L10-35, Fig. 1 depth forecast, estimated depths, C. 10 L. 1-15 Fig. 7 predicted depth).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art, to modify the data points of Bulat to include the noted teachings of Martens, in order to model the depth dimension of occluded objects (Martens Col. 1 Lines 5-12).
The combination of Bulat and Martens does not explicitly disclose wherein the second machine learning model is trained to predict a distribution in a third dimension for each received two-dimensional location data point; sample a third-dimension value from each distribution; and output the third-dimensional sample.
Shotton teaches wherein the second machine learning model is trained to predict a distribution in a third dimension for each received two-dimensional location data point; sample a third-dimension value from each distribution; and output the third-dimensional sample (See Shotton Fig. 1; Fig. 5, where the inferred body parts are equivalent to the combined set of two-dimensional data and the joint proposals are equivalent to the distribution in the third dimension; Section 3.4 Joint position proposal, see the density estimator per body part in equations (7) and (8), where P(c|I,xi) is equivalent to the combined set of two-dimensional data and the detected joint position is equivalent to the third-dimensional sample, as shown in Fig. 5 and Fig. 1).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art, to modify the combination, to include the noted teachings of Shotton, in order to spatially localize joints of interest (Shotton Pg. 2 Col. 1 Paragraph 2).

Claims 6, 17-18 are rejected under 35 U.S.C. 103 as being unpatentable over the combination of Bulat (cited above), Martens (cited above) and Shotton, as applied to claim 3, further in view of Igsum US 20190318476.

Regarding Claim 6, the combination discloses the first machine learning model (See Bulat Pg. 6 Regression subnetwork) predicts locations of features that are not detected and combines the predicted locations with locations of the plurality of two-dimensional location data points (See Bulat fig. 2 second row, caption “..the output of our regression subnetwork… provide high confidence for the correct location of the occluded parts..”; Fig. 7 poses obtained using our method).
The combination does not explicitly disclose the first machine learning model is a conditional variational autoencoder.
Igsum teaches the first machine learning model is a first conditional variational autoencoder (See Igsum Figs. 17-18 [0117] 3D-VCAE 1D-VCAE 1703 1704; [0125]-[0129] FIG. 20-21).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art, to modify the combination of Bulat and Martens, to include the noted teachings of Igsum, in order to perform medical organ assessment in images (Igsum Abstract).

Regarding Claim 17, the combination discloses the second machine learning model (See Martens C.4 L10-35, Fig. 1 Depth model generator, occlusion forecaster) predicts depth data for each feature point of the articulated object (See Martens C.4 L10-35, Fig. 1 Video Input and local occlusion patterns).
The combination does not explicitly disclose the second machine learning model is a conditional variational autoencoder.
Igsum teaches the second machine learning model is a first conditional variational autoencoder (See Igsum Figs. 17-18 [0117] 3D-VCAE 1D-VCAE 1703 1704; [0125]-[0129] FIG. 20-21).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art, to modify the combination of Bulat and Martens, to include the noted teachings of Igsum, in order to perform medical organ assessment in images (Igsum Abstract).

Regarding Claim 18, the combination discloses the machine learning component (See Bulat Pg. 6 Regression subnetwork).
The combination does not explicitly disclose the machine learning component is stored in memory in one of a smartphone, a tablet computer, a games console and a laptop computer.
Igsum teaches wherein the machine learning component is stored in memory in one of a smartphone, a tablet computer, a games console and a laptop computer (Igsum Fig. 2 Computing Device [0060]).
It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art, to modify the combination of Bulat and Martens, to include the noted teachings of Igsum, in order to perform medical organ assessment in images (Igsum Abstract).
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to UMAIR AHSAN whose telephone number is (571)272-1323. The examiner can normally be reached Monday - Friday 10-5 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Srilakshmi Kumar can be reached on (571) 272-7769. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/UMAIR AHSAN/Examiner, Art Unit 2647