DETAILED ACTION

Notice of Pre-AIA or AIA Status

The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103

1.	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

2.	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

3.	Claims 1-4, 8-13, and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Funka-Lea, US 2019/0261945 A1, in view of Li et al., US 2019/0087985 A1, and further in view of Dernoncourt et al., US 2019/0384807 A1.

4. 	As per claim 1, Funka-Lea discloses: A computer-implemented method, comprising: 
providing a two-dimensional image of an object as input to a generative network; (Funka-Lea, Figure 9, and [0008], “An image processor is configured to generate the three-dimensional segmentation from the two-dimensional ICE images with a machine-learned generative network.”)
generating, using the generative network, a set of view images of the object represented from different views;  (Funka-Lea, [0034], “In one embodiment shown in FIG. 3, a set of 2D ICE images 30 are input, each including part of the heart in its field of view. The sensed 3D position is used to map all the 2D ICE images 30 to 3D space 32, thus forming a sparse ICE volume 34. The generated 3D sparse ICE volume 34 keeps the spatial relationships among individual ICE views.”)

5.	Funka-Lea doesn’t expressly disclose:
providing, as input to an inverse graphics network, the set of view images and information for the different views; 
determining, for individual view images of the set using the inverse graphics network, a set of three-dimensional information; 
rendering, for the individual view images of the set, a representation of the object using the set of three-dimensional information and the respective view information; 
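comparing the rendered representations against corresponding data to determine at least one loss value; and
adjusting one or more network parameters for the inverse graphics network based at least in part upon the at least one loss value.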

6.	Li discloses: 
providing, as input to an inverse graphics network, the set of view images and information for the different views; (Li, [0046], “During normal operation, the inverse rendering system 130 receives 2D image data 136 and determines initial values for parameters for a 3D scene 132. The initial values are determined to produce rendered image data 148 as an approximation of the 2D image data 136.”, and [0032], “At step 111, the inverse graphics rendering system receives image data in a two-dimensional (2D) space. The image data may include one or more images of an object or a scene comprising one or more objects. At step 113, the inverse graphics rendering system determines initial parameter values for geometric primitives (e.g., triangles) in a three-dimensional (3D) space.”, and [0037], “An inverse graphics system may be required to optimize thousands or millions of parameters (e.g., vertex positions for geometric objects and meshes, camera position and attributes, lighting details, and so forth).”)
determining, for individual view images of the set using the inverse graphics network, a set of three-dimensional information; (Li, [0046], “During normal operation, the inverse rendering system 130 receives 2D image data 136 and determines initial values for parameters for a 3D scene 132.”)
rendering, for the individual view images of the set, a representation of the object using the set of three-dimensional information and the respective view information; (Li, [0037], “An inverse graphics system may be required to optimize thousands or millions of parameters (e.g., vertex positions for geometric objects and meshes, camera position and attributes, lighting details, and so forth). An optimization framework may be configured to include the rendering pipeline.”, and [0032], “At step 111, the inverse graphics rendering system receives image data in a two-dimensional (2D) space. The image data may include one or more images of an object or a scene comprising one or more objects. At step 113, the inverse graphics rendering system determines initial parameter values for geometric primitives (e.g., triangles) in a three-dimensional (3D) space.”)
adjusting one or more network parameters for the inverse graphics network based at least in part upon the at least one loss value. (Li, [0035], “At step 119, the inverse graphics rendering system updates the initial parameter values based on the differences and the derivatives. Each parameter value may be updated, for example, according to one or more associated derivatives. In one embodiment, the parameters are updated using gradient descent. In practice, a parameter is updated by nudging the parameter in a direction that reduces or minimizes the error (opposite direction of the gradient of the error). Given a parameter x, the new updated parameter x′ may be computed as x-alpha*dD/dx, where D is the difference between the two images (i.e., the differences value) and alpha controls the length of the step for each iteration. The same equation may be applied to all parameters that the user wants to update.”)
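
The update rule quoted from Li [0035] (x′ = x - alpha*dD/dx) can be illustrated with a minimal Python sketch. This is an illustration only and is not Li's renderer: the `render` function, the single scene parameter, the squared-difference error, and the step size are hypothetical stand-ins chosen so that the example is self-contained.

```python
def render(x):
    """Hypothetical one-parameter 'renderer': maps a scene parameter to a pixel value."""
    return 2.0 * x + 1.0


def difference(x, target):
    """D: squared difference between the rendered value and the observed value."""
    return (render(x) - target) ** 2


def d_difference_dx(x, target):
    """Analytic derivative dD/dx for the toy renderer above (chain rule)."""
    return 2.0 * (render(x) - target) * 2.0


def gradient_descent(x, target, alpha=0.05, iterations=200):
    """Repeatedly apply x' = x - alpha * dD/dx, as in the quoted passage."""
    for _ in range(iterations):
        x = x - alpha * d_difference_dx(x, target)
    return x


observed = 7.0  # hypothetical observed pixel value; render(3.0) equals 7.0
x_final = gradient_descent(x=0.0, target=observed)
```

In this toy setting each step moves the parameter opposite to the gradient of the error, so `x_final` approaches the value whose rendering matches the observed value.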



8.	Funka-Lea in view of Li doesn’t expressly disclose:
comparing the rendered representations against corresponding ground truth training data to determine at least one loss value, the ground truth training data based at least in part on annotation data; 

9.	Dernoncourt discloses: 
comparing the rendered representations against corresponding ground truth training data to determine at least one loss value, the ground truth training data based at least in part on annotation data; (Dernoncourt, FIG. 6, “Additionally, as shown in FIG. 6, the annotation machine learning model 604 utilizes the training electronic documents 602, the predicted digital annotations 606, and the ground-truth digital annotations 610 to learn to accurately generate digital annotations for an electronic document that correspond to significant sentences of the electronic document. For example, the digital document annotation system 110 compares the predicted digital annotations 606 and the ground-truth digital annotations 610 to train the annotation machine learning model 604. In particular, the digital document annotation system 110 compares the predicted digital annotations 606 and the ground-truth digital annotations 610 utilizing the loss function 608 (e.g., mean squared error loss function, cosine similarity loss function, or another loss function), which generates the calculated loss 612. In particular, the loss function 608 can determine if the predicted digital annotations 606 accurately reflect the ground-truth digital annotations 610 of the training electronic documents 602.”)

23.	Dernoncourt is analogous art with respect to Funka-Lea and Li because they are from the same field of endeavor, namely image processing. Before the effective filing date of the claimed invention, it would have been obvious to a person of ordinary skill in the art to include the process of comparing the rendered representations against corresponding ground truth training data to determine at least one loss value, the ground truth training data based at least in part on annotation data, as taught by Dernoncourt, into the teaching of Funka-Lea in view of Li, in order to accurately generate reliable training annotations. Therefore, it would have been obvious to combine Dernoncourt with Funka-Lea and Li.
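
The two loss functions named in the quoted Dernoncourt passage (mean squared error and cosine similarity) can be sketched in Python as follows. This is a generic illustration and not Dernoncourt's implementation: the function names and the sample vectors are hypothetical, and treating the cosine similarity loss as one minus the cosine similarity is one common convention assumed here.

```python
import math


def mean_squared_error(predicted, ground_truth):
    """Average of the squared element-wise differences."""
    if len(predicted) != len(ground_truth):
        raise ValueError("inputs must have the same length")
    return sum((p - g) ** 2 for p, g in zip(predicted, ground_truth)) / len(predicted)


def cosine_similarity_loss(predicted, ground_truth):
    """1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(p * g for p, g in zip(predicted, ground_truth))
    norm_p = math.sqrt(sum(p * p for p in predicted))
    norm_g = math.sqrt(sum(g * g for g in ground_truth))
    return 1.0 - dot / (norm_p * norm_g)


# Hypothetical per-item scores: predicted values versus ground-truth labels.
predicted = [0.9, 0.1, 0.8]
ground_truth = [1.0, 0.0, 1.0]

mse = mean_squared_error(predicted, ground_truth)
cos_loss = cosine_similarity_loss(predicted, ground_truth)
```

Both values shrink toward zero as the predicted values approach the ground truth, which is the property a training procedure relies on when it adjusts parameters to reduce the loss.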


8.	As per claim 2, Funka-Lea in view of Li, and further in view of Dernoncourt, discloses: The computer-implemented method of claim 1, further comprising: providing at least a subset of the representations of the object, rendered by the inverse graphics network, as training data to further train the generative network. (Li, [0047], “The inverse rendering system 130 may perform multiple iterations of optimizing, updating, and rendering the parameters for a 3D scene 132, with the goal of sequential iterations causing corresponding generated rendered image data 148 to converge to the 2D image data 136. This process of optimization may cause parameters for a 3D scene 132 to more closely model actual scene and object geometry depicted in the 2D image data 136. In one embodiment, the parameters for a 3D scene 132 are optimized (e.g., trained) by the parameter adjustment engine 134 using a gradient descent or any other derivative-based optimization technique based on the error data and the parameter derivative.”)

9.	As per claim 3, Funka-Lea in view of Li, and further in view of Dernoncourt, discloses: The computer-implemented method of claim 2, further comprising: training the inverse graphics network and the generative network together using a common loss function. (Funka-Lea, [0061], “For training any of the networks, various optimizers may be used, such as Adadelta, SGD, RMSprop, or Adam. The weights of the network are randomly initialized, but another initialization may be used. End-to-end training is performed, but one or more features may be set. Batch normalization, dropout, and data augmentation are not used, but may be (e.g., using batch normalization and dropout). During the optimization, the different distinguishing features are learned. The features providing an indication of anatomy location or missing volume information given a input sparse ICE volume are learned.”, and [0062], “The optimizer minimizes an error or loss, such as the Mean Squared Error (MSE), Huber loss, L1 loss, or L2 loss. In one embodiment, the machine training uses a combination of adversarial loss and reconstruction loss.”)

10.	As per claim 4, Funka-Lea in view of Li, and further in view of Dernoncourt, discloses: The computer-implemented method of claim 1, wherein the generative network is a style generative adversarial network enabling only camera view-related features to be adjusted for generating the set of view images. (Li, [0061], “In one embodiment, this optimization may be similar to other neural network style transfer techniques. For example, a scene may be rendered with a resulting image presented to a first convolutional neural network (CNN) configured for image recognition. In one embodiment the CNN comprises a deep neural network, as described herein. Intermediate responses of the CNN are sampled to generate first feature vectors. The style image is presented to a second CNN similarly configured for image recognition, with intermediate responses sampled to generate second feature vectors. The feature vectors may be interpreted to be a high-dimensional embedding of each image. Alternatively, the feature vectors may be interpreted to be non-linear frequency decompositions. The CNN layers (intermediate responses) that produce edge and texture responses of respective images may be extracted as the feature vectors.”)

11.	As per claim 8, Funka-Lea in view of Li, and further in view of Dernoncourt, discloses: The computer-implemented method of claim 1, wherein the three-dimensional information for the object includes at least one of a shape, texture, lighting, or background for the object. (Li, [0028], “Embodiments of the present invention include an inverse graphics system configured to generate 3D scene and/or object models for an observed image. Such models may include parameters that describe arbitrary geometry, lighting, and texturing in the observed image.”)

12.	As per claim 9, Funka-Lea in view of Li, and further in view of Dernoncourt, discloses: The computer-implemented method of claim 1, wherein the two-dimensional image input to the generative network is annotated with weakly accurate camera information corresponding to a subset of object features. (Li, [0066], “At each iteration, the set of parameters are updated based on a previous set of parameters and a newly updated image is rendered for comparison to the reference image. At each iteration, an updated image is compared to the reference image in step 263. Furthermore, in step 267 the updated set of parameters is based on a previous set of parameters, initially comprising the first set of parameters. The parameters may include geometric primitive information, shading information, position information, texture information, camera position information, illuminator position information, and the like.”)


14.	Claims 11 and 17 are similar in scope to claim 2 and are thus rejected under the same rationale.
15.	Claims 12 and 18 are similar in scope to claim 3 and are thus rejected under the same rationale.
16.	Claims 13 and 19 are similar in scope to claim 4 and are thus rejected under the same rationale.

17.	As per claim 15, Funka-Lea in view of Li, and further in view of Dernoncourt, discloses: The system of claim 10, wherein the system comprises at least one of: a system for performing graphical rendering operations; a system for performing simulation operations; a system for performing simulation operations to test or validate autonomous machine applications; a system for performing deep learning operations; a system implemented using an edge device; a system incorporating one or more Virtual Machines (VMs); a system implemented at least partially in a data center; or a system implemented at least partially using cloud computing resources. (Li, [0068], “In one embodiment, the inverse graphics rendering system 130 is configured to perform method 270. Method 270 may be repeated for each pixel within a given graphics scene to render the scene. In one embodiment, method 270 may implement step 261 and/or step 269 of method 260.”)


19.	Claims 5, 6, 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Funka-Lea, US 2019/0261945 A1, in view of Li et al., US 2019/0087985 A1, and in view of Dernoncourt et al., US 2019/0384807 A1, and further in view of Chhabra et al., US 2019/0354806 A1.

20.	As per claim 5, Funka-Lea in view of Li, and further in view of Dernoncourt, discloses: The computer-implemented method of claim 1. (See rejection of claim 1 above.)

21.	Funka-Lea in view of Li, and further in view of Dernoncourt, doesn’t expressly disclose: using a selection matrix to reduce a dimensionality of image features to be included in a latent code to be used to render the representation of the object.

22.	Chhabra discloses: using a selection matrix to reduce a dimensionality of image features to be included in a latent code to be used to render the representation of the object. (Chhabra, [0049], “Herein, the set of latent code refers to an alphanumeric arrangement or representation of the features of a set of data. In embodiments, the set of latent code may be dimensionally reduced with respect to the set of input data. In addition, in certain embodiments the set of latent code may be a dimensionally reduced binary representation of the set of input data.”, and [0059])
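
For readers unfamiliar with the term in the claim language, the following Python sketch shows one way a selection matrix can reduce the dimensionality of a feature vector. It assumes the ordinary linear-algebra meaning of a selection matrix (a binary matrix with exactly one 1 per row); neither the quoted Chhabra passage nor this action defines the term, and the function names, indices, and feature values are hypothetical.

```python
def make_selection_matrix(selected_indices, num_features):
    """Build a k x n binary matrix whose row i has a single 1 at selected_indices[i]."""
    return [
        [1.0 if col == idx else 0.0 for col in range(num_features)]
        for idx in selected_indices
    ]


def apply_selection(matrix, features):
    """Matrix-vector product: maps an n-dimensional vector to a k-dimensional one."""
    return [sum(w * f for w, f in zip(row, features)) for row in matrix]


# Hypothetical 5-dimensional feature vector reduced to 3 dimensions.
features = [0.5, -1.2, 3.0, 0.7, 2.1]
selection = make_selection_matrix([0, 2, 4], len(features))
latent_code = apply_selection(selection, features)
```

Here the product keeps the entries at positions 0, 2, and 4 and drops the rest, so the result has fewer dimensions than the input.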



24.	As per claim 6, Funka-Lea in view of Li, in view of Dernoncourt, and further in view of Chhabra, discloses: The computer-implemented method of claim 5, further comprising: rendering the representation of the object based, at least in part, upon the latent code and using a differentiable renderer. (Chhabra, [0054], “As described above, a loss function that penalizes deviation from original data without noise may be utilized to impose a loss on the data in order to acquire a representation of the data such that the embedding (e.g., latent code) represents the hierarchical relationships in the data. The total loss is given by the following Equation 4. As illustrated by Equation 4, the total loss is composed of the loss due to the generation of the latent code as represented by λL.sub.code(c, h) and the reconstruction loss L.sub.reconstruction(X,X.sub.reconstructed) due to reconstruction of the data. Here, X represents the set of input data M11 and X.sub.r represents the set of reconstructed data M900. As illustrated by Equation 4, L.sub.total represents the sum of the code loss and the reconstruction loss.”)
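
Equation 4 itself is not reproduced in the quoted passage. Based only on the quoted description (a total loss equal to the sum of the weighted code loss and the reconstruction loss), it would take a form along the following lines; the exact notation in Chhabra may differ.

```latex
L_{\text{total}} = \lambda \, L_{\text{code}}(c, h) + L_{\text{reconstruction}}(X, X_{\text{reconstructed}})
```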

25.	As per claim 7, Funka-Lea in view of Li, in view of Dernoncourt, and further in view of Chhabra, discloses: The computer-implemented method of claim 5, wherein the latent code includes camera features for the corresponding view. (Chhabra, [0049], “In addition, in certain embodiments the set of latent code may be a dimensionally reduced binary representation of the set of input data. As an example, as illustrated in FIG. 2, the set of latent code for an image may be “1011,” where each digit indicates a different aspect of the image (e.g., shape, color, size).”)

26.	Claim 14 is similar in scope to claims 5 and 6 and is thus rejected under the same rationale.

Response to Arguments

27.	Applicant’s arguments with respect to claims 1-20 filed 02/22/202 have been considered but are moot because the Applicant submitted newly amended claims. Accordingly, new grounds of rejection are set forth above. The new grounds of rejection have been necessitated by Applicant’s amendments to the claims.

Conclusion

28.	Applicant’s amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ABDERRAHIM MEROUAN whose telephone number is (571)270-5254.  The examiner can normally be reached on Monday to Friday 7:30 AM to 5:00 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kent Chang, can be reached at 571-272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business 

/ABDERRAHIM MEROUAN/
Primary Examiner, Art Unit 2619