DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on February 17, 2021 has been considered by the examiner.

	Examiner has reviewed the specification and the claimed invention of the present application and has completed a prior art search. However, the search results, taken either singly or in combination, do not fully teach the pending claims. Accordingly, the application is in condition for allowance.

Allowable Subject Matter
Claims 1-20 are allowed.
The following is an examiner’s statement of reasons for allowance: 
Examiner has completed the prior art search and identified the following closest prior art references: Gausebeck et al (U.S. Pub. 2019/0026956 A1), He et al (U.S. Pub. 2021/0312698 A1), Li et al (U.S. Pub. 2021/0287430 A1) and Guo et al ("Multi-view 3D object retrieval with deep embedding network," IEEE Transactions on Image Processing, vol. 25, no. 12, Dec. 2016, pp. 5526-5537, https://ieeexplore.ieee.org/document/7569026).

Independent claims 1, 9 and 18 are directed to an apparatus/a method for reconstructing a three-dimensional (3D) object. More specifically, claims 1, 9 and 18 require “obtain/obtaining, using a first neural network, mapping function weights of a mapping function of a second neural network, based on an image feature vector corresponding to a two-dimensional (2D) image of the 3D object; set/setting the mapping function of the second neural network, using the obtained mapping function weights; and based on sampled points of a canonical sampling domain, obtain/obtaining, using the second neural network of which the mapping function is set, 3D point coordinates and geodesic lifting coordinates of each of the sampled points in the 3D object corresponding to the 2D image, wherein the 3D point coordinates are first three dimensions of an embedding vector of a respective one of the sampled points, and the geodesic lifting coordinates are remaining dimensions of the embedding vector”. Furthermore, claim 18 recites additional limitations of “obtaining a first loss that is a first distance between the 3D point coordinates of a respective one of the sampled points and original point coordinates of the respective one of the sampled points; and updating parameters of the first neural network and the second neural network to minimize the obtained first loss”.
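
For illustration only, the data flow of the limitations quoted above can be sketched in Python as follows. This is a minimal, hypothetical sketch and not the applicant's implementation: the class and function names, the single linear-plus-tanh layer used as the mapping function, the unit sphere used as the canonical sampling domain, the five-dimensional embedding (three 3D point coordinates plus two geodesic lifting coordinates), and the use of the sampled points as stand-in original point coordinates are all assumptions. The parameter update recited in claim 18 is not shown.

```python
import math
import random


def linear(x, weights, bias):
    # y = W x + b, with W given as a list of rows.
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


class HyperNetwork:
    """Hypothetical 'first neural network': maps an image feature vector to the
    mapping function weights of the second network (one linear layer here)."""

    def __init__(self, feat_dim, in_dim, out_dim, seed=0):
        rng = random.Random(seed)
        self.in_dim, self.out_dim = in_dim, out_dim
        n_params = out_dim * in_dim + out_dim  # weight matrix plus bias vector
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(feat_dim)] for _ in range(n_params)]
        self.b = [0.0] * n_params

    def __call__(self, feature):
        flat = linear(feature, self.W, self.b)
        k = self.out_dim * self.in_dim
        weights = [flat[r * self.in_dim:(r + 1) * self.in_dim] for r in range(self.out_dim)]
        return weights, flat[k:]


class MappingNetwork:
    """Hypothetical 'second neural network' whose mapping function is set externally."""

    def __init__(self):
        self.weights, self.bias = None, None

    def set_mapping_function(self, weights, bias):
        self.weights, self.bias = weights, bias

    def __call__(self, point):
        # Maps a sampled point of the canonical domain to an embedding vector.
        return [math.tanh(h) for h in linear(point, self.weights, self.bias)]


def sample_unit_sphere(rng):
    # Assumed canonical sampling domain: the unit sphere in R^3.
    v = [rng.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(c * c for c in v)) or 1.0
    return [c / norm for c in v]


def split_embedding(embedding):
    # First three dimensions: 3D point coordinates; remaining: geodesic lifting coordinates.
    return embedding[:3], embedding[3:]


def first_loss(points_3d, original_points):
    # Mean Euclidean distance between predicted and original point coordinates.
    return sum(math.dist(p, q) for p, q in zip(points_3d, original_points)) / len(points_3d)


feature = [0.5, -0.2, 0.1, 0.3]  # hypothetical image feature vector of a 2D image
hyper = HyperNetwork(feat_dim=4, in_dim=3, out_dim=5)
mapper = MappingNetwork()
mapper.set_mapping_function(*hyper(feature))

rng = random.Random(1)
samples = [sample_unit_sphere(rng) for _ in range(8)]
embeddings = [mapper(p) for p in samples]
points_3d = [split_embedding(e)[0] for e in embeddings]
lifting = [split_embedding(e)[1] for e in embeddings]
loss = first_loss(points_3d, samples)  # samples stand in for original coordinates
```

In this sketch the second network's weights are produced per input image by the first network (a hypernetwork arrangement), so a separate mapping function is instantiated for each 2D image; the sketch is offered only to make the quoted limitations concrete.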

The prior art reference Guo et al discloses techniques for performing three-dimensional object retrieval from two-dimensional image input, using a convolutional neural network to model a mapping function. The Abstract of Guo states “In multi-view 3D object retrieval, each object is characterized by a group of 2D images captured from different views. Rather than using hand-crafted features, in this paper, we take advantage of the strong discriminative power of convolutional neural network to learn an effective 3D object representation tailored for this retrieval task. Specifically, we propose a deep embedding network jointly supervised by classification loss and triplet loss to map the high-dimensional image space into a low-dimensional feature space, where the Euclidean distance of features directly corresponds to the semantic similarity of images”. The Overview section of Guo, at pages 5528-5529, states “As illustrated in Fig. 1, the proposed approach consists of two stages. In the first stage, for each 3D object with a group of 2D images, a set of deep features are extracted with the deep embedding network. In the second stage, we formulate the retrieval task as a set-to-set matching problem and the final retrieval results can be obtained by conducting set-to-set matching between the query and all the objects in the retrieval dataset”. More specifically, the Deep Embedding Network section, on the left-hand side of page 5529, states “The goal of the embedding process is to learn a mapping function f (I) = Z, transforming the input image I into a point Z of a feature space where the Euclidean distance of two points directly corresponds to the semantic similarity of input images. 
To be specific, given two input images Ia and Ib, we define the similarity based on the squared Euclidean distance in the embedding feature space: … Therefore, we take advantage of CNN to model the mapping function f (I; Wc) = Z, where Wc denotes the weight parameters in the deep feature Z extraction process. Fig. 2 briefly demonstrates the training framework of the deep embedding network”.

The prior art reference Gausebeck et al discloses techniques for employing neural network models to derive 3D data from 2D images and generating a 3D reconstruction of an object or environment from the 2D images and the derived 3D data. Gausebeck describes “At 2902, a device comprising a processor (e.g., user device 2102, user device 2302, user device 2502, and the like) captures 2D images of an object or an environment (e.g., using one or more cameras 1404). At 2904, the device sends the 2D images to a server device (e.g., server device 2103), wherein based on reception of the 2D images, the server device employs one or more 3D-from-2D neural network models to derive 3D data for the 2D images (e.g., using 2D-from-3D processing module 1406), and generates a 3D reconstruction of the object or environment using the 2D images and the 3D data (e.g., using 3D model generation component 118). At 2906, the device further receives the 3D reconstruction from the server device, and at 2908, the device renders the 3D reconstruction via a display of the device”.

The prior art reference He et al discloses techniques for utilizing an encoder-decoder architecture to learn a volumetric 3D representation of an object from digital images of the object captured from multiple viewpoints, in order to render novel views of the object. In addition, the disclosed systems can recurrently and concurrently aggregate the transformed feature representations to generate a 3D voxel representation of the object (Abstract). FIG. 2 of He shows the novel-view synthesis system synthesizing novel views of a 3D object (Paragraph [0059]). FIG. 3 of He illustrates a process, according to one or more implementations, that the novel-view synthesis system performs to render views (e.g., novel views) of an object by learning a 3D voxel feature representation for the object, and paragraph [0069] of He describes “the novel-view synthesis system 106 can extract feature representations from the sampled image patches. In particular, as shown by FIG. 3, the novel-view synthesis system 106 can generate transformed feature representations that are view dependent from the sampled images patches in act 304 …The novel-view synthesis system 106 can then lift patch features from the different viewpoints. In addition, the novel-view synthesis system 106 can utilize camera pose information from source images of each viewpoint to learn transformation kernels for each viewpoint. Subsequently, the novel-view synthesis system 106 can apply the learned transformation kernels to the lifted feature representations to generate transformed feature representations that are view dependent”. 

The prior art reference Li et al discloses techniques to identify a shape or camera pose of a three-dimensional object from a two-dimensional image of the object (Abstract). More specifically, FIG. 5 of Li describes an image processing system that includes a trained reconstruction network and paragraph [0109] of Li describes “As illustrated there, trained reconstruction network 502 can receive an input image 504 of an object instance that is a 2D image and output an estimated three-dimensional (3D) mesh 506, an estimated texture flow array 508, and an estimate value for a camera pose 510, representing an estimation of 3D estimation of the object in a 3D virtual space and a camera pose for a camera in the 3D virtual space from which pose the input image might have been captured.  Trained reconstruction network 502 can use an encoder 512, a set of neural network weights 514, a 3D mesh shape template 516, a canonical semantic UV map 518, a shape decoder 520, a texture flow decoder 522, and a camera pose decoder 524.  Estimated three-dimensional (3D) mesh 506 might be represented as a set of vertex offsets to 3D mesh shape template 516.  A renderer 530 can receive an alternative camera pose and output a reconstructed 2D view of a 3D model from that alternative camera pose, as shown by the examples of FIG. 7”.

	However, the search results fail to render the claims, considered as a whole, obvious. None of the cited prior art, alone or in combination, teaches or provides motivation to arrive at the above limitations recited in claims 1, 9 and 18. Accordingly, claims 1, 9 and 18 are allowed.

Dependent claims 2-8 depend from independent claim 1, dependent claims 10-17 depend from independent claim 9, and dependent claims 19-20 depend from independent claim 18. They are allowed at least by virtue of their respective dependencies from an allowed claim.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Xilin Guo whose telephone number is (571)272-5786. The examiner can normally be reached Monday - Friday 9:00 AM-5:30 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.

/XILIN GUO/Primary Examiner, Art Unit 2616