DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Objections

Claims 1-20 are objected to because of the following informalities:
Claims 1-20 recite “first CNN block”, “second CNN block”, “third CNN block”, “fourth CNN block” and “fifth CNN block”. Please clarify in the claims what each CNN block is, what the relation between the CNN blocks is, why the third CNN block is based on the fifth CNN block, etc.
Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.


Claims 1-3, 8-13 and 18-20 are rejected under 35 U.S.C. 103 as being unpatentable over SON U.S. Patent Application 20190188882 in view of Lu U.S. Patent Application 20180143966, and further in view of Mazo U.S. Patent Application 20180240235.
Regarding claim 1, SON discloses a computer-implemented method for generating computational representations for three-dimensional (3D) geometry shapes, the method comprising:
for each view included in a first plurality of views associated with a first 3D geometry, generating a view activation based on a first convolutional neural network (CNN) block (paragraph [0030]: The encoder may include a convolutional neural network (CNN) configured to generate a multi-channel feature from a single-channel 2D image; paragraph [0068]: A process of activating nodes from an input layer to a hidden layer is referred to as "encoding" or "embedding”; paragraph [0072]: in operation 201, the image processing apparatus extracts an input feature 103 from the input image 101 using the encoder 102; paragraph [0060]: a change in a viewpoint of an image based on a characteristic of an object or a target included in the image, and a rotation, a movement and a transformation of an object included in an image. For example, in response to an input of a user, an interaction is performed to rotate or distort an object in an image; see fig. 4 generating image from plurality of viewpoints);
generating a first shape embedding having a fixed size based on the first view activation and a second CNN block (paragraph [0068]: A process of activating nodes from an input layer to a hidden layer is referred to as "encoding" or "embedding”; paragraph [0083]: the input feature 301 and the second feature 304 have the same sizes. Also, a spatial size of the result image is equal to a spatial size of the input image; paragraph [0020]: The converting of the training input feature may include applying a weight to a first feature map… the training input feature combined with the first feature map to the second feature that may be suitable for an input of the decoder; paragraph [0030]: the decoder may include a CNN configured to generate a single-channel 2D image from a multi-channel feature); 
generating a first plurality of re-constructed views based on the first shape embedding (paragraph [0068]: A process of activating nodes from a hidden layer to an output layer is referred to as "decoding" or "reconstruction."; paragraph [0083]:  the input feature 301 and the 
performing one or more training operations on at least one of the first CNN block or the second CNN block based on the first plurality of views and the first plurality of re-constructed views to generate a trained encoder (paragraph [0023]: The training method may include training the encoder and the decoder so that the training result image may be substantially identical to the image generated by applying the interaction to the training input image; paragraph [0024]: The training input features may be extracted by the encoder from training input images, training result images may be generated by the decoder from the training input features, and the encoder and the decoder may be pre-trained so that the training input images are substantially identical to the training result images); and 
generating a second shape embedding having the fixed size based on a second 3D geometry and the trained encoder (paragraph [0068]: A process of activating nodes from an input layer to a hidden layer is referred to as "encoding" or "embedding”; paragraph [0083]:  the input feature 301 and the second feature 304 have the same sizes. Also, a spatial size of the result image is equal to a spatial size of the input image; paragraph [0060]: a change in a viewpoint of an image based on a characteristic of an object or a target included in the image, and a rotation, a movement and a transformation of an object included in an image. For example, in response to an input of a user, an interaction is performed to rotate or distort an object in an image; see fig. 4 generating image from plurality of viewpoints).
SON discloses all the features with respect to claim 1 as outlined above. However, SON fails to explicitly disclose aggregating the view activations to generate a first tiled activation, or a first vector having a fixed size and a second vector having the fixed size.
Lu discloses a first vector having a fixed size and a second vector having the fixed size (paragraph [0055]: Attention-based visual neural encoder-decoder models use a convolutional neural network (CNN) to encode an input image into feature vectors; Lu’s teaching of vectors 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify SON’s teaching to generate vectors as taught by Lu, to facilitate image reconstruction.
SON as modified by Lu discloses all the features with respect to claim 1 as outlined above. However, SON as modified by Lu fails to disclose aggregating the view activations to generate a first tiled activation.
Mazo discloses aggregating the view activations to generate a first tiled activation (paragraph [0019]: outputs of a forward direction and a backward direction of the bidirectional GRU (view activations) are concatenated and reshaped into dimensions corresponding to dimensions of feature map outputs of the plurality of sequential contracting components (tiled activation); paragraph [0110]: Single expanding component 512 outputs a segmentation mask 514 for the target 2D slice i 502. Segmentation mask 514 may represent a classification of each pixel of target slice 502, as belonging to the segmented region (tile), or not belonging to the segmented region; Mazo’s teaching of aggregating components tiles can be combined with SON and Lu’s device, such that shape embedding can be based on components tiled activation).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teaching of SON and Lu to concatenate image components as taught by Mazo, to perform image segmentation and reconstruction efficiently.

Regarding claim 2, SON as modified by Lu and Mazo discloses the method of claim 1, wherein aggregating the view activations comprises concatenating the view activations along a single dimension (Mazo’s paragraph [0019]: outputs of a forward direction and a backward direction (single direction) of the bidirectional GRU are concatenated and reshaped into 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify SON’s teaching to generate vectors as taught by Lu, to facilitate image reconstruction, and to modify the combined teaching of SON and Lu to concatenate image components as taught by Mazo, to perform image segmentation and reconstruction efficiently.
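For illustration only, the claim 2 limitation of concatenating view activations along a single dimension can be pictured with the short Python sketch below. It is not taken from the application or from SON, Lu or Mazo; the function name `tile_views`, the row-by-column list layout and the sample values are assumptions made solely for this example.

```python
from typing import List

# Hypothetical layout: one view activation is a small grid of rows x columns.
Activation = List[List[float]]


def tile_views(view_activations: List[Activation]) -> Activation:
    """Concatenate per-view activations along the column dimension only."""
    if not view_activations:
        raise ValueError("at least one view activation is required")
    rows = len(view_activations[0])
    if any(len(view) != rows for view in view_activations):
        raise ValueError("all view activations must have the same number of rows")
    tiled: Activation = [[] for _ in range(rows)]
    for view in view_activations:
        for r in range(rows):
            # Append this view's row to the right of the views already tiled.
            tiled[r].extend(view[r])
    return tiled


# Three assumed 2x2 view activations become one 2x6 tiled activation.
views = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[5.0, 6.0], [7.0, 8.0]],
    [[9.0, 10.0], [11.0, 12.0]],
]
tiled = tile_views(views)
```

In this sketch the number of rows is unchanged and only the column count grows with the number of views, which is one plausible reading of "along a single dimension".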

Regarding claim 3, SON as modified by Lu and Mazo discloses the method of claim 1, wherein the second CNN block includes one or more 2D CNNs followed by a fully connected layer (SON’s paragraph [0030]: The encoder may include a convolutional neural network (CNN) configured to generate a multi-channel feature from a single-channel 2D image, and the decoder may include a CNN configured to generate a single-channel 2D image from a multi-channel feature; paragraph [0067]: A set of the encoder 102 and the decoder 105 (hereinafter, referred to as an "encoder 102-decoder 105 network") includes an input layer, a hidden layer and an output layer; Mazo’s paragraph [0101]: A successive convolution layer can then learn to assemble a more precise output based on this information). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify SON’s teaching to generate vectors as taught by Lu, to facilitate image reconstruction, and to modify the combined teaching of SON and Lu to concatenate image components as taught by Mazo, to perform image segmentation and reconstruction efficiently.

Regarding claim 8, SON as modified by Lu and Mazo discloses the method of claim 1, wherein generating a first view activation included in the view activations occurs at least partially in parallel with generating a second view activation included in the view activations (SON’s paragraph [0103]: Many of the operations shown in FIG. 10 may be performed in parallel or 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify SON’s teaching to generate vectors as taught by Lu, to facilitate image reconstruction, and to modify the combined teaching of SON and Lu to concatenate image components as taught by Mazo, to perform image segmentation and reconstruction efficiently.

Regarding claim 9, SON as modified by Lu and Mazo discloses the method of claim 1, wherein the second shape embedding is associated with a 3D design that is automatically generated via a generative design flow (SON’s paragraph [0068]: A process of activating nodes from an input layer to a hidden layer is referred to as "encoding" or "embedding”; paragraph [0061]: the image processing apparatus processes an image interaction using a technique of estimating three-dimensional (3D) information of an image or converting a feature extracted from the image instead of reconstructing a 3D image; see fig. 4 generating image from plurality of viewpoints). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify SON’s teaching to generate vectors as taught by Lu, to facilitate image reconstruction, and to modify the combined teaching of SON and Lu to concatenate image components as taught by Mazo, to perform image segmentation and reconstruction efficiently.

Regarding claim 10, SON as modified by Lu and Mazo discloses the method of claim 1, wherein the second shape embedding is associated with the second 3D geometry, and further comprising: 
generating a third shape embedding that includes a third vector having the fixed size based on the trained encoder and a third 3D geometry (SON’s paragraph [0068]: A process of activating nodes from an input layer to a hidden layer is referred to as "encoding" or 
and performing one or more comparison operations between the second shape embedding and the third shape embedding when exploring a design space (SON’s paragraph [0083]: The image processing apparatus generates, using the CNN of the decoder, a result image from the second feature 304 that is to be suitable for the CNN of the decoder. A structure of the CNN of the encoder and a structure of the CNN of the decoder are symmetrical with each other, and the input feature 301 and the second feature 304 have the same sizes. Also, a spatial size of the result image is equal to a spatial size of the input image; paragraph [0113]: Examples of hardware components that may be used to perform the operations described in this application where appropriate include controllers, sensors, generators, drivers, memories, comparators). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify SON’s teaching to generate vectors as taught by Lu, to facilitate image reconstruction, and to modify the combined teaching of SON and Lu to concatenate image components as taught by Mazo, to perform image segmentation and reconstruction efficiently.
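For illustration only, a comparison operation between two shape embeddings of the same fixed size, as recited in claim 10, might look like the Python sketch below. It is not drawn from the application or the cited references; the choice of Euclidean distance and cosine similarity, the function names and the sample vectors are all assumptions.

```python
import math
from typing import Sequence


def _check_same_size(a: Sequence[float], b: Sequence[float]) -> None:
    # Fixed-size embeddings can be compared element by element.
    if len(a) != len(b):
        raise ValueError("embeddings must have the same fixed size")


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Smaller values indicate more similar embeddings."""
    _check_same_size(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Values near 1 indicate embeddings pointing in the same direction."""
    _check_same_size(a, b)
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)


# Assumed second and third shape embeddings of fixed size 2.
second_embedding = [0.0, 0.0]
third_embedding = [3.0, 4.0]
distance = euclidean_distance(second_embedding, third_embedding)
```

Either measure relies on both embeddings having the same length, which is why the fixed-size limitation matters for the comparison step.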

Claim 11 recites the functions of the method recited in claim 1 as medium steps.  Accordingly, the mapping of the prior art to the corresponding functions of the method in claim 1 applies to the medium steps of claim 11.
Claim 12 recites the functions of the method recited in claim 2 as medium steps.  Accordingly, the mapping of the prior art to the corresponding functions of the method in claim 2 applies to the medium steps of claim 12.
Claim 13 recites the functions of the method recited in claim 3 as medium steps.  Accordingly, the mapping of the prior art to the corresponding functions of the method in claim 3 applies to the medium steps of claim 13.
Claim 18 recites the functions of the method recited in claim 8 as medium steps.  Accordingly, the mapping of the prior art to the corresponding functions of the method in claim 8 applies to the medium steps of claim 18.
Claim 19 recites the functions of the method recited in claim 10 as medium steps.  Accordingly, the mapping of the prior art to the corresponding functions of the method in claim 10 applies to the medium steps of claim 19.
Claim 20 recites the functions of the method recited in claim 1 as apparatus steps.  Accordingly, the mapping of the prior art to the corresponding functions of the method in claim 1 applies to the apparatus steps of claim 20 (SON’s paragraph [0023]: a memory storing instructions; and one or more processors configured to execute the instructions).

Claims 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over SON U.S. Patent Application 20190188882 in view of Lu U.S. Patent Application 20180143966, in view of Mazo U.S. Patent Application 20180240235, and further in view of Kim U.S. Patent Application 20140139518.
Regarding claim 7, SON as modified by Lu and Mazo discloses all the features with respect to claim 1 as outlined above. However, SON as modified by Lu and Mazo fails to disclose, for each virtual camera included in a plurality of virtual cameras, rendering the first 3D image based on the virtual camera to generate a different view included in the first plurality of views. 

Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teaching of SON, Lu and Mazo to generate views from different viewpoints as taught by Kim, to provide an image allowing a user to have depth perception.

Claim 17 recites the functions of the method recited in claim 7 as medium steps.  Accordingly, the mapping of the prior art to the corresponding functions of the method in claim 7 applies to the medium steps of claim 17.

Claims 4-6 and 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over SON U.S. Patent Application 20190188882 in view of Lu U.S. Patent Application 20180143966, in view of Mazo U.S. Patent Application 20180240235, and further in view of O`Shea U.S. Patent Application 20180314985.
Regarding claim 4, SON as modified by Lu and Mazo discloses generating a second plurality of re-constructed views based on the third CNN block and the second CNN block; and determining that the third CNN block and the second CNN block comprise the trained encoder based on the loss function; the first plurality of views; and the second plurality of re-constructed views (SON’s paragraph [0106]: the training apparatus trains the encoder 903 and decoder 907 so that the training result image 908 is identical to the image 909. The training apparatus performs training based on a loss function that is defined based on a difference between the training result image 908 and the image 909; paragraph [0020]: The converting of the training input feature may include applying a weight to a first feature map… the training input feature 
O`Shea discloses determining that the first CNN block is not trained based on a loss function; modifying one or more machine learning parameters associated with the first CNN block based on the loss function to generate a third CNN block (paragraph [0063]: updating the weight vectors/parameters in the encoder network 302 and/or the decoder network 304 may be based on the loss function 312, the measure of compression in the compact representation 306, the quality of the reconstruction of the signal, and/or other measures of performance).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teaching of SON, Lu and Mazo to update parameters based on a loss function as taught by O`Shea, to train and deploy machine-learning networks efficiently.
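For illustration only, the claim 4 sequence of checking a loss function, finding that a block is not yet trained, and modifying its parameters to obtain an updated block can be pictured with the toy Python sketch below. It is not the training procedure of the application, SON or O`Shea; the single scalar weight, the mean-squared reconstruction loss, the learning rate and the tolerance are all assumptions chosen to keep the example small.

```python
from typing import Sequence


def reconstruction_loss(w: float, samples: Sequence[float]) -> float:
    """Mean squared error between each input x and its reconstruction w * x."""
    return sum((w * x - x) ** 2 for x in samples) / len(samples)


def train(w: float, samples: Sequence[float], lr: float = 0.1,
          tol: float = 1e-6, max_steps: int = 1000) -> float:
    """Update the assumed parameter w until the loss falls to tol or below."""
    for _ in range(max_steps):
        if reconstruction_loss(w, samples) <= tol:
            break  # the loss indicates the parameter is treated as trained
        # Gradient of the mean squared error with respect to w.
        grad = sum(2.0 * (w * x - x) * x for x in samples) / len(samples)
        w -= lr * grad  # modified parameter, i.e. the "updated block"
    return w


samples = [1.0, 2.0]
trained_w = train(0.0, samples)
final_loss = reconstruction_loss(trained_w, samples)
```

Each pass through the loop corresponds to one determination that training is incomplete followed by one parameter modification.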

Regarding claim 5, SON as modified by Lu, Mazo and O`Shea discloses the method of claim 1, wherein generating the first plurality of re-constructed views comprises: 
generating a decoded tiled activation based on the first shape embedding and a third CNN block (SON’s paragraph [0068]: A process of activating nodes from an input layer to a hidden layer is referred to as "encoding" or "embedding”; paragraph [0106]: the training apparatus trains the encoder 903 and decoder 907 so that the training result image 908 is identical to the image 909. The training apparatus performs training based on a loss function that is defined based on a difference between the training result image 908 and the image 909; paragraph [0030]: the decoder may include a CNN configured to generate a single-channel 2D 
partitioning the decoded tiled activation into a plurality of decoded view activations (Mazo’s paragraph [0019]: outputs of a forward direction and a backward direction of the bidirectional GRU (view activations) are concatenated and reshaped into dimensions corresponding to dimensions of feature map outputs of the plurality of sequential contracting components (tiled activation); paragraph [0110]: Single expanding component 512 outputs a segmentation mask 514 for the target 2D slice i 502. Segmentation mask 514 may represent a classification of each pixel of target slice 502, as belonging to the segmented region (tile), or not belonging to the segmented region; see SON’s fig. 4 generating image from plurality of viewpoints); and 
for each decoded view activation included in the plurality of decoded view activations, generating a different re-constructed view included in the first plurality of re-constructed views based on a fourth CNN block (SON’s paragraph [0068]: A process of activating nodes from a hidden layer to an output layer is referred to as "decoding" or "reconstruction."; paragraph [0030]: the decoder may include a CNN configured to generate a single-channel 2D image from a multi-channel feature; see fig. 4 generating image from plurality of viewpoints; O`Shea’s paragraph [0063]: updating the weight vectors/parameters in the encoder network 302 and/or the decoder network 304 may be based on the loss function 312, the measure of compression in the compact representation 306, the quality of the reconstruction of the signal, and/or other measures of performance).
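For illustration only, the claim 5 step of partitioning a decoded tiled activation into a plurality of decoded view activations can be pictured as the inverse of a column-wise concatenation, as in the Python sketch below. It is not taken from the application or the cited references; the function name `split_tiled`, the equal-width assumption and the sample values are hypothetical.

```python
from typing import List

Activation = List[List[float]]


def split_tiled(tiled: Activation, num_views: int) -> List[Activation]:
    """Partition a tiled activation into num_views equal-width column blocks."""
    if not tiled or num_views <= 0:
        raise ValueError("a non-empty tiled activation and num_views > 0 are required")
    width = len(tiled[0])
    if width % num_views != 0:
        raise ValueError("tiled width must be divisible by the number of views")
    block = width // num_views
    # View i takes columns [i * block, (i + 1) * block) from every row.
    return [
        [row[i * block:(i + 1) * block] for row in tiled]
        for i in range(num_views)
    ]


# An assumed 2x6 decoded tiled activation is split into three 2x2 view activations.
decoded_tiled = [
    [1.0, 2.0, 5.0, 6.0, 9.0, 10.0],
    [3.0, 4.0, 7.0, 8.0, 11.0, 12.0],
]
decoded_views = split_tiled(decoded_tiled, 3)
```

Each resulting block could then be passed separately to a further decoding stage, which is how the per-view generation step of the claim is pictured here.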


Regarding claim 6, SON as modified by Lu, Mazo and O`Shea discloses the method of claim 5, further comprising modifying one or more machine learning parameters associated with a fifth CNN block based on a loss function to generate the third CNN block (SON’s paragraph [0079]: In machine learning, a CNN, which is a kind of neural networks, includes convolutional layers designed to perform a convolution operation; paragraph [0106]: the training apparatus trains the encoder 903 and decoder 907 so that the training result image 908 is identical to the image 909. The training apparatus performs training based on a loss function that is defined based on a difference between the training result image 908 and the image 909; O`Shea’s paragraph [0063]: updating the weight vectors /parameters in the encoder network 302 and/or the decoder network 304 may be based on the loss function 312, the measure of compression in the compact representation 306, the quality of the reconstruction of the signal, and/or other measures of performance).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teaching of SON, Lu and Mazo to update parameters based on a loss function as taught by O`Shea, to train and deploy machine-learning networks efficiently.

Claim 14 recites the functions of the method recited in claim 4 as medium steps.  Accordingly, the mapping of the prior art to the corresponding functions of the method in claim 4 applies to the medium steps of claim 14.

Regarding claim 15, SON as modified by Lu, Mazo and O`Shea discloses the one or more non-transitory computer readable media of claim 14, wherein the loss function measures a 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teaching of SON, Lu and Mazo to update parameters based on a loss function as taught by O`Shea, to train and deploy machine-learning networks efficiently.

Regarding claim 16, SON as modified by Lu, Mazo and O`Shea discloses the one or more non-transitory computer readable media of claim 11, wherein generating the first plurality of re-constructed views comprises: 
performing a first set of convolution operations on the first shape embedding to generate a decoded tiled activation (SON’s paragraph [0068]: A process of activating nodes from an input layer to a hidden layer is referred to as "encoding" or "embedding”. A process of activating nodes from a hidden layer to an output layer is referred to as "decoding" or "reconstruction.";  paragraph [0030]: the decoder may include a CNN configured to generate a single-channel 2D image from a multi-channel feature; Mazo’s paragraph [0019]: outputs of a forward direction and a backward direction of the bidirectional GRU (view activations) are concatenated and reshaped into dimensions corresponding to dimensions of feature map outputs of the plurality of sequential contracting components (tiled activation); paragraph [0110]: Single expanding component 512 outputs a segmentation mask 514 for the target 2D slice i 502. Segmentation 
partitioning the decoded tiled activation into a plurality of decoded view activations (Mazo’s paragraph [0019]: outputs of a forward direction and a backward direction of the bidirectional GRU (view activations) are concatenated and reshaped into dimensions corresponding to dimensions of feature map outputs of the plurality of sequential contracting components (tiled activation); paragraph [0110]: Single expanding component 512 outputs a segmentation mask 514 for the target 2D slice i 502. Segmentation mask 514 may represent a classification of each pixel of target slice 502, as belonging to the segmented region (tile), or not belonging to the segmented region; see SON’s fig. 4 generating image from plurality of viewpoints); and 
for each decoded view activation included in the plurality of decoded view activations, performing a second set of convolution operations on the decoded view activation to generate a different re-constructed view included in the first plurality of re-constructed views (SON’s paragraph [0068]: A process of activating nodes from a hidden layer to an output layer is referred to as "decoding" or "reconstruction."; paragraph [0060]: a change in a viewpoint of an image based on a characteristic of an object or a target included in the image, and a rotation, a movement and a transformation of an object included in an image. For example, in response to an input of a user, an interaction is performed to rotate or distort an object in an image; see fig. 4 generating image from plurality of viewpoints; paragraph [0079]: In machine learning, a CNN, which is a kind of neural networks, includes convolutional layers designed to perform a convolution operation. A convolutional layer of the CNN performs a convolution operation associated with an input using at least one kernel).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined teaching of SON, Lu and Mazo to update parameters based on a loss function as taught by O`Shea, to train and deploy machine-learning networks efficiently.

Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Yi Yang whose telephone number is (571)272-9589.  The examiner can normally be reached on Monday-Friday 9:00 AM-6:00 PM EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Devona Faulk, can be reached on 571-272-7515. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 

/YI YANG/
Examiner, Art Unit 2616