DETAILED ACTION


Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.


Claim Rejections - 35 USC § 103
	In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

	Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Risser (US 20180068463 A1), referred to herein as Risser, in view of Arroyo et al. (US 20210358095 A1), referred to herein as Arroyo, and further in view of Muller et al. (US 20200035016 A1), referred to herein as Muller.
Regarding Claim 1, Risser teaches a method comprising (Risser Abst: Systems and methods for providing convolutional neural network based image synthesis):
generating a plurality of novel images using an image generator implemented on a processor, the image generator receiving as input a plurality of neural features selected from a neural texture atlas, the image generator also receiving as input one or more position guides identifying position information for a plurality of input image pixels (Risser [0070] In particular, CNN-based image synthesis systems may perform texture synthesis in the following manner: A CNN image synthesis system receives an input source texture, S, and synthesizes an output texture, O. S and O are passed through a CNN such as VGG that generates feature maps for the activations of the first L convolutional layers of the CNN; [0097] In FIG. 4, images 401 and 402 are the style images and images 410-411 and 420-421 are the content images. Images 410 and 411 were generated without the use of pyramids and images 420 and 421 were generated with the use of pyramids. Images 410 and 420 show that pyramids blend coarse scale style features with content features better; [0098] A process for providing CNN-based image synthesis that performs style transfer using localized loss functions in accordance with an embodiment of the invention is shown in FIG. 5. In process 500, a source content image and a source style image are received (505, 510). The source content image includes the structures that are to be included in a synthesized image and the source style image includes a texture that is to be applied to the synthesized image);
Risser does not teach, but Arroyo teaches
evaluating the plurality of novel images using an image discriminator implemented on the processor to determine a plurality of optimization values, the image discriminator comparing each of the plurality of novel images with a respective one of a corresponding plurality of input images (Arroyo [0089] The reconstructed input image 512 and the input image 406 can be received by the machine-learned discriminator model(s) 510 as inputs. The machine-learned discriminator model(s) 510 can evaluate a difference between the reconstructed input image 512 and the input image 406 to output a reconstructive discriminator output 514. The reconstructive discriminator output can be used as a training signal for an optimization function. The optimization function can evaluate the reconstructive discriminator output 414 and, based at least in part on the reconstructive discriminator output 414, modify values for one or more parameters of the machine-learned discriminator model(s) 510 and/or the machine-learned generator model(s) 508 based on the optimization function), 
Risser does not teach, but Muller teaches
each of the plurality of novel images being generated from a respective camera pose relative to an object identical to that of the respective one of the plurality of input images (Muller [0073] a set of ANNs trained on one camera view can be easily reused from a different view or within a slightly modified scene; [0080] auxiliary data is fed to the ANN (such as a position of an object in a virtual 3D scene, a view direction in the virtual 3D scene, or surface or other material properties of objects in the virtual 3D scene));
Risser further teaches
updating the image generator and the neural features based on the optimization values (Risser [0098] Process 500 performs an optimization process using the localized content loss functions and/or localized style loss functions to cause the pixels in the synthesized image to form an image with a desired amount of content from the content source image and a desired amount of texture from the source style image (525)); and
Risser in view of Arroyo and in view of Muller further teaches
storing the updated neural features on a storage device, the updated neural features supporting the generation of a designated novel image from a designated camera pose different from any of the plurality of input images (Risser [0103] The process 700 then generates parametric models for each of the identified regions of the masks from the pixels associated with the regions (725) and may add the generated parametric model for each region to an array of matrices stored in memory; Muller [0073] a set of ANNs trained on one camera view can be easily reused from a different view or within a slightly modified scene).
Arroyo discloses a computer-implemented method to perform image-to-image translation, which is analogous to the present patent application. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Risser to incorporate the teachings of Arroyo by applying the one or more machine-learned generator models and the discriminator, as taught by Arroyo, to the systems and methods for providing convolutional neural network based image synthesis using stable and controllable parametric models.
Doing so would provide output images having the one or more desired values for the one or more characteristics in the systems and methods for deferred neural rendering for view extrapolation.
Muller discloses a method for piecewise-polynomial coupling layers for warp-predicting neural networks, which is analogous to the present patent application. 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Risser to incorporate the teachings of Muller by applying the simulation of light transport in a three-dimensional (3D) scene based on multi-dimensional input vector 130/230 and auxiliary data, as taught by Muller, to the systems and methods for providing convolutional neural network based image synthesis using stable and controllable parametric models.
Doing so would help the ANN learn a better probability distribution more rapidly in the systems and methods for deferred neural rendering for view extrapolation.

Regarding Claim 2, Risser in view of Arroyo further in view of Muller teaches the method recited in claim 1, and further teaches wherein the image generator receives as input a plurality of background images, each of the plurality of background images associated with a respective one of the plurality of input images (Muller [0073] A third advantage is that the present solution offers trivial persistence across renders. For example, a set of ANNs trained on one camera view can be easily reused from a different view or within a slightly modified scene. Unlike conventional approaches, where the learned data structure requires explicit support of adaptation to new scenes, ANNs can be adapted by the same optimization procedure used in the initial training). The same motivation as Claim 1 applies here.

Regarding Claim 3, Risser in view of Arroyo further in view of Muller teaches the method recited in claim 2, and further teaches the method further comprising: generating the plurality of background images by removing the object from the plurality of input images (Arroyo [0030] an input image may depict a vehicle (e.g., an automobile, helicopter, etc.) in the background. The feature characteristics of the output image can include changing the color of the vehicle (e.g., red to blue), the type of vehicle (e.g., changing the vehicle from a first type of automobile to a second type of automobile), or removing the vehicle). The same motivation as Claim 1 applies here.

Regarding Claim 4, Risser in view of Arroyo further in view of Muller teaches the method recited in claim 1, and further teaches wherein the image generator receives as input a three-dimensional mesh representing the object in a three-dimensional coordinate space (Risser [0194] In computer graphics, a 3D model is typically “texture mapped”. For purposes of this discussion, “texture mapped,” means an image is wrapped over the surface of the 3D shape as shown in FIG. 24. 3D models typically contain UV coordinates at each vertex which define the 2D parameterization of the 3D surface. In FIG. 24, the left image displays the underlying geometry of the mesh 2401, the middle image shows the geometry with a texture mapped over the mesh 2402 and the image on the right shows what that texture 2403 looks like as a 2D mapping of a 3D surface).

Regarding Claim 5, Risser in view of Arroyo further in view of Muller teaches the method recited in claim 4, and further teaches the method further comprising: generating the three-dimensional mesh based on the plurality of input images (Risser [0005] receive a source content image that includes desired content for a synthesized image, receive a source style image that includes a desired texture for the synthesized image; [0061] Image hybridization involves starting from a set of several source images within a category and mixing them together in a way that produces a new member of that category).

Regarding Claim 6, Risser in view of Arroyo further in view of Muller teaches the method recited in claim 4, and further teaches wherein the one or more position guides includes a respective plurality of normal vectors for a designated one of the input images (Risser [0197] (1) Flow field over a 3D model is generated using its curvature properties along with user guidance. That flow field can then be projected as a 2D vector field in the parameterized texture space. This flow field typically contains both directional components as well as scale components along each axis. Rather than convolving the neural network along the image x and y axis unit vectors globally, each pixel now has its own local coordinate frame and scale), 
each of the normal vectors identifying a three-dimensional direction orthogonal to a surface of the three-dimensional mesh at a respective position in the three-dimensional coordinate space (Muller Drawing samples directly from the joint distribution is challenging due to the constrained nature of vertices; e.g., they have to reside on surfaces; [0068] It is noted that the normal of the intersected shape at x may also be included in auxiliary data 126 to aid ANN 250A/450A in learning distributions which correlate strongly with the local shading frame). The same motivation as Claim 1 applies here.
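For illustration only, the geometric notion recited in Claim 6 (a vector orthogonal to the mesh surface at a given position) can be sketched as follows. This is a minimal sketch under stated assumptions: the mesh is assumed to be made of triangles, and the names `face_normal` and `dot` are hypothetical and do not appear in Risser, Arroyo, Muller, or the claims.

```python
from math import sqrt

def dot(p, q):
    """Dot product of two 3D vectors."""
    return sum(p[i] * q[i] for i in range(3))

def face_normal(a, b, c):
    """Unit normal of the triangle (a, b, c), assumed to be one mesh face.

    The normal is the cross product of two edge vectors, so it is
    orthogonal to the face by construction.
    """
    u = [b[i] - a[i] for i in range(3)]  # edge a -> b
    v = [c[i] - a[i] for i in range(3)]  # edge a -> c
    n = [
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    ]
    length = sqrt(dot(n, n))
    if length == 0.0:
        raise ValueError("degenerate triangle has no defined normal")
    return [x / length for x in n]
```

Under these assumptions, a per-pixel normal guide would hold one such vector for the mesh face visible at each pixel of the designated input image.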

Regarding Claim 7, Risser in view of Arroyo further in view of Muller teaches the method recited in claim 4, and further teaches wherein the one or more position guides includes a respective plurality of position values for a designated one of the input images, each of the plurality of position values identifying a position in the three-dimensional coordinate space associated with a respective pixel position in the designated input image (Risser [0103]  The process 700 applies the masks to each image and determines a region of the mask associated with each pixel in each of the images (715). The process 700 assigns each pixel to the region determined to be associated with the pixel (720). The process 700 then generates parametric models for each of the identified regions of the masks from the pixels associated with the regions (725) and may add the generated parametric model for each region to an array of matrices stored in memory). 

Regarding Claim 8, Risser in view of Arroyo further in view of Muller teaches the method recited in claim 4, and further teaches wherein the one or more position guides includes an optical center associated with a designated one of the input images, the optical center indicating an approximate location in the three-dimensional coordinate space at which light rays that form the designated input image intersect (Muller [0021] When used in simulation of light transport, multi-dimensional output vector 138/238 may undergo a Monte Carlo integration. Moreover, in those implementations, coupling layers 240A, 240B, . . . 240L including respective ANNs 250A, 250B, . . . 250L generate the random numbers used by the light transport simulation, based on multi-dimensional input vector 130/230 and auxiliary data 126. It is noted, however, that a simulation of light transport performed using multi-dimensional output vector 138/238 may access additional data not used by or accessible to software code 110/210; [0067] While the density is defined over a 2D space, it is conditioned on position x and direction ω.sub.0). The same motivation as Claim 1 applies here.

Regarding Claim 9, Risser in view of Arroyo further in view of Muller teaches the method recited in claim 4, and further teaches wherein generating the plurality of novel images comprises determining a plurality of uv-maps, each of the uv-maps providing a correspondence between pixel locations in a respective one of the novel images and locations in the neural texture atlas (Risser [0194] In computer graphics, a 3D model is typically “texture mapped”. For purposes of this discussion, “texture mapped,” means an image is wrapped over the surface of the 3D shape as shown in FIG. 24. 3D models typically contain UV coordinates at each vertex which define the 2D parameterization of the 3D surface. In FIG. 24, the left image displays the underlying geometry of the mesh 2401, the middle image shows the geometry with a texture mapped over the mesh 2402 and the image on the right shows what that texture 2403 looks like as a 2D mapping of a 3D surface. We refer to synthesizing texture maps as “on-model synthesis.”).
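For illustration only, the pixel-to-atlas correspondence recited in Claim 9 can be sketched as a lookup. This is a minimal sketch under stated assumptions: the atlas is assumed to be a rows-by-columns grid of feature vectors, UV coordinates are assumed to lie in [0, 1], and nearest-neighbor sampling is assumed. The name `sample_atlas` is hypothetical and does not appear in the cited references or the claims.

```python
def sample_atlas(atlas, uv_map):
    """Look up one atlas entry per output pixel.

    atlas:  rows x cols grid; each cell is a feature vector (list of floats).
    uv_map: height x width grid; each cell is a (u, v) pair in [0, 1],
            with u indexing atlas columns and v indexing atlas rows.
    Returns a height x width grid of feature vectors.
    """
    rows = len(atlas)
    cols = len(atlas[0])
    out = []
    for uv_row in uv_map:
        out_row = []
        for (u, v) in uv_row:
            # Nearest-neighbor lookup; clamp so u == 1.0 or v == 1.0 stays in range.
            x = min(int(u * cols), cols - 1)
            y = min(int(v * rows), rows - 1)
            out_row.append(atlas[y][x])
        out.append(out_row)
    return out
```

Under these assumptions, the returned grid is the per-pixel feature input that an image generator would consume for one novel image.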

Regarding Claim 10, Risser in view of Arroyo further in view of Muller teaches the method recited in claim 1, and further teaches the method further comprising: updating the image discriminator based on the optimization values (Arroyo [0089] the output can be backpropagated through the machine-learned model(s) (e.g., 508 and 510) to determine values associated with one or more parameters of the model(s) to be updated. The one or more parameters can be updated to reduce the difference evaluated by the optimization function (e.g., using an optimization procedure, such as a gradient descent algorithm)). The same motivation as Claim 1 applies here.

Regarding Claim 11, Risser in view of Arroyo further in view of Muller teaches the method recited in claim 1, and further teaches wherein each neural feature includes a plurality of values (Risser [0184] using the dilated auto-encoder network described above, the encoder portion is run on all input images, their features are hybridized in the middle of the network using another process, and then this hybridized set of activation values are inverted by the decoder. Note that in both optimization and feedforward synthesis, the results of hybridizing deep features in the network can be passed up to shallow layers and then become further hybridized through another hybridization step).

Regarding Claim 12, Risser in view of Arroyo further in view of Muller teaches the method recited in claim 1, and further teaches wherein together the image generator and the image discriminator form a generative adversarial network (Arroyo [0020] the disclosure proposes an adversarial network (e.g., a generative adversarial network, etc.) utilizing one or more machine-learned generator models and one or more machine-learned discriminator models). The same motivation as Claim 1 applies here.

Regarding Claim 13, Risser in view of Arroyo further in view of Muller teaches a system (Risser Abst: Systems and methods for providing convolutional neural network based image synthesis).
The metes and bounds of the rest of the limitations substantially correspond to the claim as set forth in claim 1; thus they are rejected on similar grounds and rationale as their corresponding limitations.

Regarding Claims 14-19, Risser in view of Arroyo further in view of Muller teaches the system recited in claim 13. The metes and bounds of the claims substantially correspond to the claims as set forth in claims 2 and 6-9; thus, they are rejected on similar grounds and rationale as their corresponding limitations.

Regarding Claim 20, Risser in view of Arroyo further in view of Muller teaches one or more non-transitory computer readable media having instructions stored thereon for performing a method (Risser Abst: Systems and methods for providing convolutional neural network based image synthesis; [0068] the processing system software and/or firmware can be stored in any of a variety of non-transient computer readable media appropriate to a specific application).
The metes and bounds of the rest of the limitations substantially correspond to the claim as set forth in claim 1; thus they are rejected on similar grounds and rationale as their corresponding limitations.


Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Samantha (YUEHAN) WANG whose telephone number is (571)270-5011.  The examiner can normally be reached on Monday-Friday, 8am-5pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/Samantha (YUEHAN) WANG/
Primary Examiner
Art Unit 2611