DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on 5/13/2022 has been entered. Claims 1-8 and 16-20 remain pending in the application. Applicant’s amendments to claims 9-15 have overcome the previous 103 rejection.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102 of this title, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains.  Patentability shall not be negated by the manner in which the invention was made.

Claims 1-6, 8 and 16-20 are rejected under 35 U.S.C. 103 as being unpatentable over Matsuzaka U.S. Patent Application 20100202699 in view of Lin U.S. Patent Application 20160350930, and further in view of Bogan U.S. Patent 10658005.
Regarding claim 1, Matsuzaka discloses a non-transitory computer readable medium comprising instructions that, when executed by at least one processor, cause a computing device to:
extract from one or more digital images, utilizing a neural network, a structure code comprising features corresponding to a geometric structure of the one or more digital images (paragraph [0071]: In Step S120 (FIG. 2), the characteristic points CP are set for each sample face image SI; paragraph [0073]: In Step S130 (FIG. 2), a shape model of the AAM is set. In particular, the face shape S that is specified by the positions of the characteristic points CP is modeled; paragraph [0090]: a technique using pattern matching, a technique using extraction of a skin-color area, a technique using learning data that is set by learning (for example, learning using a neural network, learning using boosting, learning using a support vector machine, or the like) using sample face images);
extract from the one or more digital images, utilizing the neural network, a texture code comprising features corresponding to a textural appearance of the one or more digital images (paragraph [0078]: In Step S140 (FIG. 2), a texture model of the AAM is set; paragraph [0094]: In Step S222 (FIG. 9), the model selection section 220 (FIG. 1) acquires the face image size of the target image OI and selects one shape model and one texture model... acquires the size of the set assumed reference area ABA as the face image size and selects a shape model and a texture model corresponding to a size closest to the size of the assumed reference area ABA); 
receive a scene layout map defining regions for arranging different types of digital content by indicating boundaries between the regions (paragraph [0071]: In Step S120 (FIG. 2), the characteristic points CP are set for each sample face image Si. FIG. 4 illustrates the setting the characteristic points CP for a sample face image Si. The characteristic points CP are points that represent the positions of predetermined characteristic portions of a face image... predetermined positions of facial organs (eyebrows, eyes, a nose, and a mouth) and the face line are set as the characteristic portions); and 
generate a modified digital image by combining, according to the scene layout map, the structure code and the texture code to fit the regions of the scene layout map (paragraph [0024]: the position of the characteristic portion in the target image may be specified with high accuracy by using the shape model and the texture model; paragraph [0127]: In Step S660, the image transforming portion 241 restores the restored average shape image I(W(x;p)) to the shape of the target image OI; paragraph [0128]: calculating of the average shape image I(W(x;p)) (Step S620 shown in FIG. 17), the projecting into the texture eigenspace (Step S630), the expanding into the average shape s0 (Step S650), and the restoring to the shape of the target image OI (Step S660) are performed; paragraph [0067]: The display processing unit 310 is a display driver that displays a process menu, a message, an image, or the like on the display unit 150 by controlling the display unit 150).
Matsuzaka discloses all the features with respect to claim 1 as outlined above. However, Matsuzaka fails to disclose digital content corresponding to different labels, an encoder neural network and a generator neural network, and generating digital content corresponding to the different labels of the scene layout map and arranged to fit the boundaries between the semantic regions.
Lin discloses digital content corresponding to different labels, semantic regions for arranging different types of digital content by indicating boundaries between the semantic regions (paragraph [0058]: an image 702 is segmented into 350 super pixels. In the meanwhile, a closed-form edge technique is used to generate a semantic edge map 704 having strong boundaries that are enclosed; paragraph [0087]: A global semantic and depth layout of a scene of an image is estimated through machine learning (block 1602), e.g., through a convolutional neural network; paragraph [0002]: Semantic labeling in images is utilized to assign labels to pixels in an image, such as to describe objects represented at least in part by the pixel, such as sky, ground, a building, and so on);
to generate digital content corresponding to the different labels of the scene layout map and arranged to fit the boundaries between the semantic regions (paragraph [0069]: FIG. 4, the merge calculation module 130 receives local semantic and depth layouts from the local determination module 128 and merges them using the global semantic and depth layouts 302, 304; see fig. 4).
Therefore, it would have been obvious before the effective filing date of the claimed invention to modify Matsuzaka to use a semantic map as taught by Lin, in order to obtain consistent and accurate semantic segmentation and depth estimation.
Matsuzaka as modified by Lin discloses all the features with respect to claim 1 as outlined above. However, Matsuzaka as modified by Lin fails to disclose an encoder neural network and a generator neural network.
Bogan discloses that "[t]he segmentation network 1006 may mask, highlight, label, or otherwise identify the body part" (col. 17, lines 46-47), and further discloses an encoder neural network and a generator neural network (col. 12, lines 49-53: training the autoencoder enables the encoder to represent the input (e.g., the image of a face or other base vector) in a more compact form (a lower dimensional representation of the face), which the decoder then attempts to reconstruct; col. 17, lines 7-8: a given neural network may be configured with an encoder and decoder).
Therefore, it would have been obvious before the effective filing date of the claimed invention to modify the combination of Matsuzaka and Lin to use an autoencoder as taught by Bogan, in order to perform advanced forms of digital image processing efficiently.

Regarding claim 2, Matsuzaka as modified by Lin and Bogan discloses the non-transitory computer readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computing device to receive the scene layout map by receiving the different labels for the different types of digital content depicted within the semantic regions and placing the boundaries between the semantic regions of different labels (Lin’s paragraph [0087]: A global semantic and depth layout of a scene of an image is estimated through machine learning (block 1602), e.g., through a convolutional neural network; paragraph [0002]: Semantic labeling in images is utilized to assign labels to pixels in an image, such as to describe objects represented at least in part by the pixel, such as sky, ground, a building, and so on; Matsuzaka’s paragraph [0071]: In Step S120 (FIG. 2), the characteristic points CP are set for each sample face image Si. FIG. 4 illustrates the setting the characteristic points CP for a sample face image Si. The characteristic points CP are points that represent the positions of predetermined characteristic portions of a face image... predetermined positions of facial organs (eyebrows, eyes, a nose, and a mouth) and the face line are set as the characteristic portions).
Therefore, it would have been obvious before the effective filing date of the claimed invention to modify Matsuzaka to use a depth map as taught by Lin, in order to obtain consistent and accurate semantic segmentation and depth estimation; and to modify the combination of Matsuzaka and Lin to use an autoencoder as taught by Bogan, in order to perform advanced forms of digital image processing efficiently.

Regarding claim 3, Matsuzaka as modified by Lin and Bogan discloses the non-transitory computer readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computing device to:
extract the structure code from a certain digital image; and extract the texture code from the certain digital image (Matsuzaka’s paragraph [0071]: In Step S120 (FIG. 2), the characteristic points CP are set for each sample face image SI; paragraph [0073]: In Step S130 (FIG. 2), a shape model of the AAM is set. In particular, the face shape S that is specified by the positions of the characteristic points CP is modeled; paragraph [0078]: In Step S140 (FIG. 2), a texture model of the AAM is set). 
Therefore, it would have been obvious before the effective filing date of the claimed invention to modify Matsuzaka to use a depth map as taught by Lin, in order to obtain consistent and accurate semantic segmentation and depth estimation; and to modify the combination of Matsuzaka and Lin to use an autoencoder as taught by Bogan, in order to perform advanced forms of digital image processing efficiently.

Regarding claim 4, Matsuzaka as modified by Lin and Bogan discloses the non-transitory computer readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computing device to determine, for one or more additional digital images, one or more average structure codes for portions of the one or more additional digital images corresponding to a semantic region indicated by the scene layout map (Matsuzaka’s paragraph [0094]: In Step S222 (FIG. 9), the model selection section 220 (FIG. 1) acquires the face image size of the target image OI and selects one shape model and one texture model... acquires the size of the set assumed reference area ABA as the face image size and selects a shape model and a texture model corresponding to a size closest to the size of the assumed reference area ABA; paragraph [0025]: the reference shape is an average shape that represents an average position of the characteristic portions of the plurality of sample face images; paragraph [0071]: In Step S120 (FIG. 2), the characteristic points CP are set for each sample face image Si. FIG. 4 illustrates the setting the characteristic points CP for a sample face image Si. The characteristic points CP are points that represent the positions of predetermined characteristic portions of a face image... predetermined positions of facial organs (eyebrows, eyes, a nose, and a mouth) and the face line are set as the characteristic portions; Lin’s paragraph [0058]: an image 702 is segmented into 350 super pixels. In the meanwhile, a closed-form edge technique is used to generate a semantic edge map 704 having strong boundaries that are enclosed). 
Therefore, it would have been obvious before the effective filing date of the claimed invention to modify Matsuzaka to use a depth map as taught by Lin, in order to obtain consistent and accurate semantic segmentation and depth estimation; and to modify the combination of Matsuzaka and Lin to use an autoencoder as taught by Bogan, in order to perform advanced forms of digital image processing efficiently.

Regarding claim 5, Matsuzaka as modified by Lin and Bogan discloses the non-transitory computer readable medium of claim 4, further comprising instructions that, when executed by the at least one processor, cause the computing device to replace a portion of the one or more digital images corresponding to the semantic region indicated by the scene layout map by replacing a portion of the structure code of the one or more digital images corresponding to the semantic region indicated by the scene layout map with an average structure code from among the one or more average structure codes (Matsuzaka’s paragraph [0094]: In Step S222 (FIG. 9), the model selection section 220 (FIG. 1) acquires the face image size of the target image OI and selects one shape model and one texture model... acquires the size of the set assumed reference area ABA as the face image size and selects a shape model and a texture model corresponding to a size closest to the size of the assumed reference area ABA; paragraph [0025]: the reference shape is an average shape that represents an average position of the characteristic portions of the plurality of sample face images; Lin’s paragraph [0058]: an image 702 is segmented into 350 super pixels. In the meanwhile, a closed-form edge technique is used to generate a semantic edge map 704 having strong boundaries that are enclosed).
Therefore, it would have been obvious before the effective filing date of the claimed invention to modify Matsuzaka to use a depth map as taught by Lin, in order to obtain consistent and accurate semantic segmentation and depth estimation; and to modify the combination of Matsuzaka and Lin to use an autoencoder as taught by Bogan, in order to perform advanced forms of digital image processing efficiently.

Regarding claim 6, Matsuzaka as modified by Lin and Bogan discloses the non-transitory computer readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computing device to:
generate a plurality of clusters of structure codes for portions of a sample set of digital images corresponding to a semantic region indicated by the scene layout map (Lin’s paragraph [0062]: The cluster number is assigned within each semantic class based on the geometric complexity of the semantic class. In addition, the semantic classes are identified that share similar geometric properties, such as ground and grass, and the segments within all the shared classes are clustered together. The clustered depth templates are then assigned to the shared semantic classes; paragraph [0058]: an image 702 is segmented into 350 super pixels. In the meanwhile, a closed-form edge technique is used to generate a semantic edge map 704 having strong boundaries that are enclosed);
determine a representative structure code for a given cluster of the plurality of clusters; and replace a portion of the one or more digital images corresponding to the semantic region indicated by the scene layout map by replacing a portion of the structure code corresponding to the semantic region indicated by the scene layout map with the representative structure code (Matsuzaka’s paragraph [0094]: In Step S222 (FIG. 9), the model selection section 220 (FIG. 1) acquires the face image size of the target image OI and selects one shape model and one texture model... acquires the size of the set assumed reference area ABA as the face image size and selects a shape model and a texture model corresponding to a size closest to the size of the assumed reference area ABA; paragraph [0025]: the reference shape is an average shape that represents an average position of the characteristic portions of the plurality of sample face images; Lin’s paragraph [0058]: an image 702 is segmented into 350 super pixels. In the meanwhile, a closed-form edge technique is used to generate a semantic edge map 704 having strong boundaries that are enclosed).
Therefore, it would have been obvious before the effective filing date of the claimed invention to modify Matsuzaka to use a depth map as taught by Lin, in order to obtain consistent and accurate semantic segmentation and depth estimation; and to modify the combination of Matsuzaka and Lin to use an autoencoder as taught by Bogan, in order to perform advanced forms of digital image processing efficiently.

Regarding claim 8, Matsuzaka as modified by Lin and Bogan discloses the non-transitory computer readable medium of claim 1, further comprising instructions that, when executed by the at least one processor, cause the computing device to generate the modified digital image by combining the structure code and texture code to force digital content depicted within the one or more digital images to fit boundaries of the semantic regions indicated by the scene layout map (Matsuzaka’s paragraph [0024]: the position of the characteristic portion in the target image may be specified with high accuracy by using the shape model and the texture model; paragraph [0127]: In Step S660, the image transforming portion 241 restores the restored average shape image I(W(x;p)) to the shape of the target image OI; paragraph [0128]: calculating of the average shape image I(W(x;p)) (Step S620 shown in FIG. 17), the projecting into the texture eigenspace (Step S630), the expanding into the average shape s0 (Step S650), and the restoring to the shape of the target image OI (Step S660) are performed; Lin’s paragraph [0058]: an image 702 is segmented into 350 super pixels. In the meanwhile, a closed-form edge technique is used to generate a semantic edge map 704 having strong boundaries that are enclosed). 
Therefore, it would have been obvious before the effective filing date of the claimed invention to modify Matsuzaka to use a depth map as taught by Lin, in order to obtain consistent and accurate semantic segmentation and depth estimation; and to modify the combination of Matsuzaka and Lin to use an autoencoder as taught by Bogan, in order to perform advanced forms of digital image processing efficiently.

Regarding claim 16, Matsuzaka discloses a computer-implemented method for deep image manipulation, the computer-implemented method comprising: 
extracting from a first digital image, utilizing an encoder neural network, a first structure code comprising features corresponding to a geometric structure of the first digital image and a first texture code comprising features corresponding to a textural appearance of the first digital image (paragraph [0071]: In Step S120 (FIG. 2), the characteristic points CP are set for each sample face image SI; paragraph [0073]: In Step S130 (FIG. 2), a shape model of the AAM is set. In particular, the face shape S (structure code) that is specified by the positions of the characteristic points CP is modeled; paragraph [0078]: In Step S140 (FIG. 2), a texture model (texture code) of the AAM is set; paragraph [0090]: a technique using pattern matching, a technique using extraction of a skin-color area, a technique using learning data that is set by learning (for example, learning using a neural network, learning using boosting, learning using a support vector machine, or the like) using sample face images); 
extracting from a second digital image, utilizing the encoder neural network, a second structure code comprising features corresponding to a geometric structure of the second digital image and a second texture code comprising features corresponding to a textural appearance of the second digital image (paragraph [0090]: In Step S220 (FIG. 9), the face area detecting section 230 (FIG. 1) detects a predetermined area corresponding to a face image in the target image OI as a face area FA; paragraph [0094]: In Step S222 (FIG. 9), the model selection section 220 (FIG. 1) acquires the face image size of the target image OI and selects one shape model and one texture model... acquires the size of the set assumed reference area ABA as the face image size and selects a shape model and a texture model corresponding to a size closest to the size of the assumed reference area ABA);
receiving a scene layout map defining regions for arranging different types of digital content by indicating boundaries between the regions within the second digital image (paragraph [0108]: In Step S410 of the update process (FIG. 15) for the disposition of the characteristic points CP, the image transforming portion 212 (FIG. 1) calculates an average shape image I(W(x;p)) from the target image OI; paragraph [0071]: In Step S120 (FIG. 2), the characteristic points CP are set for each sample face image Si. FIG. 4 illustrates the setting the characteristic points CP for a sample face image Si. The characteristic points CP are points that represent the positions of predetermined characteristic portions of a face image... predetermined positions of facial organs (eyebrows, eyes, a nose, and a mouth) and the face line are set as the characteristic portions); and
generating a modified digital image comprising digital content of the first digital image and digital content of the second digital image arranged according to the scene layout map by combining the first texture code and the second structure code to force the digital content depicted within the second digital image to fit within the boundaries indicated by the scene layout map (paragraph [0024]: the position of the characteristic portion in the target image may be specified with high accuracy by using the shape model and the texture model; paragraph [0127]: In Step S660, the image transforming portion 241 restores the restored average shape image I(W(x;p)) to the shape of the target image OI; paragraph [0128]: calculating of the average shape image I(W(x;p)) (Step S620 shown in FIG. 17), the projecting into the texture eigenspace (Step S630), the expanding into the average shape s0 (Step S650), and the restoring to the shape of the target image OI (Step S660) are performed).
Matsuzaka discloses all the features with respect to claim 16 as outlined above. However, Matsuzaka fails to disclose a swapping autoencoder comprising an encoder neural network and a generator neural network, and semantic regions for arranging different types of digital content corresponding to different labels.
Lin discloses semantic regions for arranging different types of digital content corresponding to different labels by indicating boundaries between the semantic regions (paragraph [0058]: an image 702 is segmented into 350 super pixels. In the meanwhile, a closed-form edge technique is used to generate a semantic edge map 704 having strong boundaries that are enclosed; paragraph [0087]: A global semantic and depth layout of a scene of an image is estimated through machine learning (block 1602), e.g., through a convolutional neural network; paragraph [0002]: Semantic labeling in images is utilized to assign labels to pixels in an image, such as to describe objects represented at least in part by the pixel, such as sky, ground, a building, and so on).
Therefore, it would have been obvious before the effective filing date of the claimed invention to modify Matsuzaka to use a semantic map as taught by Lin, in order to obtain consistent and accurate semantic segmentation and depth estimation.
Matsuzaka as modified by Lin discloses all the features with respect to claim 16 as outlined above. However, Matsuzaka as modified by Lin fails to disclose a swapping autoencoder comprising an encoder neural network and a generator neural network.
Bogan discloses a swapping autoencoder comprising an encoder neural network and a generator neural network (col. 12, lines 49-53: training the autoencoder enables the encoder to represent the input (e.g., the image of a face or other base vector) in a more compact form (a lower dimensional representation of the face), which the decoder then attempts to reconstruct; col. 17, lines 7-8: a given neural network may be configured with an encoder and decoder).
Therefore, it would have been obvious before the effective filing date of the claimed invention to modify the combination of Matsuzaka and Lin to use an autoencoder as taught by Bogan, in order to perform advanced forms of digital image processing efficiently.

Regarding claim 17, Matsuzaka as modified by Lin and Bogan discloses the computer-implemented method of claim 16, further comprising providing the modified digital image for display on a client device (paragraph [0091]: The computing device 1702 may be, for example, a server of a service provider, a device associated with a client (e.g., a client device); paragraph [0033]: display device 108).
Therefore, it would have been obvious before the effective filing date of the claimed invention to modify Matsuzaka to use a depth map as taught by Lin, in order to obtain consistent and accurate semantic segmentation and depth estimation; and to modify the combination of Matsuzaka and Lin to use an autoencoder as taught by Bogan, in order to perform advanced forms of digital image processing efficiently.

Regarding claim 18, Matsuzaka as modified by Lin and Bogan discloses the computer-implemented method of claim 16, wherein generating the modified digital image comprises modifying the second structure code using the scene layout map to replace one or more features of the second structure code with features corresponding to the scene layout map (Matsuzaka’s paragraph [0090]: In Step S220 (FIG. 9), the face area detecting section 230 (FIG. 1) detects a predetermined area corresponding to a face image in the target image OI as a face area FA; paragraph [0094]: In Step S222 (FIG. 9), the model selection section 220 (FIG. 1) acquires the face image size of the target image OI and selects one shape model and one texture model... acquires the size of the set assumed reference area ABA as the face image size and selects a shape model and a texture model corresponding to a size closest to the size of the assumed reference area ABA). 
Therefore, it would have been obvious before the effective filing date of the claimed invention to modify Matsuzaka to use a depth map as taught by Lin, in order to obtain consistent and accurate semantic segmentation and depth estimation; and to modify the combination of Matsuzaka and Lin to use an autoencoder as taught by Bogan, in order to perform advanced forms of digital image processing efficiently.

Regarding claim 19, Matsuzaka as modified by Lin and Bogan discloses the computer-implemented method of claim 16, wherein receiving the scene layout map comprises:
receiving user interaction to select a reference digital image (Matsuzaka’s paragraph [0094]: In Step S222 (FIG. 9), the model selection section 220 (FIG. 1) acquires the face image size of the target image OI and selects one shape model and one texture model... acquires the size of the set assumed reference area ABA as the face image size and selects a shape model and a texture model corresponding to a size closest to the size of the assumed reference area ABA); and
extracting the scene layout map from the reference digital image utilizing a semantic segmentation neural network (Lin’s paragraph [0043]: the depth and semantic segmentation module 114 may employ a variety of different machine learning 212 techniques, such as a convolutional neural network (CNN); paragraph [0087]: A global semantic and depth layout of a scene of an image is estimated through machine learning (block 1602), e.g., through a convolutional neural network).
Therefore, it would have been obvious before the effective filing date of the claimed invention to modify Matsuzaka to use a depth map as taught by Lin, in order to obtain consistent and accurate semantic segmentation and depth estimation; and to modify the combination of Matsuzaka and Lin to use an autoencoder as taught by Bogan, in order to perform advanced forms of digital image processing efficiently.

Regarding claim 20, Matsuzaka as modified by Lin and Bogan discloses the computer-implemented method of claim 16, wherein the modified digital image depicts digital content corresponding to different labels fitted to different locations indicated by the scene layout map (Lin’s paragraph [0043]: the depth and semantic segmentation module 114 may employ a variety of different machine learning 212 techniques, such as a convolutional neural network (CNN); paragraph [0087]: A global semantic and depth layout of a scene of an image is estimated through machine learning (block 1602), e.g., through a convolutional neural network; paragraph [0002]: Semantic labeling in images is utilized to assign labels to pixels in an image, such as to describe objects represented at least in part by the pixel, such as sky, ground, a building, and so on).
Therefore, it would have been obvious before the effective filing date of the claimed invention to modify Matsuzaka to use a depth map as taught by Lin, in order to obtain consistent and accurate semantic segmentation and depth estimation; and to modify the combination of Matsuzaka and Lin to use an autoencoder as taught by Bogan, in order to perform advanced forms of digital image processing efficiently.

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Matsuzaka U.S. Patent Application 20100202699 in view of Lin U.S. Patent Application 20160350930, in view of Bogan U.S. Patent 10658005, and further in view of Guleryuz U.S. Patent Application 20130265382.
Regarding claim 7, Matsuzaka as modified by Lin and Bogan discloses learning parameters for the encoder neural network and the generator neural network and a semantic region (Bogan’s col. 12, lines 49-53: training the autoencoder enables the encoder to represent the input (e.g., the image of a face or other base vector) in a more compact form (a lower dimensional representation of the face), which the decoder then attempts to reconstruct; col. 17, lines 7-8: a given neural network may be configured with an encoder and decoder; Lin’s paragraph [0058]: an image 702 is segmented into 350 super pixels. In the meanwhile, a closed-form edge technique is used to generate a semantic edge map 704 having strong boundaries that are enclosed). However, Matsuzaka as modified by Lin and Bogan fails to disclose determining a reconstruction loss only for portions of the one or more digital images unrelated to a region indicated by the scene layout map.
Guleryuz discloses determining a reconstruction loss only for portions of the one or more digital images unrelated to a region indicated by the scene layout map (paragraph [0035]: identify the segmentation error spreads. The error spreads may be signaled to the rendering end for use in determining a drawing area, e.g., drawing area 408 of FIG. 4. When such error estimates are impractical or unavailable and/or when some performance loss is acceptable, fixed values appropriate for a given class of images may be used for the segmentation error estimates). 
Therefore, it would have been obvious before the effective filing date of the claimed invention to modify the combination of Matsuzaka, Lin and Bogan to use a depth map as taught by Guleryuz, in order to determine the boundaries of objects in the image.

Allowable Subject Matter

Claims 9-15 are allowed.
The following is an examiner’s statement of reasons for allowance:  
Claim 9 recites extracting from the first digital image, utilizing the encoder neural network, a first structure code comprising features corresponding to a geometric structure of the first digital image and a first texture code comprising features corresponding to a textural appearance of the first digital image; extracting from the second digital image, utilizing the encoder neural network, a second structure code comprising features corresponding to a geometric structure of the second digital image and a second texture code comprising features corresponding to a textural appearance of the second digital image; receiving a scene layout map defining semantic regions for arranging different types of digital content corresponding to different labels by indicating boundaries between the semantic regions within the first digital image; and generating, utilizing the generator neural network, a modified digital image comprising digital content of the first digital image and digital content of the second digital image arranged according to the scene layout map by combining the first structure code and the second texture code to force the digital content depicted within the first digital image to fit the boundaries indicated by the scene layout map.
Matsuzaka, Lin, Bogan and Guleryuz, alone or in combination, do not teach or suggest these features. These limitations, when read in light of the remaining limitations of the claim, render the claim allowable.

Claims 10-15 depend on claim 9 and are allowed for the same reasons as claim 9.

Response to Arguments

Applicant's arguments filed 5/12/2022, pages 12-17, with respect to the rejection of claims 1 and 16 under 35 U.S.C. 103, have been fully considered but they are not persuasive.

Applicant argues on page 12 that independent claim 16 recites allowable subject matter.

In reply, please amend claim 16 similarly to claim 9.

Applicant argues on pages 13-17 that the cited art fails to teach or suggest generating a modified digital image by combining a structure code and a texture code according to a scene layout map.

In reply, the rejection is based on Matsuzaka, Lin and Bogan combined. Matsuzaka discloses generating a modified digital image by combining a structure code and a texture code according to the scene layout map (paragraph [0024]: the position of the characteristic portion in the target image may be specified with high accuracy by using the shape model and the texture model; paragraph [0127]: In Step S660, the image transforming portion 241 restores the restored average shape image I(W(x;p)) to the shape of the target image OI; paragraph [0128]: calculating of the average shape image I(W(x;p)) (Step S620 shown in FIG. 17), the projecting into the texture eigenspace (texture code) (Step S630), the expanding into the average shape s0 (Step S650), and the restoring to the shape (structure code) of the target image OI (Step S660) are performed; paragraph [0067]: The display processing unit 310 is a display driver that displays a process menu, a message, an image, or the like on the display unit 150 by controlling the display unit 150).
Lin discloses generating digital content corresponding to the different labels of the scene layout map and arranged to fit the boundaries between the semantic regions (paragraph [0069]: FIG. 4, the merge calculation module 130 receives local semantic and depth layouts from the local determination module 128 and merges them using the global semantic and depth layouts 302, 304; see fig. 4).

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Yi Yang whose telephone number is (571)272-9589.  The examiner can normally be reached on Monday-Friday 9:00 AM-6:00 PM EST.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kee Tung, can be reached at 571-272-7794. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). 
/YI YANG/
Examiner, Art Unit 2616