DETAILED ACTION
Applicant’s amendment of July 19, 2022 overcomes the following:
Rejection of claims 7-2, 13-18, 25-54, and 61-102 based on 35 U.S.C. 112(b) or pre-AIA 35 U.S.C. 112, second paragraph
Rejection of claims 1-6, 13-18, 25-30, 37-42, and 61-66 based on 35 U.S.C. 101
Applicant has amended claims 1-7, 13-18, 25-31, 37-43, 49, 55, 61-67, 73, 79, 85, 91, and 97. Claims 1-102 are pending.

Response to Arguments
Applicant’s arguments filed July 19, 2022 with respect to the pending claims have been fully considered but are moot in view of the new ground(s) of rejection. The amendments changed the scope and content of the claims; therefore, the grounds of rejection have been modified accordingly. 
Regarding rejection of claims under 35 U.S.C. §112(b), Applicant asserts that the “… Office states that the usage of “substantially” in the independent claims renders the claims indefinite. Office Action at 3-11. Applicant respectfully disagrees, since one of ordinary skill in the art would know what was meant by substantially photorealistic… In particular, one of ordinary skill in the art would understand that “photorealistic” and “substantially photorealistic” images depict or seem to depict objects with the exactness of a photograph, but may contain imperfections that could be perceived as unrealistic…” (Remarks, Pg. 20).
Examiner respectfully disagrees.
One of ordinary skill in the art would not necessarily understand that “… “photorealistic” and “substantially photorealistic” images depict or seem to depict objects with the exactness of a photograph, but may contain imperfections that could be perceived as unrealistic”, as asserted above by Applicant, because, as explained below, the specification does not provide a standard for ascertaining the requisite degree of the term “substantially”. Additionally, although Applicant’s arguments above with respect to “images depict or seem to depict objects with the exactness of a photograph, but may contain imperfections that could be perceived as unrealistic” are understood, they are not commensurate with the scope of the claims because these features are not recited in the claims. Examiner suggests incorporating the above features into all independent claims for full consideration. 

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.

Claims 1-6, 19-24, and 55-60 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claim 1, Ln. 4, recites the limitation “one or more substantially photorealistic images”. The term “substantially” in claim 1 is a relative term which renders the claim indefinite. The term “substantially” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For example, Par. [0023] of the specification indicates “a user may also have the option of only certain regions generated by the software, with some regions being substantially similar to what was provided in the input image”. However, the specification does not provide a standard for ascertaining the requisite degree of the term “substantially”, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention, which renders the claim indefinite. 
Claims 2-6 are rejected by virtue of their dependence upon rejected base claim 1.
Claim 19, Ln. 2-3, recites the limitation “one or more substantially photorealistic images”. The term “substantially” in claim 19 is a relative term which renders the claim indefinite. The term “substantially” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For example, Par. [0023] of the specification indicates “a user may also have the option of only certain regions generated by the software, with some regions being substantially similar to what was provided in the input image”. However, the specification does not provide a standard for ascertaining the requisite degree of the term “substantially”, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention, which renders the claim indefinite. 
Claims 20-24 are rejected by virtue of their dependence upon rejected base claim 19.
Claim 55, Ln. 2, recites the limitation “one or more substantially photorealistic images”. The term “substantially” in claim 55 is a relative term which renders the claim indefinite. The term “substantially” is not defined by the claim, the specification does not provide a standard for ascertaining the requisite degree, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention. For example, Par. [0023] of the specification indicates “a user may also have the option of only certain regions generated by the software, with some regions being substantially similar to what was provided in the input image”. However, the specification does not provide a standard for ascertaining the requisite degree of the term “substantially”, and one of ordinary skill in the art would not be reasonably apprised of the scope of the invention, which renders the claim indefinite. 
Claims 56-60 are rejected by virtue of their dependence upon rejected base claim 55.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claim 97 is rejected under 35 USC 101 because the claimed invention is directed to non-statutory subject matter.
Based upon consideration of all of the relevant factors with respect to the claim as a whole, claim 97 is held to claim a signal per se and is therefore rejected as ineligible subject matter under 35 U.S.C. § 101. The rationale for this finding is explained below: 
The broadest reasonable interpretation of the claim covers forms of non-transitory tangible media and transitory propagating signals per se in view of the ordinary and customary meaning of computer readable media (CRM). The specification is either silent or open-ended and thus does not limit CRM to non-transitory media. A claim drawn to such a computer readable medium that covers both transitory and non-transitory embodiments may be amended to narrow the claim to cover only statutory embodiments, and thereby avoid a rejection under 35 U.S.C. 101, by adding the limitation “non-transitory” to the claim. See Subject Matter Eligibility of Computer Readable Media, 1351 OG 212 (Feb. 23, 2010). 

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-4, 7-10, 13-16, 19-22, 25-28, 31-34, 49-52, 61-64, 67-70, 73-76, 79-82, 85-88, and 97-100 are rejected under 35 U.S.C. 103 as being unpatentable over Fu et al. (US PG Publication No. 2019/0295302 A1), hereafter referred to as Fu (prior art cited by Applicant and originally cited by the examiner during examination of the parent application), in view of Dai et al. (US PG Publication No. 2017/0109625 A1), hereafter referred to as Dai.

Regarding claim 1, Fu discloses a non-transitory computer-readable medium having stored thereon a set of instructions which, when performed by one or more processors, cause the one or more processors to (Par. [0004]: a system for training an image generator… the system comprises a processor and a memory with computer code instructions stored thereon, wherein the processor and the memory, with the computer code instructions, are configured to cause the system to provide a generator, discriminator, and segmentor; Par. [0191]: Embodiments or aspects thereof may be implemented in the form of hardware, firmware, or software… the software may be stored on any non-transient computer readable medium that is configured to enable a processor to load the software or subsets of instructions thereof. The processor then executes the instructions and is configured to operate or cause an apparatus to operate in a manner as described herein) at least: 
receive one or more semantic inputs (Par. [0034]: Segmentation Guided Generative Adversarial Network (SGGAN), which leverages semantic segmentation to improve image generation performance further and provide spatial mapping… a segmentor implemented with a neural network that is designed to impose semantic information on the generated images; Par. [0041-43]: generator 240 takes as inputs, a target segmentation 227, a given image 226, and a vector 228 indicating desired attributes of the image to be generated. The generator 220 implemented with the blocks 221-225 is configured to receive the inputs 226, 227, 228 and generate a target image 229 that is based on, i.e., a translated version of, the input image 226 and consistent with the input segmentation 227 and attributes 228… segmentor network 260 implemented with the blocks 261-265 is configured to receive an input image 266a and/or 266b and generate a corresponding segmentation 267a and/or 267b, respectively, indicating features of the input images 266a and/or 266b. The segmentor 260 imposes semantic information on the image generation process; Par. [0048-51]: image generation using the training methods described herein and resulting trained generator can spatially control the generation process, and provide interpretable results… Segmentation Guided Generative Adversarial Networks (SGGAN) model which leverages semantic segmentation information to provide spatial constraints for the image translation task…  a segmentor network that is particularly designed to impose the target spatial guidance on the generator… a process of training an image generator, i.e., an image generator system, comprising a generator 301, discriminator 302, and segmentor 303… The generator 301 is configured to receive three inputs, an input image (source image) 304, a target segmentation 305, and a vector of target attributes 306. 
A goal of the training process is to configure the generator 301 to translate the input image 304 into a generated image (fake image) 307, which complies with the target segmentation 305 and attribute labels 306… During the training process, these three inputs (target segmentation 305, target attributes 306, and input image 304) are fed into the generator 301 to obtain the generated image 307; Par. [0063-77]: In order to guide the generator by the target segmentation information, an additional network is built which takes an image as input and generates the image's corresponding semantic segmentation. This network is referred to as the segmentor network S which is trained together with the GAN framework. When training with the real data pairs (x,s), S learns to estimate segmentation correctly… such an embodiment leverages semantic segmentation information in GAN based image translation tasks and also builds a segmentor which is trained together with the GAN framework to provide guidance in image translation… images and resulting segmentations that may be used in embodiments and/or generated by the segmentor S. In FIG. 4, the image 440 is depicted with the dotted lines 441 showing facial landmarks extracted from the image 440. The segmentation 442 is a landmark-based semantic segmentation. The image 443 is a real image sample that may be provided to a segmentor S implemented according to an embodiment to generate the segmentation 444… extracted landmarks 441 are processed to generate a pixel-wised semantic segmentation 442 where each pixel in the input image 440 is automatically classified into classes of eyes, eyebrow, nose, lips, skin and background according to landmark information. During training of the segmentor S, S takes a real image sample 443 as input and generates an estimated segmentation 444… the segmentor S is optimized by minimizing the difference between the landmark base segmentation 442 and segmentor generated segmentation 444. 
For instance, based upon differences between the landmark-based segmentation 442 and segmentor S generated segmentation 444, weights of the network implementing the segmentor S may be varied. As shown in FIG. 4, the similarity between the landmark-based segmentation 442 and segmentor generated segmentation 444 reveals that a segmentor network, implemented according to the embodiments described herein, can successfully capture the semantic information from an input image;   receive one or more semantic inputs (e.g. receive one or more semantic inputs, including generated corresponding semantic segmentation information of an input image, by receiving input images and generating a corresponding segmentation indicating features (i.e. attributes, characteristics, etc.) of the input images (i.e. one or more semantic inputs), as indicated above), for example); and 
generate one or more substantially photorealistic images [generate one or more photorealistic images] based, at least in part, on the one or more semantic inputs using one or more neural networks (Par. [0034-39]: Segmentation Guided Generative Adversarial Network (SGGAN), which leverages semantic segmentation to improve image generation performance further and provide spatial mapping… a segmentor implemented with a neural network that is designed to impose semantic information on the generated images… generative adversarial networks (GAN) [Goodfellow et al., Generative adversarial nets, In NIPS, 2014] have emerged as a powerful tool for generative tasks, and significantly thrive in the field of deep generative models. Because GAN has the potential to provide realistic image generation results and alleviate the deficiency of training data… Segmentation Guided Generative Adversarial Network (SGGAN), which fully leverages semantic segmentation information to guide the image generation (e.g., translation) process… embodiments explicitly guide the generator with pixel-level semantic segmentations and, thus, further boost the quality of generated images. Further, the target segmentation employed in embodiments works as a strong prior, i.e., provides knowledge that stems from previous experience, for the image generator, which is able to use this prior knowledge to edit the spatial content and align the face image to the target segmentation; Par. [0040-44]: given the input image 100 and target segmentation 101, the proposed SGGAN translates the input image 100 to various combinations of various attributes shown… the proposed SGGAN framework comprises three networks… respectively, a generator network 220, a discriminator network 240, and a segmentor network 260… The generator 240 takes as inputs, a target segmentation 227, a given image 226, and a vector 228 indicating desired attributes of the image to be generated. 
The generator 220 implemented with the blocks 221-225 is configured to receive the inputs 226, 227, 228 and generate a target image 229 that is based on, i.e., a translated version of, the input image 226 and consistent with the input segmentation 227 and attributes 228… segmentor network 260 implemented with the blocks 261-265 is configured to receive an input image 266a and/or 266b and generate a corresponding segmentation 267a and/or 267b, respectively, indicating features of the input images 266a and/or 266b. The segmentor 260 imposes semantic information on the image generation process… Based on the segmentor network S, the proposed SGGAN, e.g., the network depicted in FIG. 2D, comprises three networks, a segmentor, a generator, and a discriminator. The proposed SGGAN utilizes semantic segmentations as strong regulations and control signals in multi-domain image-to-image translation… a training procedure 270 for training the segmentor 260, generator 220, and discriminator 240. During training, estimated segmentations from the segmentor 260 are compared with their ground-truth values, which provides gradient information to optimize the generator 220. This optimization tends to teach the generator 220 to impose the spatial constraints indicated in an input segmentation 271 on the translated images, e.g., 274. During the training 270, the segmentor 260 provides spatial guidance to the generator 220 to ensure the generated images, e.g., 274, comply with input segmentations, e.g., 271. The discriminator 240 aims to ensure the translated images, e.g., 274, are as realistic as the real images; and generate one or more photorealistic images based, at least in part, on the one or more semantic inputs using one or more neural networks (e.g. receive input images and generate corresponding semantic segmentation indicating features (i.e. attributes, characteristics, etc.) of the input images (i.e. 
the one or more semantic inputs), as indicated above, for example, to impose semantic information on the image generation process, including a generator, a discriminator, and a segmentor implemented with neural networks (i.e. one or more neural networks) designed to impose semantic information on the generated images to provide realistic image generation results (i.e. generate one or more photorealistic images based on the one or more semantic inputs using one or more neural networks) by ensuring that the generated/translated images are as realistic as the real images, as indicated above), for example), but fails to teach the following as further recited in claim 1.
However, Dai teaches semantic inputs indicating one or more regions of one or more images (Par. [0002-3]: technologies for training networks for semantic segmentation. Such techniques can be useful for increasing the accuracy of object identification in an image. Through a training process, images inputted into a network may have an increased level of semantic segmentation over similar but untrained networks… a system can include a trainable neural network. The neural network can receive a training image as an input. The system can generate several candidate segment masks based on the training image. The candidate segment masks can be ranked from a relatively higher degree of accuracy to a relatively lower degree of accuracy to generate a ranked set of candidate segment masks. One or more masks of the ranked set of candidate segment masks are selected. One of the selected ranked set of candidate segment masks can be input into the neural network to train the neural network. The training process may continue for a desired number of times until the neural network can be trained to a desired level… using a ground-truth bounding box as an input and generated candidate segment masks to train the neural network can reduce the workload of annotation training images for semantic segmentation… spotting the ground-truth bounding box for the candidate segment mask generation can involve less computing resources compared to other technologies… the cost of training a neural network to perform semantic segmentation can be reduced, as the reliance upon human-generated data can be reduced; Par. [0051-53]: the semantic segmentation framework is not limited to any particular type, size, or style of image… the training image 232 can be labeled with one or more ground-truth bounding boxes of objects (e.g. “person,” “car,” “boat”). 
A ground-truth bounding box may be provided by a human or other system considered to have a relatively high degree of accuracy in labeling images… The training supervisor 118 invokes the mask generator 224 to generate candidate segments masks 234A-N… the candidate segment masks 234A-N can be generated based on a criteria of relevance to the training image 232… The training supervisor 118 ranks the candidate segment masks 234A-N to generate the ranked candidate segment masks 236A-N… the ranked candidate segment masks 236A-N can be ranked according to a measure of how close the masks resemble the ground-truth bounding boxes in the training image 232… From the ranked candidate segment masks 236, the training supervisor 118 selects a set 238 of the ranked candidate segment masks 236, illustrated in FIG. 3 as ranked candidate segment masks 236 A, 236 B, 236 C, 236 D, and 236 E. The set 238 can be generated using various technologies… One of the ranked candidate masks 236A-E can be selected and used to train the neural network 116. The neural network updater 228 receives the selected candidate segment mask 240 to train the neural network 116. The selected candidate segment mask 240 can be used as an input to the training supervisor 118 to rank the candidate segment masks 234A-N, allowing repetition of the process; Par. [0062-63]: an image with ground-truth bounding boxes can be used as the initial input to generate candidate segment masks, while a selected candidate segment mask can be used to train the neural network, both in the initial training evolution and subsequent training evolutions. In this manner, the accuracy of ground-truth bounding boxes can be used in conjunction with a relatively low-cost automated approach… the mask generator 224 generates candidate segment masks based on the received training image. The candidate segment masks may be several segment masks that are generated that approximate the segmentation of the objects in the received training image. 
As noted above, the received training image can be segmented using ground-truth bounding boxes. In semantic segmentation, the objects in an image are segmented not at the bounding box level, but rather, at the pixel level. The image can be segmented into regions comprising the various objects defined by the class labels. The candidate segment masks represent various estimations of semantic segmentation using the ground-truth bounding box as the input for the first evolution of training; semantic inputs indicating one or more regions of one or more images (e.g. semantic segmentation process includes images (i.e. one or more images) inputted into a network for training, including a neural network which receives each training image as an input to generate segment masks (i.e. one or more regions of one or more images) based on the training image and one of the ranked candidate masks is selected and used as input to train the neural network (i.e. semantic inputs indicating one or more regions of one or more images), as indicated above), for example).
Fu and Dai are considered to be analogous art because they pertain to image processing applications based on neural networks. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the method for image generation through use of adversarial networks (as disclosed by Fu) with semantic inputs indicating one or more regions of one or more images (as taught by Dai, Abstract, Par. [0002-3, 51-53, 62-63]) by performing semantic segmentation used as input to train neural networks in order to increase the accuracy of object identification in an image (Dai, Abstract, Par. [0002, 12, 56, 62]).
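Dai's ranking-and-selection step relied upon above (ranking candidate segment masks “according to a measure of how close the masks resemble the ground-truth bounding boxes” and selecting a set of the ranked masks) may be illustrated with a minimal Python sketch. The sketch is not taken from Dai: the quoted passages do not identify the closeness measure, so intersection-over-union (IoU) between each mask and the filled bounding box is assumed, and the names box_to_pixels, iou, and rank_candidate_masks, as well as the pixel-set and inclusive-box representations, are hypothetical.

```python
from typing import FrozenSet, List, Tuple

Pixel = Tuple[int, int]          # (row, col)
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive; hypothetical convention


def box_to_pixels(box: Box) -> FrozenSet[Pixel]:
    """Return every pixel covered by an inclusive bounding box."""
    top, left, bottom, right = box
    return frozenset((r, c) for r in range(top, bottom + 1) for c in range(left, right + 1))


def iou(a: FrozenSet[Pixel], b: FrozenSet[Pixel]) -> float:
    """Intersection-over-union of two pixel sets (the assumed closeness measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def rank_candidate_masks(masks: List[FrozenSet[Pixel]], box: Box, keep: int) -> List[int]:
    """Rank masks from most to least similar to the box; return indices of the top `keep`."""
    box_pixels = box_to_pixels(box)
    order = sorted(range(len(masks)), key=lambda i: iou(masks[i], box_pixels), reverse=True)
    return order[:keep]


if __name__ == "__main__":
    box = (0, 0, 1, 1)  # 2x2 ground-truth box
    candidates = [
        frozenset({(0, 0)}),                          # IoU 0.25
        frozenset({(0, 0), (0, 1), (1, 0), (1, 1)}),  # IoU 1.0
        frozenset({(5, 5)}),                          # IoU 0.0
    ]
    print(rank_candidate_masks(candidates, box, keep=2))
```

Masks are modeled here as sets of (row, col) pixels so that the example needs only the standard library; a practical system would more likely operate on arrays.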

Regarding claim 2, claim 1 is incorporated and Fu discloses the non-transitory computer-readable medium, wherein the one or more semantic inputs include at least one region boundary with a semantic label indicating a type of image content to be generated within the at least one region boundary (Par. [0004]: fake image is a processor-generated image, where the processor may be a neural network, and a target segmentation… is a set of segments, e.g., sets of pixels or set of contours, that correspond to portions or landmarks, e.g., eyes, nose, mouth etc., of an image; Par. [0034-39]: Segmentation Guided Generative Adversarial Network (SGGAN), which leverages semantic segmentation to improve image generation performance further and provide spatial mapping… a segmentor implemented with a neural network that is designed to impose semantic information on the generated images… generative adversarial networks (GAN) [Goodfellow et al., Generative adversarial nets, In NIPS, 2014] have emerged as a powerful tool for generative tasks, and significantly thrive in the field of deep generative models. Because GAN has the potential to provide realistic image generation results and alleviate the deficiency of training data… Segmentation Guided Generative Adversarial Network (SGGAN), which fully leverages semantic segmentation information to guide the image generation (e.g., translation) process… embodiments explicitly guide the generator with pixel-level semantic segmentations and, thus, further boost the quality of generated images. Further, the target segmentation employed in embodiments works as a strong prior, i.e., provides knowledge that stems from previous experience, for the image generator, which is able to use this prior knowledge to edit the spatial content and align the face image to the target segmentation; Par. 
[0050-65]: generator 301 is configured to receive three inputs, an input image (source image) 304, a target segmentation 305, and a vector of target attributes 306. A goal of the training process is to configure the generator 301 to translate the input image 304 into a generated image (fake image) 307, which complies with the target segmentation 305 and attribute labels 306… the segmentation map… and attributes vector… in the source domain; while y, s' and c' are its corresponding image, segmentation, and attributes in the target domain. The number of segmentation classes is denoted as ns as classes and the number of all the attributes is denoted as nc … the generator is implemented with a first neural network configured to generate a fake image based on a target segmentation. A fake image is a processor-generated image, where the processor may be a neural network, and a target segmentation… is a set of segments, e.g., sets of pixels or set of contours, that correspond to portions or landmarks, e.g., eyes, nose, mouth etc., of an image; Par. [0048-51]: image generation using the training methods described herein and resulting trained generator can spatially control the generation process, and provide interpretable results… Segmentation Guided Generative Adversarial Networks (SGGAN) model which leverages semantic segmentation information to provide spatial constraints for the image translation task…  a segmentor network that is particularly designed to impose the target spatial guidance on the generator… a process of training an image generator, i.e., an image generator system, comprising a generator 301, discriminator 302, and segmentor 303… The generator 301 is configured to receive three inputs, an input image (source image) 304, a target segmentation 305, and a vector of target attributes 306. 
A goal of the training process is to configure the generator 301 to translate the input image 304 into a generated image (fake image) 307, which complies with the target segmentation 305 and attribute labels 306… During the training process, these three inputs (target segmentation 305, target attributes 306, and input image 304) are fed into the generator 301 to obtain the generated image 307; Par. [0063-77]: In order to guide the generator by the target segmentation information, an additional network is built which takes an image as input and generates the image's corresponding semantic segmentation. This network is referred to as the segmentor network S which is trained together with the GAN framework. When training with the real data pairs (x,s), S learns to estimate segmentation correctly… such an embodiment leverages semantic segmentation information in GAN based image translation tasks and also builds a segmentor which is trained together with the GAN framework to provide guidance in image translation… images and resulting segmentations that may be used in embodiments and/or generated by the segmentor S. In FIG. 4, the image 440 is depicted with the dotted lines 441 showing facial landmarks extracted from the image 440. The segmentation 442 is a landmark-based semantic segmentation. The image 443 is a real image sample that may be provided to a segmentor S implemented according to an embodiment to generate the segmentation 444… extracted landmarks 441 are processed to generate a pixel-wised semantic segmentation 442 where each pixel in the input image 440 is automatically classified into classes of eyes, eyebrow, nose, lips, skin and background according to landmark information. During training of the segmentor S, S takes a real image sample 443 as input and generates an estimated segmentation 444… the segmentor S is optimized by minimizing the difference between the landmark base segmentation 442 and segmentor generated segmentation 444. 
For instance, based upon differences between the landmark-based segmentation 442 and segmentor S generated segmentation 444, weights of the network implementing the segmentor S may be varied. As shown in FIG. 4, the similarity between the landmark-based segmentation 442 and segmentor generated segmentation 444 reveals that a segmentor network, implemented according to the embodiments described herein, can successfully capture the semantic information from an input image; Par. [0085-92]: for generated images, the classification loss… enables the generator to transfer attribute-related contents from source to target domains… Based on extracted 68-point landmarks, semantic facial segmentations consisting of eyes, nose, mouth, skin, and background regions were generated; Par. [0117-127]: deep learning based adversarial network, referred to herein as Segmentation Guided Generative Adversarial Networks, which fully leverages semantic segmentation information to guide the image translation process. An example benefit of embodiments includes explicitly guiding the generator with pixel-wise and instance level segmentations, and, thus, further boosting the image quality. Another benefit is the semantic segmentation working well prior to the image generation, which is able to edit the image content. Thus, embodiments can simultaneously change facial attributes and achieve expression morphing without giving extra expression labels. In detail, the proposed SGGAN model may employ three networks, i.e., generator, discriminator, and segmentor. The generator takes as inputs, a given image, multiple attributes, and a target segmentation and generates a target image. The discriminator pushes the generated images towards a target domain distribution, and meanwhile, utilizes an auxiliary attribute classifier to enable the SGGAN to generate images with multiple attributes. The segmentor may impose semantic information on the generation process. 
This framework is trained using a large dataset of face images with attribute-level labels. Further, it is noted that embodiments may implement segmentations of any desired features, e.g., features of faces, clothes, street views, cityscapes, room layouts, room designs, and building designs, amongst other examples… the SCGAN implements a segmentor network constructed to impose spatial constraints on the generator. Results described below experimentally demonstrate that the SCGAN framework is capable of controlling the spatial contents of generated images such as face shape, facial expression, face orientation, and fashion layout by providing both visual and quantitative results… the SCGAN has both spatial and attribute-level controllability, with a segmentor network that guides the generator network with spatial information, and increases the model stability for convergence. In another embodiment, to avoid foreground-background mismatch, the generator network is configured to first, extract spatial information from an input segmentation, second, concatenate that latent vector to provide variations, and third, use attribute labels to synthesize attribute-specific contents in the generated image… the SCGAN has both spatial and attribute-level controllability, with a segmentor network that guides the generator network with spatial information, and increases the model stability for convergence… to avoid foreground-background mismatch, the generator network is configured to first, extract spatial information from an input segmentation, second, concatenate that latent vector to provide variations, and third, use attribute labels to synthesize attribute-specific contents in the generated image… a SCGAN that takes latent vectors, attribute labels, and semantic segmentations as inputs, and decouples the image generation into three dimensions… the SCGAN are capable of generating images with controlled spatial contents and attributes and generate target images with a large 
diversity; Par. [0152]: landmark detector was applied to extract 68-point facial landmarks from real images. Facial landmarks separate facial attributes into different regions. By filling those regions with a semantic index, pixel-wisely, semantic segmentations are created; wherein the one or more semantic inputs include at least one region boundary with a semantic label indicating a type of image content to be generated within the at least one region boundary (e.g. receive one or more semantic inputs, including generated corresponding semantic segmentation (i.e. division, boundary, layout, etc.) information of an input image (i.e. the one or more semantic inputs), by receiving input images and generating a corresponding segmentation indicating features (i.e. attributes, characteristics, etc.) of the input images that separates image attributes into different regions (i.e. the one or more semantic inputs include at least one region boundary) by filling those regions with a semantic index, including attribute labels (i.e. semantic labels) to synthesize attribute-specific contents in the generated image indicating a type of image content to be generated within each image region boundary, including a set of segments, such as sets of pixels or sets of contours, that correspond to portions or landmarks of an image (i.e. region boundaries), including segmentations of any desired features, including attribute-related contents (i.e. semantic labels indicating types of image content to be generated within the at least one region boundary), such as features of faces, clothes, street views, cityscapes, room layouts, room designs, and building designs, amongst other examples, as indicated above), for example).

Regarding claim 3, claim 2 is incorporated and Fu discloses the non-transitory computer-readable medium, wherein the instructions when performed further cause the one or more processors to: 
generate a semantic layout including the at least one region boundary (Par. [0004]: fake image is a processor-generated image, where the processor may be a neural network, and a target segmentation… is a set of segments, e.g., sets of pixels or set of contours, that correspond to portions or landmarks, e.g., eyes, nose, mouth etc., of an image; Par. [0034-39]: Segmentation Guided Generative Adversarial Network (SGGAN), which leverages semantic segmentation to improve image generation performance further and provide spatial mapping… a segmentor implemented with a neural network that is designed to impose semantic information on the generated images… generative adversarial networks (GAN) [Goodfellow et al., Generative adversarial nets, In NIPS, 2014] have emerged as a powerful tool for generative tasks, and significantly thrive in the field of deep generative models. Because GAN has the potential to provide realistic image generation results and alleviate the deficiency of training data… Segmentation Guided Generative Adversarial Network (SGGAN), which fully leverages semantic segmentation information to guide the image generation (e.g., translation) process… embodiments explicitly guide the generator with pixel-level semantic segmentations and, thus, further boost the quality of generated images. Further, the target segmentation employed in embodiments works as a strong prior, i.e., provides knowledge that stems from previous experience, for the image generator, which is able to use this prior knowledge to edit the spatial content and align the face image to the target segmentation; Par. [0050-65]: generator 301 is configured to receive three inputs, an input image (source image) 304, a target segmentation 305, and a vector of target attributes 306. 
A goal of the training process is to configure the generator 301 to translate the input image 304 into a generated image (fake image) 307, which complies with the target segmentation 305 and attribute labels 306… the segmentation map… and attributes vector… in the source domain; while y, s' and c' are its corresponding image, segmentation, and attributes in the target domain. The number of segmentation classes is denoted as ns and the number of all the attributes is denoted as nc… the generator is implemented with a first neural network configured to generate a fake image based on a target segmentation. A fake image is a processor-generated image, where the processor may be a neural network, and a target segmentation… is a set of segments, e.g., sets of pixels or set of contours, that correspond to portions or landmarks, e.g., eyes, nose, mouth etc., of an image; Par. [0048-51]: image generation using the training methods described herein and resulting trained generator can spatially control the generation process, and provide interpretable results… Segmentation Guided Generative Adversarial Networks (SGGAN) model which leverages semantic segmentation information to provide spatial constraints for the image translation task… a segmentor network that is particularly designed to impose the target spatial guidance on the generator… a process of training an image generator, i.e., an image generator system, comprising a generator 301, discriminator 302, and segmentor 303… The generator 301 is configured to receive three inputs, an input image (source image) 304, a target segmentation 305, and a vector of target attributes 306.
A goal of the training process is to configure the generator 301 to translate the input image 304 into a generated image (fake image) 307, which complies with the target segmentation 305 and attribute labels 306… During the training process, these three inputs (target segmentation 305, target attributes 306, and input image 304) are fed into the generator 301 to obtain the generated image 307; Par. [0063-77]: In order to guide the generator by the target segmentation information, an additional network is built which takes an image as input and generates the image's corresponding semantic segmentation. This network is referred to as the segmentor network S which is trained together with the GAN framework. When training with the real data pairs (x,s), S learns to estimate segmentation correctly… such an embodiment leverages semantic segmentation information in GAN based image translation tasks and also builds a segmentor which is trained together with the GAN framework to provide guidance in image translation… images and resulting segmentations that may be used in embodiments and/or generated by the segmentor S. In FIG. 4, the image 440 is depicted with the dotted lines 441 showing facial landmarks extracted from the image 440. The segmentation 442 is a landmark-based semantic segmentation. The image 443 is a real image sample that may be provided to a segmentor S implemented according to an embodiment to generate the segmentation 444… extracted landmarks 441 are processed to generate a pixel-wised semantic segmentation 442 where each pixel in the input image 440 is automatically classified into classes of eyes, eyebrow, nose, lips, skin and background according to landmark information. During training of the segmentor S, S takes a real image sample 443 as input and generates an estimated segmentation 444… the segmentor S is optimized by minimizing the difference between the landmark base segmentation 442 and segmentor generated segmentation 444. 
For instance, based upon differences between the landmark-based segmentation 442 and segmentor S generated segmentation 444, weights of the network implementing the segmentor S may be varied. As shown in FIG. 4, the similarity between the landmark-based segmentation 442 and segmentor generated segmentation 444 reveals that a segmentor network, implemented according to the embodiments described herein, can successfully capture the semantic information from an input image; Par. [0092]: Based on extracted 68-point landmarks, semantic facial segmentations consisting of eyes, nose, mouth, skin, and background regions were generated; Par. [0152]: landmark detector was applied to extract 68-point facial landmarks from real images. Facial landmarks separate facial attributes into different regions. By filling those regions with a semantic index, pixel-wisely, semantic segmentations are created; generate a semantic layout including the at least one region boundary (e.g. receive one or more semantic inputs, including generated corresponding semantic segmentation (i.e. division, boundary, layout, etc.) information of an input image (i.e. generate a semantic layout), by receiving input images and generating a corresponding segmentation indicating features (i.e. attributes, characteristics, etc.) of the input images that separates image attributes into different regions (i.e. including the at least one region boundary) by filling those regions with a semantic index, including attribute labels (i.e. semantic labels) to synthesize attribute-specific contents in the generated image indicating a type of image content to be generated within the at least one region boundary, including a set of segments, such as sets of pixels or sets of contours, that correspond to portions or landmarks of an image, as indicated above, for example), wherein the semantic label is modifiable to cause a different type of content to be generated within the region boundary (Par.
[0039-46]: guide the generator with pixel-level semantic segmentations and, thus, further boost the quality of generated images. Further, the target segmentation employed in embodiments works as a strong prior, i.e., provides knowledge that stems from previous experience, for the image generator, which is able to use this prior knowledge to edit the spatial content and align the face image to the target segmentation… a training procedure 270 for training the segmentor 260, generator 220, and discriminator 240. During training, estimated segmentations from the segmentor 260 are compared with their ground-truth values, which provides gradient information to optimize the generator 220. This optimization tends to teach the generator 220 to impose the spatial constraints indicated in an input segmentation 271 on the translated images, e.g., 274. During the training 270, the segmentor 260 provides spatial guidance to the generator 220 to ensure the generated images, e.g., 274, comply with input segmentations, e.g., 271. The discriminator 240 aims to ensure the translated images, e.g., 274, are as realistic as the real images, e.g., 273… In the training procedure 270, the segmentor 260 receives a target segmentation 271 and a generated image 274 produced by the generator 220. Then, based upon a segmentation loss, i.e., the difference between a segmentation determined from the generated image 274 and the target segmentation 271, the segmentor 260 is adjusted, e.g., weights in a neural network implementing the segmentor 260 are modified so the segmentor 260 produces segmentations that are closer to the target segmentation 271. 
The generator 240 is likewise adjusted based upon the segmentation loss to generate images that are closer to the target segmentation 271… weights of the neural network implementing the generator 220 are adjusted so as to improve the generator's 240 ability to generate images that are in accordance with the desired attributes 272 and target segmentation 271 while also being indistinguishable from real images; Par. [0040-44]: given the input image 100 and target segmentation 101, the proposed SGGAN translates the input image 100 to various combinations of various attributes shown… the proposed SGGAN framework comprises three networks… respectively, a generator network 220, a discriminator network 240, and a segmentor network 260… The generator 240 takes as inputs, a target segmentation 227, a given image 226, and a vector 228 indicating desired attributes of the image to be generated. The generator 220 implemented with the blocks 221-225 is configured to receive the inputs 226, 227, 228 and generate a target image 229 that is based on, i.e., a translated version of, the input image 226 and consistent with the input segmentation 227 and attributes 228… segmentor network 260 implemented with the blocks 261-265 is configured to receive an input image 266a and/or 266b and generate a corresponding segmentation 267a and/or 267b, respectively, indicating features of the input images 266a and/or 266b. The segmentor 260 imposes semantic information on the image generation process… Based on the segmentor network S, the proposed SGGAN, e.g., the network depicted in FIG. 2D, comprises three networks, a segmentor, a generator, and a discriminator. The proposed SGGAN utilizes semantic segmentations as strong regulations and control signals in multi-domain image-to-image translation… a training procedure 270 for training the segmentor 260, generator 220, and discriminator 240. 
During training, estimated segmentations from the segmentor 260 are compared with their ground-truth values, which provides gradient information to optimize the generator 220. This optimization tends to teach the generator 220 to impose the spatial constraints indicated in an input segmentation 271 on the translated images, e.g., 274. During the training 270, the segmentor 260 provides spatial guidance to the generator 220 to ensure the generated images, e.g., 274, comply with input segmentations, e.g., 271. The discriminator 240 aims to ensure the translated images, e.g., 274, are as realistic as the real images; Par. [0117-127]: detects faces from the input image and extracts corresponding semantic segmentations. Then, an image translation process uses trained models of a novel deep learning based adversarial network, referred to herein as Segmentation Guided Generative Adversarial Networks, which fully leverages semantic segmentation information to guide the image translation process. An example benefit of embodiments includes explicitly guiding the generator with pixel-wise and instance level segmentations, and, thus, further boosting the image quality. Another benefit is the semantic segmentation working well prior to the image generation, which is able to edit the image content. Thus, embodiments can simultaneously change facial attributes and achieve expression morphing without giving extra expression labels. In detail, the proposed SGGAN model may employ three networks, i.e., generator, discriminator, and segmentor. The generator takes as inputs, a given image, multiple attributes, and a target segmentation and generates a target image… a SCGAN that takes latent vectors, attribute labels, and semantic segmentations as inputs, and decouples the image generation into three dimensions. 
As such, embodiments of the SCGAN are capable of generating images with controlled spatial contents and attributes and generate target images with a large diversity; wherein the semantic label is modifiable to cause a different type of content to be generated within the region boundary (e.g. semantic segmentation (i.e. division, boundary, layout, etc.) information of an input image is generated by receiving input images and generating a corresponding segmentation indicating features (i.e. attributes, characteristics, etc.) of the input images that separates image attributes into different regions (i.e. a different type of content within each region boundary), by filling those regions with a semantic index, including attribute labels (i.e. semantic labels) to synthesize attribute-specific contents in the generated image indicating a type of image content to be generated within each image region boundary (i.e. the semantic label is modifiable to cause a different type of content to be generated within the region boundary), including a set of segments, such as sets of pixels or sets of contours, that correspond to portions or landmarks of an image (i.e. region boundaries), as indicated above, to cause a different type of content to be generated within the region boundary by generating a target image that is based on a translated (i.e. modified, edited, styled, etc.) version of the input image and consistent with the input segmentation and attributes, including desired attributes of the image to be generated (i.e. the semantic label is modifiable to cause a different type of content to be generated within the region boundary), as indicated above), for example).

Regarding claim 4, claim 3 is incorporated and Fu discloses the non-transitory computer-readable medium, wherein the instructions when performed further cause the one or more processors to: 
generate the type of image content within the region boundary using at least one generative adversarial network (GAN) including a generator and a discriminator (Par. [0004]: a system for training an image generator… the system comprises a processor and a memory with computer code instructions stored thereon, wherein the processor and the memory, with the computer code instructions, are configured to cause the system to provide a generator, discriminator, and segmentor; Par. [0034-39]: Segmentation Guided Generative Adversarial Network (SGGAN), which leverages semantic segmentation to improve image generation performance further and provide spatial mapping… a segmentor implemented with a neural network that is designed to impose semantic information on the generated images… generative adversarial networks (GAN) [Goodfellow et al., Generative adversarial nets, In NIPS, 2014] have emerged as a powerful tool for generative tasks, and significantly thrive in the field of deep generative models. Because GAN has the potential to provide realistic image generation results and alleviate the deficiency of training data… Segmentation Guided Generative Adversarial Network (SGGAN), which fully leverages semantic segmentation information to guide the image generation (e.g., translation) process… embodiments explicitly guide the generator with pixel-level semantic segmentations and, thus, further boost the quality of generated images. Further, the target segmentation employed in embodiments works as a strong prior, i.e., provides knowledge that stems from previous experience, for the image generator, which is able to use this prior knowledge to edit the spatial content and align the face image to the target segmentation; Par. 
[0048-51]: image generation using the training methods described herein and resulting trained generator can spatially control the generation process, and provide interpretable results… Segmentation Guided Generative Adversarial Networks (SGGAN) model which leverages semantic segmentation information to provide spatial constraints for the image translation task… a segmentor network that is particularly designed to impose the target spatial guidance on the generator… a process of training an image generator, i.e., an image generator system, comprising a generator 301, discriminator 302, and segmentor 303… The generator 301 is configured to receive three inputs, an input image (source image) 304, a target segmentation 305, and a vector of target attributes 306. A goal of the training process is to configure the generator 301 to translate the input image 304 into a generated image (fake image) 307, which complies with the target segmentation 305 and attribute labels 306… During the training process, these three inputs (target segmentation 305, target attributes 306, and input image 304) are fed into the generator 301 to obtain the generated image 307; generate the type of image content within the region boundary using at least one generative adversarial network (GAN) including a generator and a discriminator (e.g. semantic segmentation (i.e. division, boundary, layout, etc.) information of an input image is generated by receiving input images and generating a corresponding segmentation indicating features (i.e. attributes, characteristics, etc.) of the input images that separates image attributes into different regions, by filling those regions with a semantic index, including attribute labels (i.e. semantic labels) to synthesize attribute-specific contents in the generated image indicating a type of image content to be generated within each image region boundary (i.e. generate the type of image content within the region boundary), including a set of segments, such as sets of pixels or sets of contours, that correspond to portions or landmarks of an image (i.e. region boundaries), including segmentations of any desired features (i.e. types of image content to be generated within the region boundary), such as features of faces, clothes, street views, cityscapes, room layouts, room designs, and building designs, amongst other examples, as indicated above, by using an image generator system, which includes a generator, a discriminator (i.e. including a generator and a discriminator), and a segmentor, which are implemented with neural networks designed to impose semantic information on the generated images, such as a generative adversarial network (GAN) (i.e. using at least one GAN), as indicated above), for example).

Regarding claim 7, claim 7 is a corresponding apparatus claim rejected as applied to the computer-readable medium claim 1 above.

Regarding claim 8, claim 7 is incorporated, and claim 8 is a corresponding apparatus claim rejected as applied to the computer-readable medium claim 2 above.

Regarding claim 9, claim 8 is incorporated, and claim 9 is a corresponding apparatus claim rejected as applied to the computer-readable medium claim 3 above.

Regarding claim 10, claim 9 is incorporated, and claim 10 is a corresponding apparatus claim rejected as applied to the computer-readable medium claim 4 above.

Regarding claim 13, Fu discloses a non-transitory machine-readable medium having stored thereon a set of instructions, which when performed by one or more processors, cause the one or more processors to (Par. [0004]: a system for training an image generator… the system comprises a processor and a memory with computer code instructions stored thereon, wherein the processor and the memory, with the computer code instructions, are configured to cause the system to provide a generator, discriminator, and segmentor; Par. [0191]: Embodiments or aspects thereof may be implemented in the form of hardware, firmware, or software… the software may be stored on any non-transient computer readable medium that is configured to enable a processor to load the software or subsets of instructions thereof. The processor then executes the instructions and is configured to operate or cause an apparatus to operate in a manner as described herein) at least: 
receive one or more drawing inputs (Par. [0034]: Segmentation Guided Generative Adversarial Network (SGGAN), which leverages semantic segmentation to improve image generation performance further and provide spatial mapping… a segmentor implemented with a neural network that is designed to impose semantic information on the generated images; Par. [0041-43]: generator 240 takes as inputs, a target segmentation 227, a given image 226, and a vector 228 indicating desired attributes of the image to be generated. The generator 220 implemented with the blocks 221-225 is configured to receive the inputs 226, 227, 228 and generate a target image 229 that is based on, i.e., a translated version of, the input image 226 and consistent with the input segmentation 227 and attributes 228… segmentor network 260 implemented with the blocks 261-265 is configured to receive an input image 266a and/or 266b and generate a corresponding segmentation 267a and/or 267b, respectively, indicating features of the input images 266a and/or 266b. The segmentor 260 imposes semantic information on the image generation process; Par. [0048-51]: image generation using the training methods described herein and resulting trained generator can spatially control the generation process, and provide interpretable results… Segmentation Guided Generative Adversarial Networks (SGGAN) model which leverages semantic segmentation information to provide spatial constraints for the image translation task…  a segmentor network that is particularly designed to impose the target spatial guidance on the generator… a process of training an image generator, i.e., an image generator system, comprising a generator 301, discriminator 302, and segmentor 303… The generator 301 is configured to receive three inputs, an input image (source image) 304, a target segmentation 305, and a vector of target attributes 306. 
A goal of the training process is to configure the generator 301 to translate the input image 304 into a generated image (fake image) 307, which complies with the target segmentation 305 and attribute labels 306… During the training process, these three inputs (target segmentation 305, target attributes 306, and input image 304) are fed into the generator 301 to obtain the generated image 307; Par. [0063-77]: In order to guide the generator by the target segmentation information, an additional network is built which takes an image as input and generates the image's corresponding semantic segmentation. This network is referred to as the segmentor network S which is trained together with the GAN framework. When training with the real data pairs (x,s), S learns to estimate segmentation correctly… such an embodiment leverages semantic segmentation information in GAN based image translation tasks and also builds a segmentor which is trained together with the GAN framework to provide guidance in image translation… images and resulting segmentations that may be used in embodiments and/or generated by the segmentor S. In FIG. 4, the image 440 is depicted with the dotted lines 441 showing facial landmarks extracted from the image 440. The segmentation 442 is a landmark-based semantic segmentation. The image 443 is a real image sample that may be provided to a segmentor S implemented according to an embodiment to generate the segmentation 444… extracted landmarks 441 are processed to generate a pixel-wised semantic segmentation 442 where each pixel in the input image 440 is automatically classified into classes of eyes, eyebrow, nose, lips, skin and background according to landmark information. During training of the segmentor S, S takes a real image sample 443 as input and generates an estimated segmentation 444… the segmentor S is optimized by minimizing the difference between the landmark base segmentation 442 and segmentor generated segmentation 444. 
For instance, based upon differences between the landmark-based segmentation 442 and segmentor S generated segmentation 444, weights of the network implementing the segmentor S may be varied. As shown in FIG. 4, the similarity between the landmark-based segmentation 442 and segmentor generated segmentation 444 reveals that a segmentor network, implemented according to the embodiments described herein, can successfully capture the semantic information from an input image; receive one or more drawing inputs (e.g. receive one or more drawing inputs, including generated corresponding semantic segmentation information (i.e. representation, drawing, etc.) of an input image (i.e. semantic inputs), by receiving input images and generating a corresponding segmentation indicating features (i.e. attributes, characteristics, etc.) of the input images (i.e. one or more drawing inputs, semantic inputs, etc.), as indicated above), for example). The steps of the program further recited in claim 13 recite a similar concept, which corresponds to claim 1 when executed, and are rejected as applied to computer-readable medium claim 1 above.

Regarding claim 14, claim 13 is incorporated, and the steps of the program further recited in claim 14 recite a similar concept, which corresponds to claim 2 when executed, and are rejected as applied to computer-readable medium claim 2 above.

Regarding claim 15, claim 14 is incorporated, and the steps of the program further recited in claim 15 recite a similar concept, which corresponds to claim 3 when executed, and are rejected as applied to computer-readable medium claim 3 above.

Regarding claim 16, claim 15 is incorporated, and the steps of the program further recited in claim 16 recite a similar concept, which corresponds to claim 4 when executed, and are rejected as applied to computer-readable medium claim 4 above.

Regarding claim 19, claim 19 is a corresponding apparatus claim rejected as applied to the computer-readable medium claim 13 above.

Regarding claim 20, claim 19 is incorporated, and claim 20 is a corresponding apparatus claim rejected as applied to the computer-readable medium claim 14 above.

Regarding claim 21, claim 20 is incorporated, and claim 21 is a corresponding apparatus claim rejected as applied to the computer-readable medium claim 15 above.

Regarding claim 22, claim 21 is incorporated, and claim 22 is a corresponding apparatus claim rejected as applied to the computer-readable medium claim 16 above.

Regarding claim 25, Fu discloses a non-transitory machine-readable medium having stored thereon a set of instructions, which when performed by one or more processors, cause the one or more processors to (Par. [0004]: a system for training an image generator… the system comprises a processor and a memory with computer code instructions stored thereon, wherein the processor and the memory, with the computer code instructions, are configured to cause the system to provide a generator, discriminator, and segmentor; Par. [0191]: Embodiments or aspects thereof may be implemented in the form of hardware, firmware, or software… the software may be stored on any non-transient computer readable medium that is configured to enable a processor to load the software or subsets of instructions thereof. The processor then executes the instructions and is configured to operate or cause an apparatus to operate in a manner as described herein) at least:
receive one or more image inputs (Par. [0034]: Segmentation Guided Generative Adversarial Network (SGGAN), which leverages semantic segmentation to improve image generation performance further and provide spatial mapping… a segmentor implemented with a neural network that is designed to impose semantic information on the generated images; Par. [0041-43]: generator 240 takes as inputs, a target segmentation 227, a given image 226, and a vector 228 indicating desired attributes of the image to be generated. The generator 220 implemented with the blocks 221-225 is configured to receive the inputs 226, 227, 228 and generate a target image 229 that is based on, i.e., a translated version of, the input image 226 and consistent with the input segmentation 227 and attributes 228… segmentor network 260 implemented with the blocks 261-265 is configured to receive an input image 266a and/or 266b and generate a corresponding segmentation 267a and/or 267b, respectively, indicating features of the input images 266a and/or 266b. The segmentor 260 imposes semantic information on the image generation process; Par. [0048-51]: image generation using the training methods described herein and resulting trained generator can spatially control the generation process, and provide interpretable results… Segmentation Guided Generative Adversarial Networks (SGGAN) model which leverages semantic segmentation information to provide spatial constraints for the image translation task…  a segmentor network that is particularly designed to impose the target spatial guidance on the generator… a process of training an image generator, i.e., an image generator system, comprising a generator 301, discriminator 302, and segmentor 303… The generator 301 is configured to receive three inputs, an input image (source image) 304, a target segmentation 305, and a vector of target attributes 306. 
A goal of the training process is to configure the generator 301 to translate the input image 304 into a generated image (fake image) 307, which complies with the target segmentation 305 and attribute labels 306… During the training process, these three inputs (target segmentation 305, target attributes 306, and input image 304) are fed into the generator 301 to obtain the generated image 307; Par. [0063-77]: In order to guide the generator by the target segmentation information, an additional network is built which takes an image as input and generates the image's corresponding semantic segmentation. This network is referred to as the segmentor network S which is trained together with the GAN framework. When training with the real data pairs (x,s), S learns to estimate segmentation correctly… such an embodiment leverages semantic segmentation information in GAN based image translation tasks and also builds a segmentor which is trained together with the GAN framework to provide guidance in image translation… images and resulting segmentations that may be used in embodiments and/or generated by the segmentor S. In FIG. 4, the image 440 is depicted with the dotted lines 441 showing facial landmarks extracted from the image 440. The segmentation 442 is a landmark-based semantic segmentation. The image 443 is a real image sample that may be provided to a segmentor S implemented according to an embodiment to generate the segmentation 444… extracted landmarks 441 are processed to generate a pixel-wised semantic segmentation 442 where each pixel in the input image 440 is automatically classified into classes of eyes, eyebrow, nose, lips, skin and background according to landmark information. During training of the segmentor S, S takes a real image sample 443 as input and generates an estimated segmentation 444… the segmentor S is optimized by minimizing the difference between the landmark base segmentation 442 and segmentor generated segmentation 444. 
For instance, based upon differences between the landmark-based segmentation 442 and segmentor S generated segmentation 444, weights of the network implementing the segmentor S may be varied. As shown in FIG. 4, the similarity between the landmark-based segmentation 442 and segmentor generated segmentation 444 reveals that a segmentor network, implemented according to the embodiments described herein, can successfully capture the semantic information from an input image; receive one or more image inputs (e.g. receive one or more image inputs, including generated corresponding semantic segmentation information of an input image (i.e. one or more image inputs), by receiving input images and generating a corresponding segmentation indicating features (i.e. attributes, characteristics, etc.) of the input images (i.e. one or more drawing inputs, semantic inputs, image inputs, etc.), as indicated above), for example). The steps of the program further recited in claim 25, when executed, recite concepts similar to those of claim 1 and are rejected as applied to computer readable medium claim 1 above.

Regarding claim 26, claim 25 is incorporated, and the steps of the program further recited in claim 26, when executed, recite concepts similar to those of claim 2 and are rejected as applied to computer readable medium claim 2 above.

Regarding claim 27, claim 26 is incorporated, and the steps of the program further recited in claim 27, when executed, recite concepts similar to those of claim 3 and are rejected as applied to computer readable medium claim 3 above.

Regarding claim 28, claim 27 is incorporated, and the steps of the program further recited in claim 28, when executed, recite concepts similar to those of claim 4 and are rejected as applied to computer readable medium claim 4 above.

Regarding claim 31, claim 31 is a corresponding apparatus claim and is rejected as applied to the computer readable medium claim 25 above.

Regarding claim 32, claim 31 is incorporated, and claim 32 is a corresponding apparatus claim rejected as applied to the computer readable medium claim 26 above.

Regarding claim 33, claim 32 is incorporated, and claim 33 is a corresponding apparatus claim rejected as applied to the computer readable medium claim 27 above.

Regarding claim 34, claim 33 is incorporated, and claim 34 is a corresponding apparatus claim rejected as applied to the computer readable medium claim 28 above.

Regarding claim 49, Fu discloses a system (Par. [0004]: a system for training an image generator… the system comprises a processor and a memory with computer code instructions stored thereon, wherein the processor and the memory, with the computer code instructions, are configured to cause the system to provide a generator, discriminator, and segmentor) comprising: 
one or more servers to cause one or more photorealistic images to be generated using one or more neural networks and one or more semantic inputs, and further to cause the one or more photorealistic images to be displayed on one or more client devices (Par. [0034-39]: Segmentation Guided Generative Adversarial Network (SGGAN), which leverages semantic segmentation to improve image generation performance further and provide spatial mapping… a segmentor implemented with a neural network that is designed to impose semantic information on the generated images… generative adversarial networks (GAN) [Goodfellow et al., Generative adversarial nets, In NIPS, 2014] have emerged as a powerful tool for generative tasks, and significantly thrive in the field of deep generative models. Because GAN has the potential to provide realistic image generation results and alleviate the deficiency of training data… Segmentation Guided Generative Adversarial Network (SGGAN), which fully leverages semantic segmentation information to guide the image generation (e.g., translation) process… embodiments explicitly guide the generator with pixel-level semantic segmentations and, thus, further boost the quality of generated images. Further, the target segmentation employed in embodiments works as a strong prior, i.e., provides knowledge that stems from previous experience, for the image generator, which is able to use this prior knowledge to edit the spatial content and align the face image to the target segmentation; Par. [0040-44]: given the input image 100 and target segmentation 101, the proposed SGGAN translates the input image 100 to various combinations of various attributes shown… the proposed SGGAN framework comprises three networks… respectively, a generator network 220, a discriminator network 240, and a segmentor network 260… The generator 240 takes as inputs, a target segmentation 227, a given image 226, and a vector 228 indicating desired attributes of the image to be generated. 
The generator 220 implemented with the blocks 221-225 is configured to receive the inputs 226, 227, 228 and generate a target image 229 that is based on, i.e., a translated version of, the input image 226 and consistent with the input segmentation 227 and attributes 228… segmentor network 260 implemented with the blocks 261-265 is configured to receive an input image 266a and/or 266b and generate a corresponding segmentation 267a and/or 267b, respectively, indicating features of the input images 266a and/or 266b. The segmentor 260 imposes semantic information on the image generation process… Based on the segmentor network S, the proposed SGGAN, e.g., the network depicted in FIG. 2D, comprises three networks, a segmentor, a generator, and a discriminator. The proposed SGGAN utilizes semantic segmentations as strong regulations and control signals in multi-domain image-to-image translation… a training procedure 270 for training the segmentor 260, generator 220, and discriminator 240. During training, estimated segmentations from the segmentor 260 are compared with their ground-truth values, which provides gradient information to optimize the generator 220. This optimization tends to teach the generator 220 to impose the spatial constraints indicated in an input segmentation 271 on the translated images, e.g., 274. During the training 270, the segmentor 260 provides spatial guidance to the generator 220 to ensure the generated images, e.g., 274, comply with input segmentations, e.g., 271. The discriminator 240 aims to ensure the translated images, e.g., 274, are as realistic as the real images; Par. 
[0048-51]: image generation using the training methods described herein and resulting trained generator can spatially control the generation process, and provide interpretable results… Segmentation Guided Generative Adversarial Networks (SGGAN) model which leverages semantic segmentation information to provide spatial constraints for the image translation task…  a segmentor network that is particularly designed to impose the target spatial guidance on the generator… a process of training an image generator, i.e., an image generator system, comprising a generator 301, discriminator 302, and segmentor 303… The generator 301 is configured to receive three inputs, an input image (source image) 304, a target segmentation 305, and a vector of target attributes 306. A goal of the training process is to configure the generator 301 to translate the input image 304 into a generated image (fake image) 307, which complies with the target segmentation 305 and attribute labels 306… During the training process, these three inputs (target segmentation 305, target attributes 306, and input image 304) are fed into the generator 301 to obtain the generated image 307; Par. [0063-77]: In order to guide the generator by the target segmentation information, an additional network is built which takes an image as input and generates the image's corresponding semantic segmentation. This network is referred to as the segmentor network S which is trained together with the GAN framework. When training with the real data pairs (x,s), S learns to estimate segmentation correctly… such an embodiment leverages semantic segmentation information in GAN based image translation tasks and also builds a segmentor which is trained together with the GAN framework to provide guidance in image translation… images and resulting segmentations that may be used in embodiments and/or generated by the segmentor S. In FIG. 
4, the image 440 is depicted with the dotted lines 441 showing facial landmarks extracted from the image 440. The segmentation 442 is a landmark-based semantic segmentation. The image 443 is a real image sample that may be provided to a segmentor S implemented according to an embodiment to generate the segmentation 444… extracted landmarks 441 are processed to generate a pixel-wised semantic segmentation 442 where each pixel in the input image 440 is automatically classified into classes of eyes, eyebrow, nose, lips, skin and background according to landmark information. During training of the segmentor S, S takes a real image sample 443 as input and generates an estimated segmentation 444… the segmentor S is optimized by minimizing the difference between the landmark base segmentation 442 and segmentor generated segmentation 444. For instance, based upon differences between the landmark-based segmentation 442 and segmentor S generated segmentation 444, weights of the network implementing the segmentor S may be varied. As shown in FIG. 4, the similarity between the landmark-based segmentation 442 and segmentor generated segmentation 444 reveals that a segmentor network, implemented according to the embodiments described herein, can successfully capture the semantic information from an input image; Par. [0188-190]: FIG. 22 is a simplified block diagram of a computer-based system 2220 that may be used to implement any variety of the embodiments of the present invention described herein. The system 2220 comprises a bus 2223. The bus 2223 serves as an interconnect between the various components of the system 2220. Connected to the bus 2223 is an input/output device interface 2226 for connecting various input and output devices such as a keyboard, mouse, display, speakers, etc. to the system 2220. A central processing unit (CPU) 2222 is connected to the bus 2223 and provides for the execution of computer instructions implementing embodiments… FIG. 
23 illustrates a computer network environment 2330 in which an embodiment of the present invention may be implemented. In the computer network environment 2330, the server 2331 is linked through the communications network 2332 to the clients 2333a-n. The environment 2330 may be used to allow the clients 2333a-n, alone or in combination with the server 2331, to execute any of the embodiments described herein. For non-limiting example, computer network environment 2330 provides cloud computing embodiments, software as a service (SAAS) embodiments; the system comprises one or more servers to cause one or more photorealistic images to be generated using one or more neural networks and one or more semantic inputs, and further to cause the one or more photorealistic images to be displayed on one or more client devices (e.g. a system for training an image generator comprising a processor (i.e. computer, server, etc.) and a memory with computer code instructions stored thereon, wherein the processor and the memory, with the computer code instructions, are configured to receive input images and generate corresponding semantic segmentation indicating features (i.e. attributes, characteristics, etc.) of the input images (i.e. one or more semantic inputs), as indicated above, for example, to impose semantic information on the image generation process, including a generator, a discriminator, and a segmentor implemented with neural networks (i.e. using one or more neural networks) designed to impose semantic information on the generated images to provide realistic image generation results (i.e. 
generate one or more photorealistic images based on the one or more semantic inputs using one or more neural networks) by ensuring that the generated/translated images are as realistic as the real images), including a computer network environment in which the server is linked through the communications network to clients, which typically include a display, and the environment is used to allow the clients, alone or in combination with the server, to execute any of the embodiments, including causing generated images (i.e. the one or more photorealistic images) to be displayed on one or more client devices, as indicated above), for example, but fails to teach the following as further recited in claim 49.
However, Dai teaches semantic inputs indicating one or more regions of one or more images (Par. [0002-3]: technologies for training networks for semantic segmentation. Such techniques can be useful for increasing the accuracy of object identification in an image. Through a training process, images inputted into a network may have an increased level of semantic segmentation over similar but untrained networks… a system can include a trainable neural network. The neural network can receive a training image as an input. The system can generate several candidate segment masks based on the training image. The candidate segment masks can be ranked from a relatively higher degree of accuracy to a relatively lower degree of accuracy to generate a ranked set of candidate segment masks. One or more masks of the ranked set of candidate segment masks are selected. One of the selected ranked set of candidate segment masks can be input into the neural network to train the neural network. The training process may continue for a desired number of times until the neural network can be trained to a desired level… using a ground-truth bounding box as an input and generated candidate segment masks to train the neural network can reduce the workload of annotation training images for semantic segmentation… spotting the ground-truth bounding box for the candidate segment mask generation can involve less computing resources compared to other technologies… the cost of training a neural network to perform semantic segmentation can be reduced, as the reliance upon human-generated data can be reduced; Par. [0051-53]: the semantic segmentation framework is not limited to any particular type, size, or style of image… the training image 232 can be labeled with one or more ground-truth bounding boxes of objects (e.g. “person,” “car,” “boat”). 
A ground-truth bounding box may be provided by a human or other system considered to have a relatively high degree of accuracy in labeling images… The training supervisor 118 invokes the mask generator 224 to generate candidate segments masks 234A-N… the candidate segment masks 234A-N can be generated based on a criteria of relevance to the training image 232… The training supervisor 118 ranks the candidate segment masks 234A-N to generate the ranked candidate segment masks 236A-N… the ranked candidate segment masks 236A-N can be ranked according to a measure of how close the masks resemble the ground-truth bounding boxes in the training image 232… From the ranked candidate segment masks 236, the training supervisor 118 selects a set 238 of the ranked candidate segment masks 236, illustrated in FIG. 3 as ranked candidate segment masks 236 A, 236 B, 236 C, 236 D, and 236 E. The set 238 can be generated using various technologies… One of the ranked candidate masks 236A-E can be selected and used to train the neural network 116. The neural network updater 228 receives the selected candidate segment mask 240 to train the neural network 116. The selected candidate segment mask 240 can be used as an input to the training supervisor 118 to rank the candidate segment masks 234A-N, allowing repetition of the process; Par. [0062-63]: an image with ground-truth bounding boxes can be used as the initial input to generate candidate segment masks, while a selected candidate segment mask can be used to train the neural network, both in the initial training evolution and subsequent training evolutions. In this manner, the accuracy of ground-truth bounding boxes can be used in conjunction with a relatively low-cost automated approach… the mask generator 224 generates candidate segment masks based on the received training image. The candidate segment masks may be several segment masks that are generated that approximate the segmentation of the objects in the received training image. 
As noted above, the received training image can be segmented using ground-truth bounding boxes. In semantic segmentation, the objects in an image are segmented not at the bounding box level, but rather, at the pixel level. The image can be segmented into regions comprising the various objects defined by the class labels. The candidate segment masks represent various estimations of semantic segmentation using the ground-truth bounding box as the input for the first evolution of training; semantic inputs indicating one or more regions of one or more images (e.g. the semantic segmentation process includes images (i.e. one or more images) inputted into a network for training, including a neural network that receives each training image as an input to generate segment masks (i.e. one or more regions of one or more images) based on the training image, and one of the ranked candidate masks is selected and used as input to train the neural network (i.e. semantic inputs indicating one or more regions of one or more images), as indicated above), for example).
The same motivation to combine the above-mentioned teachings applies, as previously indicated with respect to claim 1.

Regarding claim 50, claim 49 is incorporated, and the steps of the program further recited in claim 50, when executed, recite concepts similar to those of claim 2 and are rejected as applied to computer readable medium claim 2 above.

Regarding claim 51, claim 50 is incorporated, and the steps of the program further recited in claim 51, when executed, recite concepts similar to those of claim 3 and are rejected as applied to computer readable medium claim 3 above.

Regarding claim 52, claim 51 is incorporated, and the steps of the program further recited in claim 52, when executed, recite concepts similar to those of claim 4 and are rejected as applied to computer readable medium claim 4 above.

Regarding claim 61, Fu discloses a non-transitory machine-readable medium having stored thereon a set of instructions, which, if performed by one or more processors, cause the one or more processors to (Par. [0004]: a system for training an image generator… the system comprises a processor and a memory with computer code instructions stored thereon, wherein the processor and the memory, with the computer code instructions, are configured to cause the system to provide a generator, discriminator, and segmentor; Par. [0191]: Embodiments or aspects thereof may be implemented in the form of hardware, firmware, or software… the software may be stored on any non-transient computer readable medium that is configured to enable a processor to load the software or subsets of instructions thereof. The processor then executes the instructions and is configured to operate or cause an apparatus to operate in a manner as described herein) at least: 
receive one or more semantic inputs; 
cause the one or more semantic inputs to be provided to one or more neural networks; and 
cause the one or more neural networks to generate a photorealistic image based, at least in part, on the one or more semantic inputs (Par. [0034-39]: Segmentation Guided Generative Adversarial Network (SGGAN), which leverages semantic segmentation to improve image generation performance further and provide spatial mapping… a segmentor implemented with a neural network that is designed to impose semantic information on the generated images… generative adversarial networks (GAN) [Goodfellow et al., Generative adversarial nets, In NIPS, 2014] have emerged as a powerful tool for generative tasks, and significantly thrive in the field of deep generative models. Because GAN has the potential to provide realistic image generation results and alleviate the deficiency of training data… Segmentation Guided Generative Adversarial Network (SGGAN), which fully leverages semantic segmentation information to guide the image generation (e.g., translation) process… embodiments explicitly guide the generator with pixel-level semantic segmentations and, thus, further boost the quality of generated images. Further, the target segmentation employed in embodiments works as a strong prior, i.e., provides knowledge that stems from previous experience, for the image generator, which is able to use this prior knowledge to edit the spatial content and align the face image to the target segmentation; Par. [0040-44]: given the input image 100 and target segmentation 101, the proposed SGGAN translates the input image 100 to various combinations of various attributes shown… the proposed SGGAN framework comprises three networks… respectively, a generator network 220, a discriminator network 240, and a segmentor network 260… The generator 240 takes as inputs, a target segmentation 227, a given image 226, and a vector 228 indicating desired attributes of the image to be generated. 
The generator 220 implemented with the blocks 221-225 is configured to receive the inputs 226, 227, 228 and generate a target image 229 that is based on, i.e., a translated version of, the input image 226 and consistent with the input segmentation 227 and attributes 228… segmentor network 260 implemented with the blocks 261-265 is configured to receive an input image 266a and/or 266b and generate a corresponding segmentation 267a and/or 267b, respectively, indicating features of the input images 266a and/or 266b. The segmentor 260 imposes semantic information on the image generation process… Based on the segmentor network S, the proposed SGGAN, e.g., the network depicted in FIG. 2D, comprises three networks, a segmentor, a generator, and a discriminator. The proposed SGGAN utilizes semantic segmentations as strong regulations and control signals in multi-domain image-to-image translation… a training procedure 270 for training the segmentor 260, generator 220, and discriminator 240. During training, estimated segmentations from the segmentor 260 are compared with their ground-truth values, which provides gradient information to optimize the generator 220. This optimization tends to teach the generator 220 to impose the spatial constraints indicated in an input segmentation 271 on the translated images, e.g., 274. During the training 270, the segmentor 260 provides spatial guidance to the generator 220 to ensure the generated images, e.g., 274, comply with input segmentations, e.g., 271. The discriminator 240 aims to ensure the translated images, e.g., 274, are as realistic as the real images; Par. 
[0048-51]: image generation using the training methods described herein and resulting trained generator can spatially control the generation process, and provide interpretable results… Segmentation Guided Generative Adversarial Networks (SGGAN) model which leverages semantic segmentation information to provide spatial constraints for the image translation task…  a segmentor network that is particularly designed to impose the target spatial guidance on the generator… a process of training an image generator, i.e., an image generator system, comprising a generator 301, discriminator 302, and segmentor 303… The generator 301 is configured to receive three inputs, an input image (source image) 304, a target segmentation 305, and a vector of target attributes 306. A goal of the training process is to configure the generator 301 to translate the input image 304 into a generated image (fake image) 307, which complies with the target segmentation 305 and attribute labels 306… During the training process, these three inputs (target segmentation 305, target attributes 306, and input image 304) are fed into the generator 301 to obtain the generated image 307; Par. [0063-77]: In order to guide the generator by the target segmentation information, an additional network is built which takes an image as input and generates the image's corresponding semantic segmentation. This network is referred to as the segmentor network S which is trained together with the GAN framework. When training with the real data pairs (x,s), S learns to estimate segmentation correctly… such an embodiment leverages semantic segmentation information in GAN based image translation tasks and also builds a segmentor which is trained together with the GAN framework to provide guidance in image translation… images and resulting segmentations that may be used in embodiments and/or generated by the segmentor S. In FIG. 
4, the image 440 is depicted with the dotted lines 441 showing facial landmarks extracted from the image 440. The segmentation 442 is a landmark-based semantic segmentation. The image 443 is a real image sample that may be provided to a segmentor S implemented according to an embodiment to generate the segmentation 444… extracted landmarks 441 are processed to generate a pixel-wised semantic segmentation 442 where each pixel in the input image 440 is automatically classified into classes of eyes, eyebrow, nose, lips, skin and background according to landmark information. During training of the segmentor S, S takes a real image sample 443 as input and generates an estimated segmentation 444… the segmentor S is optimized by minimizing the difference between the landmark base segmentation 442 and segmentor generated segmentation 444. For instance, based upon differences between the landmark-based segmentation 442 and segmentor S generated segmentation 444, weights of the network implementing the segmentor S may be varied. As shown in FIG. 4, the similarity between the landmark-based segmentation 442 and segmentor generated segmentation 444 reveals that a segmentor network, implemented according to the embodiments described herein, can successfully capture the semantic information from an input image; receive one or more semantic inputs (e.g. receive one or more semantic inputs, including generated corresponding semantic segmentation information of an input image, by receiving input images and generating a corresponding segmentation indicating features (i.e. attributes, characteristics, etc.) of the input images (i.e. one or more semantic inputs), as indicated above, for example); cause the one or more semantic inputs to be provided to one or more neural networks; and cause the one or more neural networks to generate a photorealistic image based, at least in part, on the one or more semantic inputs (e.g. 
receive input images and generate corresponding semantic segmentation indicating features (i.e. attributes, characteristics, etc.) of the input images (i.e. the one or more semantic inputs), as indicated above, for example, to impose semantic information on the image generation process, including a generator, a discriminator, and a segmentor implemented with neural networks (i.e. cause the one or more semantic inputs to be provided to one or more neural networks) designed to impose semantic information (i.e. the one or more semantic inputs) on the generated images to provide realistic image generation results (i.e. generate a photorealistic image based, at least in part, on the one or more semantic inputs) by ensuring that the generated/translated images are as realistic as the real images, as indicated above), for example), but fails to teach the following as further recited in claim 61.
However, Dai teaches semantic inputs indicating one or more regions of one or more images (Par. [0002-3]: technologies for training networks for semantic segmentation. Such techniques can be useful for increasing the accuracy of object identification in an image. Through a training process, images inputted into a network may have an increased level of semantic segmentation over similar but untrained networks… a system can include a trainable neural network. The neural network can receive a training image as an input. The system can generate several candidate segment masks based on the training image. The candidate segment masks can be ranked from a relatively higher degree of accuracy to a relatively lower degree of accuracy to generate a ranked set of candidate segment masks. One or more masks of the ranked set of candidate segment masks are selected. One of the selected ranked set of candidate segment masks can be input into the neural network to train the neural network. The training process may continue for a desired number of times until the neural network can be trained to a desired level… using a ground-truth bounding box as an input and generated candidate segment masks to train the neural network can reduce the workload of annotation training images for semantic segmentation… spotting the ground-truth bounding box for the candidate segment mask generation can involve less computing resources compared to other technologies… the cost of training a neural network to perform semantic segmentation can be reduced, as the reliance upon human-generated data can be reduced; Par. [0051-53]: the semantic segmentation framework is not limited to any particular type, size, or style of image… the training image 232 can be labeled with one or more ground-truth bounding boxes of objects (e.g. “person,” “car,” “boat”). 
A ground-truth bounding box may be provided by a human or other system considered to have a relatively high degree of accuracy in labeling images… The training supervisor 118 invokes the mask generator 224 to generate candidate segments masks 234A-N… the candidate segment masks 234A-N can be generated based on a criteria of relevance to the training image 232… The training supervisor 118 ranks the candidate segment masks 234A-N to generate the ranked candidate segment masks 236A-N… the ranked candidate segment masks 236A-N can be ranked according to a measure of how close the masks resemble the ground-truth bounding boxes in the training image 232… From the ranked candidate segment masks 236, the training supervisor 118 selects a set 238 of the ranked candidate segment masks 236, illustrated in FIG. 3 as ranked candidate segment masks 236 A, 236 B, 236 C, 236 D, and 236 E. The set 238 can be generated using various technologies… One of the ranked candidate masks 236A-E can be selected and used to train the neural network 116. The neural network updater 228 receives the selected candidate segment mask 240 to train the neural network 116. The selected candidate segment mask 240 can be used as an input to the training supervisor 118 to rank the candidate segment masks 234A-N, allowing repetition of the process; Par. [0062-63]: an image with ground-truth bounding boxes can be used as the initial input to generate candidate segment masks, while a selected candidate segment mask can be used to train the neural network, both in the initial training evolution and subsequent training evolutions. In this manner, the accuracy of ground-truth bounding boxes can be used in conjunction with a relatively low-cost automated approach… the mask generator 224 generates candidate segment masks based on the received training image. The candidate segment masks may be several segment masks that are generated that approximate the segmentation of the objects in the received training image. 
As noted above, the received training image can be segmented using ground-truth bounding boxes. In semantic segmentation, the objects in an image are segmented not at the bounding box level, but rather, at the pixel level. The image can be segmented into regions comprising the various objects defined by the class labels. The candidate segment masks represent various estimations of semantic segmentation using the ground-truth bounding box as the input for the first evolution of training; semantic inputs indicating one or more regions of one or more images (e.g. the semantic segmentation process includes images (i.e. one or more images) inputted into a network for training, including a neural network that receives each training image as an input to generate segment masks (i.e. one or more regions of one or more images) based on the training image, and one of the ranked candidate masks is selected and used as input to train the neural network (i.e. semantic inputs indicating one or more regions of one or more images), as indicated above), for example).
The same motivation to combine above-mentioned teachings applies, as previously indicated in claim 1.
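For illustration only, the ranking and selection step in the Dai passages quoted above (candidate segment masks ranked "according to a measure of how close the masks resemble the ground-truth bounding boxes," followed by selection of a set of the ranked masks) can be sketched as follows. The quoted passages do not identify the measure; this sketch assumes intersection-over-union (IoU) between each candidate mask and the rasterized bounding box, and a fixed set size k. The names box_to_mask, iou and rank_and_select are hypothetical and are not taken from Dai.

```python
from typing import List, Sequence, Tuple

Mask = List[List[int]]           # binary mask, 1 = pixel assigned to the object
Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) with half-open pixel ranges


def box_to_mask(box: Box, height: int, width: int) -> Mask:
    """Rasterize a ground-truth bounding box into a binary mask."""
    x0, y0, x1, y1 = box
    return [[1 if x0 <= x < x1 and y0 <= y < y1 else 0 for x in range(width)]
            for y in range(height)]


def iou(a: Mask, b: Mask) -> float:
    """Intersection-over-union of two binary masks of the same size."""
    pairs = [(pa, pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    inter = sum(pa & pb for pa, pb in pairs)
    union = sum(pa | pb for pa, pb in pairs)
    return inter / union if union else 0.0


def rank_and_select(candidates: Sequence[Mask], box: Box, k: int) -> List[int]:
    """Rank candidate masks by IoU with the box (assumed measure); return top-k indices."""
    height, width = len(candidates[0]), len(candidates[0][0])
    box_mask = box_to_mask(box, height, width)
    scores = [iou(mask, box_mask) for mask in candidates]
    # Highest score first; sorted() is stable, so ties keep their input order.
    ranked = sorted(range(len(candidates)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]
```

For example, with a 2 x 2 box in the top-left corner of a 4 x 4 image, a candidate that exactly fills the box ranks ahead of a candidate covering only the top row, which in turn ranks ahead of a candidate covering the whole image.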

Regarding claim 62, claim 61 is incorporated, and the steps of the program further recited in claim 62, when executed, recite a concept similar to that of claim 2; claim 62 is therefore rejected as applied to computer readable medium claim 2 above.

Regarding claim 63, claim 62 is incorporated, and the steps of the program further recited in claim 63, when executed, recite a concept similar to that of claim 3; claim 63 is therefore rejected as applied to computer readable medium claim 3 above.

Regarding claim 64, claim 63 is incorporated, and the steps of the program further recited in claim 64, when executed, recite a concept similar to that of claim 4; claim 64 is therefore rejected as applied to computer readable medium claim 4 above.

Regarding claim 67, this claim is a corresponding apparatus claim and is rejected as applied to computer readable medium claim 61 above.

Regarding claim 68, claim 67 is incorporated, and claim 68 is a corresponding apparatus claim rejected as applied to computer readable medium claim 62 above.

Regarding claim 69, claim 68 is incorporated, and claim 69 is a corresponding apparatus claim rejected as applied to computer readable medium claim 63 above.

Regarding claim 70, claim 69 is incorporated, and claim 70 is a corresponding apparatus claim rejected as applied to computer readable medium claim 64 above.

Regarding claim 73, Fu discloses a system (Par. [0004]: a system for training an image generator… the system comprises a processor and a memory with computer code instructions stored thereon, wherein the processor and the memory, with the computer code instructions, are configured to cause the system to provide a generator, discriminator, and segmentor) comprising: 
one or more processors to (Par. [0004]: a system for training an image generator… the system comprises a processor and a memory with computer code instructions stored thereon, wherein the processor and the memory, with the computer code instructions, are configured to cause the system to provide a generator, discriminator, and segmentor; Par. [0191]: Embodiments or aspects thereof may be implemented in the form of hardware, firmware, or software… the software may be stored on any non-transient computer readable medium that is configured to enable a processor to load the software or subsets of instructions thereof. The processor then executes the instructions and is configured to operate or cause an apparatus to operate in a manner as described herein) receive one or more semantic inputs, wherein the one or more processors are to cause synthetic data representing one or more photorealistic images to be generated using one or more neural networks and the one or more semantic inputs (Par. [0034-39]: Segmentation Guided Generative Adversarial Network (SGGAN), which leverages semantic segmentation to improve image generation performance further and provide spatial mapping… a segmentor implemented with a neural network that is designed to impose semantic information on the generated images… generative adversarial networks (GAN) [Goodfellow et al., Generative adversarial nets, In NIPS, 2014] have emerged as a powerful tool for generative tasks, and significantly thrive in the field of deep generative models. Because GAN has the potential to provide realistic image generation results and alleviate the deficiency of training data… Segmentation Guided Generative Adversarial Network (SGGAN), which fully leverages semantic segmentation information to guide the image generation (e.g., translation) process… embodiments explicitly guide the generator with pixel-level semantic segmentations and, thus, further boost the quality of generated images. 
Further, the target segmentation employed in embodiments works as a strong prior, i.e., provides knowledge that stems from previous experience, for the image generator, which is able to use this prior knowledge to edit the spatial content and align the face image to the target segmentation; Par. [0040-44]: given the input image 100 and target segmentation 101, the proposed SGGAN translates the input image 100 to various combinations of various attributes shown… the proposed SGGAN framework comprises three networks… respectively, a generator network 220, a discriminator network 240, and a segmentor network 260… The generator 240 takes as inputs, a target segmentation 227, a given image 226, and a vector 228 indicating desired attributes of the image to be generated. The generator 220 implemented with the blocks 221-225 is configured to receive the inputs 226, 227, 228 and generate a target image 229 that is based on, i.e., a translated version of, the input image 226 and consistent with the input segmentation 227 and attributes 228… segmentor network 260 implemented with the blocks 261-265 is configured to receive an input image 266a and/or 266b and generate a corresponding segmentation 267a and/or 267b, respectively, indicating features of the input images 266a and/or 266b. The segmentor 260 imposes semantic information on the image generation process… Based on the segmentor network S, the proposed SGGAN, e.g., the network depicted in FIG. 2D, comprises three networks, a segmentor, a generator, and a discriminator. The proposed SGGAN utilizes semantic segmentations as strong regulations and control signals in multi-domain image-to-image translation… a training procedure 270 for training the segmentor 260, generator 220, and discriminator 240. During training, estimated segmentations from the segmentor 260 are compared with their ground-truth values, which provides gradient information to optimize the generator 220. 
This optimization tends to teach the generator 220 to impose the spatial constraints indicated in an input segmentation 271 on the translated images, e.g., 274. During the training 270, the segmentor 260 provides spatial guidance to the generator 220 to ensure the generated images, e.g., 274, comply with input segmentations, e.g., 271. The discriminator 240 aims to ensure the translated images, e.g., 274, are as realistic as the real images; Par. [0048-51]: image generation using the training methods described herein and resulting trained generator can spatially control the generation process, and provide interpretable results… Segmentation Guided Generative Adversarial Networks (SGGAN) model which leverages semantic segmentation information to provide spatial constraints for the image translation task…  a segmentor network that is particularly designed to impose the target spatial guidance on the generator… a process of training an image generator, i.e., an image generator system, comprising a generator 301, discriminator 302, and segmentor 303… The generator 301 is configured to receive three inputs, an input image (source image) 304, a target segmentation 305, and a vector of target attributes 306. A goal of the training process is to configure the generator 301 to translate the input image 304 into a generated image (fake image) 307, which complies with the target segmentation 305 and attribute labels 306… During the training process, these three inputs (target segmentation 305, target attributes 306, and input image 304) are fed into the generator 301 to obtain the generated image 307; Par. [0063-77]: In order to guide the generator by the target segmentation information, an additional network is built which takes an image as input and generates the image's corresponding semantic segmentation. This network is referred to as the segmentor network S which is trained together with the GAN framework. 
When training with the real data pairs (x,s), S learns to estimate segmentation correctly… such an embodiment leverages semantic segmentation information in GAN based image translation tasks and also builds a segmentor which is trained together with the GAN framework to provide guidance in image translation… images and resulting segmentations that may be used in embodiments and/or generated by the segmentor S. In FIG. 4, the image 440 is depicted with the dotted lines 441 showing facial landmarks extracted from the image 440. The segmentation 442 is a landmark-based semantic segmentation. The image 443 is a real image sample that may be provided to a segmentor S implemented according to an embodiment to generate the segmentation 444… extracted landmarks 441 are processed to generate a pixel-wised semantic segmentation 442 where each pixel in the input image 440 is automatically classified into classes of eyes, eyebrow, nose, lips, skin and background according to landmark information. During training of the segmentor S, S takes a real image sample 443 as input and generates an estimated segmentation 444… the segmentor S is optimized by minimizing the difference between the landmark base segmentation 442 and segmentor generated segmentation 444. For instance, based upon differences between the landmark-based segmentation 442 and segmentor S generated segmentation 444, weights of the network implementing the segmentor S may be varied. As shown in FIG. 4, the similarity between the landmark-based segmentation 442 and segmentor generated segmentation 444 reveals that a segmentor network, implemented according to the embodiments described herein, can successfully capture the semantic information from an input image; Par. [0188-190]: FIG. 22 is a simplified block diagram of a computer-based system 2220 that may be used to implement any variety of the embodiments of the present invention described herein. The system 2220 comprises a bus 2223. 
The bus 2223 serves as an interconnect between the various components of the system 2220. Connected to the bus 2223 is an input/output device interface 2226 for connecting various input and output devices such as a keyboard, mouse, display, speakers, etc. to the system 2220. A central processing unit (CPU) 2222 is connected to the bus 2223 and provides for the execution of computer instructions implementing embodiments… FIG. 23 illustrates a computer network environment 2330 in which an embodiment of the present invention may be implemented. In the computer network environment 2330, the server 2331 is linked through the communications network 2332 to the clients 2333a-n. The environment 2330 may be used to allow the clients 2333a-n, alone or in combination with the server 2331, to execute any of the embodiments described herein. For non-limiting example, computer network environment 2330 provides cloud computing embodiments, software as a service (SAAS) embodiments; receive one or more semantic inputs (e.g. receive one or more semantic inputs, including generated corresponding semantic segmentation information of an input image, by receiving input images and generating a corresponding segmentation indicating features (i.e. attributes, characteristics, etc.) of the input images (i.e. one or more semantic inputs), as indicated above, for example), wherein the one or more processors are to cause synthetic data representing one or more photorealistic images to be generated using one or more neural networks and the one or more semantic inputs (e.g. receive input images and generate corresponding semantic segmentation indicating features (i.e. attributes, characteristics, etc.) of the input images (i.e. the one or more semantic inputs), as indicated above, for example, to impose semantic information on the image generation (i.e. synthesis, construction, etc.) process, including a generator, a discriminator, and a segmentor implemented with neural networks (i.e. 
one or more neural networks) designed to impose semantic information on the generated (i.e. synthesized, constructed, etc.) images to provide realistic image generation results (i.e. cause synthetic data representing one or more photorealistic images to be generated using one or more neural networks and the one or more semantic inputs) by ensuring that the generated/translated images are as realistic as the real images, as indicated above), for example), but fails to teach the following as further recited in claim 73.
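For illustration only, the Fu passages quoted above state that the generator "takes as inputs, a target segmentation 227, a given image 226, and a vector 228 indicating desired attributes," but do not state how those inputs are combined. The sketch below assumes a common conditioning scheme for image-to-image translation networks: at every pixel, the image channels are concatenated with a one-hot encoding of the target segmentation class and with the attribute vector repeated at that location (in a convolutional implementation, a channel-wise tensor concatenation). The function build_generator_input is hypothetical and is not Fu's implementation.

```python
from typing import List, Sequence

Image = Sequence[Sequence[Sequence[float]]]   # H x W x C pixel values
Segmentation = Sequence[Sequence[int]]        # H x W class index per pixel


def build_generator_input(
    image: Image,
    segmentation: Segmentation,
    attributes: Sequence[float],
    num_classes: int,
) -> List[List[List[float]]]:
    """Return an H x W grid of per-pixel vectors:
    image channels + one-hot target class + target attribute vector."""
    if len(image) != len(segmentation) or any(
        len(img_row) != len(seg_row) for img_row, seg_row in zip(image, segmentation)
    ):
        raise ValueError("image and segmentation must have the same height and width")
    grid = []
    for img_row, seg_row in zip(image, segmentation):
        out_row = []
        for pixel, cls in zip(img_row, seg_row):
            if not 0 <= cls < num_classes:
                raise ValueError(f"class index {cls} out of range")
            one_hot = [1.0 if c == cls else 0.0 for c in range(num_classes)]
            # The same attribute vector is repeated at every pixel location.
            out_row.append(list(pixel) + one_hot + list(attributes))
        grid.append(out_row)
    return grid
```

Under this assumption, each pixel of an RGB image with a three-class target segmentation and a two-element attribute vector yields an eight-element input vector.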
However, Dai teaches semantic inputs indicating one or more regions of one or more images (Par. [0002-3]: technologies for training networks for semantic segmentation. Such techniques can be useful for increasing the accuracy of object identification in an image. Through a training process, images inputted into a network may have an increased level of semantic segmentation over similar but untrained networks… a system can include a trainable neural network. The neural network can receive a training image as an input. The system can generate several candidate segment masks based on the training image. The candidate segment masks can be ranked from a relatively higher degree of accuracy to a relatively lower degree of accuracy to generate a ranked set of candidate segment masks. One or more masks of the ranked set of candidate segment masks are selected. One of the selected ranked set of candidate segment masks can be input into the neural network to train the neural network. The training process may continue for a desired number of times until the neural network can be trained to a desired level… using a ground-truth bounding box as an input and generated candidate segment masks to train the neural network can reduce the workload of annotation training images for semantic segmentation… spotting the ground-truth bounding box for the candidate segment mask generation can involve less computing resources compared to other technologies… the cost of training a neural network to perform semantic segmentation can be reduced, as the reliance upon human-generated data can be reduced; Par. [0051-53]: the semantic segmentation framework is not limited to any particular type, size, or style of image… the training image 232 can be labeled with one or more ground-truth bounding boxes of objects (e.g. “person,” “car,” “boat”). 
A ground-truth bounding box may be provided by a human or other system considered to have a relatively high degree of accuracy in labeling images… The training supervisor 118 invokes the mask generator 224 to generate candidate segments masks 234A-N… the candidate segment masks 234A-N can be generated based on a criteria of relevance to the training image 232… The training supervisor 118 ranks the candidate segment masks 234A-N to generate the ranked candidate segment masks 236A-N… the ranked candidate segment masks 236A-N can be ranked according to a measure of how close the masks resemble the ground-truth bounding boxes in the training image 232… From the ranked candidate segment masks 236, the training supervisor 118 selects a set 238 of the ranked candidate segment masks 236, illustrated in FIG. 3 as ranked candidate segment masks 236 A, 236 B, 236 C, 236 D, and 236 E. The set 238 can be generated using various technologies… One of the ranked candidate masks 236A-E can be selected and used to train the neural network 116. The neural network updater 228 receives the selected candidate segment mask 240 to train the neural network 116. The selected candidate segment mask 240 can be used as an input to the training supervisor 118 to rank the candidate segment masks 234A-N, allowing repetition of the process; Par. [0062-63]: an image with ground-truth bounding boxes can be used as the initial input to generate candidate segment masks, while a selected candidate segment mask can be used to train the neural network, both in the initial training evolution and subsequent training evolutions. In this manner, the accuracy of ground-truth bounding boxes can be used in conjunction with a relatively low-cost automated approach… the mask generator 224 generates candidate segment masks based on the received training image. The candidate segment masks may be several segment masks that are generated that approximate the segmentation of the objects in the received training image. 
As noted above, the received training image can be segmented using ground-truth bounding boxes. In semantic segmentation, the objects in an image are segmented not at the bounding box level, but rather, at the pixel level. The image can be segmented into regions comprising the various objects defined by the class labels. The candidate segment masks represent various estimations of semantic segmentation using the ground-truth bounding box as the input for the first evolution of training; semantic inputs indicating one or more regions of one or more images (e.g. the semantic segmentation process includes images (i.e. one or more images) inputted into a network for training, including a neural network that receives each training image as an input, with segment masks (i.e. one or more regions of one or more images) generated based on the training image and one of the ranked candidate masks selected and used as input to train the neural network (i.e. semantic inputs indicating one or more regions of one or more images), as indicated above), for example).
The same motivation to combine the above-mentioned teachings applies, as previously indicated with respect to claim 1.
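For illustration only, Fu's landmark-based segmentation, quoted above in the rejection of claim 73 ("extracted landmarks 441 are processed to generate a pixel-wised semantic segmentation 442 where each pixel in the input image 440 is automatically classified into classes of eyes, eyebrow, nose, lips, skin and background according to landmark information"), can be sketched as below. The quoted passages do not say how landmarks are turned into pixel labels; this sketch assumes each facial part is given as one or more closed polygons whose vertices are landmark points, labels each pixel center with an even-odd (ray casting) inside test, and checks smaller parts before skin so that the enclosing face region does not overwrite them. The priority order and all names are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]  # (x, y) in pixel coordinates

# Hypothetical priority: smaller facial parts are tested before the skin region.
CLASS_ORDER = ["eyes", "eyebrow", "nose", "lips", "skin"]
BACKGROUND = "background"


def point_in_polygon(pt: Point, poly: Sequence[Point]) -> bool:
    """Even-odd (ray casting) test: is pt inside the closed polygon poly?"""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        x1, y1 = poly[i]
        x2, y2 = poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # this edge straddles the horizontal ray
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def landmark_segmentation(
    regions: Dict[str, List[Sequence[Point]]], height: int, width: int
) -> List[List[str]]:
    """Label each pixel center with the first class whose polygon contains it."""
    labels = []
    for row in range(height):
        label_row = []
        for col in range(width):
            center = (col + 0.5, row + 0.5)
            label = BACKGROUND
            for cls in CLASS_ORDER:
                if any(point_in_polygon(center, poly) for poly in regions.get(cls, [])):
                    label = cls
                    break
            label_row.append(label)
        labels.append(label_row)
    return labels
```

A class with several disjoint parts, such as the two eyes, is represented here as a list of polygons under a single class name.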

Regarding claim 74, claim 73 is incorporated, and the steps further recited in claim 74, when executed, recite a concept similar to that of claim 2; claim 74 is therefore rejected as applied to computer readable medium claim 2 above.

Regarding claim 75, claim 74 is incorporated, and the steps further recited in claim 75, when executed, recite a concept similar to that of claim 3; claim 75 is therefore rejected as applied to computer readable medium claim 3 above.

Regarding claim 76, claim 75 is incorporated, and the steps further recited in claim 76, when executed, recite a concept similar to that of claim 4; claim 76 is therefore rejected as applied to computer readable medium claim 4 above.

Regarding claim 79, Fu discloses a method (Abstract: methods and systems for image generation through use of adversarial networks) comprising: 
receiving one or more semantic inputs; 
causing the one or more semantic inputs to be provided to one or more neural networks; and 
causing the one or more neural networks to generate one or more photorealistic images based, at least in part, on the one or more semantic inputs (Par. [0034-39]: Segmentation Guided Generative Adversarial Network (SGGAN), which leverages semantic segmentation to improve image generation performance further and provide spatial mapping… a segmentor implemented with a neural network that is designed to impose semantic information on the generated images… generative adversarial networks (GAN) [Goodfellow et al., Generative adversarial nets, In NIPS, 2014] have emerged as a powerful tool for generative tasks, and significantly thrive in the field of deep generative models. Because GAN has the potential to provide realistic image generation results and alleviate the deficiency of training data… Segmentation Guided Generative Adversarial Network (SGGAN), which fully leverages semantic segmentation information to guide the image generation (e.g., translation) process… embodiments explicitly guide the generator with pixel-level semantic segmentations and, thus, further boost the quality of generated images. Further, the target segmentation employed in embodiments works as a strong prior, i.e., provides knowledge that stems from previous experience, for the image generator, which is able to use this prior knowledge to edit the spatial content and align the face image to the target segmentation; Par. [0040-44]: given the input image 100 and target segmentation 101, the proposed SGGAN translates the input image 100 to various combinations of various attributes shown… the proposed SGGAN framework comprises three networks… respectively, a generator network 220, a discriminator network 240, and a segmentor network 260… The generator 240 takes as inputs, a target segmentation 227, a given image 226, and a vector 228 indicating desired attributes of the image to be generated. 
The generator 220 implemented with the blocks 221-225 is configured to receive the inputs 226, 227, 228 and generate a target image 229 that is based on, i.e., a translated version of, the input image 226 and consistent with the input segmentation 227 and attributes 228… segmentor network 260 implemented with the blocks 261-265 is configured to receive an input image 266a and/or 266b and generate a corresponding segmentation 267a and/or 267b, respectively, indicating features of the input images 266a and/or 266b. The segmentor 260 imposes semantic information on the image generation process… Based on the segmentor network S, the proposed SGGAN, e.g., the network depicted in FIG. 2D, comprises three networks, a segmentor, a generator, and a discriminator. The proposed SGGAN utilizes semantic segmentations as strong regulations and control signals in multi-domain image-to-image translation… a training procedure 270 for training the segmentor 260, generator 220, and discriminator 240. During training, estimated segmentations from the segmentor 260 are compared with their ground-truth values, which provides gradient information to optimize the generator 220. This optimization tends to teach the generator 220 to impose the spatial constraints indicated in an input segmentation 271 on the translated images, e.g., 274. During the training 270, the segmentor 260 provides spatial guidance to the generator 220 to ensure the generated images, e.g., 274, comply with input segmentations, e.g., 271. The discriminator 240 aims to ensure the translated images, e.g., 274, are as realistic as the real images; Par. 
[0048-51]: image generation using the training methods described herein and resulting trained generator can spatially control the generation process, and provide interpretable results… Segmentation Guided Generative Adversarial Networks (SGGAN) model which leverages semantic segmentation information to provide spatial constraints for the image translation task…  a segmentor network that is particularly designed to impose the target spatial guidance on the generator… a process of training an image generator, i.e., an image generator system, comprising a generator 301, discriminator 302, and segmentor 303… The generator 301 is configured to receive three inputs, an input image (source image) 304, a target segmentation 305, and a vector of target attributes 306. A goal of the training process is to configure the generator 301 to translate the input image 304 into a generated image (fake image) 307, which complies with the target segmentation 305 and attribute labels 306… During the training process, these three inputs (target segmentation 305, target attributes 306, and input image 304) are fed into the generator 301 to obtain the generated image 307; Par. [0063-77]: In order to guide the generator by the target segmentation information, an additional network is built which takes an image as input and generates the image's corresponding semantic segmentation. This network is referred to as the segmentor network S which is trained together with the GAN framework. When training with the real data pairs (x,s), S learns to estimate segmentation correctly… such an embodiment leverages semantic segmentation information in GAN based image translation tasks and also builds a segmentor which is trained together with the GAN framework to provide guidance in image translation… images and resulting segmentations that may be used in embodiments and/or generated by the segmentor S. In FIG. 
4, the image 440 is depicted with the dotted lines 441 showing facial landmarks extracted from the image 440. The segmentation 442 is a landmark-based semantic segmentation. The image 443 is a real image sample that may be provided to a segmentor S implemented according to an embodiment to generate the segmentation 444… extracted landmarks 441 are processed to generate a pixel-wised semantic segmentation 442 where each pixel in the input image 440 is automatically classified into classes of eyes, eyebrow, nose, lips, skin and background according to landmark information. During training of the segmentor S, S takes a real image sample 443 as input and generates an estimated segmentation 444… the segmentor S is optimized by minimizing the difference between the landmark base segmentation 442 and segmentor generated segmentation 444. For instance, based upon differences between the landmark-based segmentation 442 and segmentor S generated segmentation 444, weights of the network implementing the segmentor S may be varied. As shown in FIG. 4, the similarity between the landmark-based segmentation 442 and segmentor generated segmentation 444 reveals that a segmentor network, implemented according to the embodiments described herein, can successfully capture the semantic information from an input image; receiving one or more semantic inputs (e.g. receive one or more semantic inputs, including generated corresponding semantic segmentation information of an input image, by receiving input images and generating a corresponding segmentation indicating features (i.e. attributes, characteristics, etc.) of the input images (i.e. one or more semantic inputs), as indicated above, for example); causing the one or more semantic inputs to be provided to one or more neural networks; and causing the one or more neural networks to generate one or more photorealistic images based, at least in part, on the one or more semantic inputs (e.g. 
receive input images and generate corresponding semantic segmentation indicating features (i.e. attributes, characteristics, etc.) of the input images (i.e. the one or more semantic inputs), as indicated above, for example, to impose semantic information on the image generation process, including a generator, a discriminator, and a segmentor implemented with neural networks (i.e. cause the one or more semantic inputs to be provided to one or more neural networks) designed to impose semantic information (i.e. the one or more semantic inputs) on the generated images to provide realistic image generation results (i.e. generate one or more photorealistic images based, at least in part, on the one or more semantic inputs) by ensuring that the generated/translated images are as realistic as the real images, as indicated above), for example), but fails to teach the following as further recited in claim 79.
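For illustration only, Fu states that "the segmentor S is optimized by minimizing the difference between the landmark base segmentation 442 and segmentor generated segmentation 444," without naming the difference measure in the quoted passages. The sketch below assumes a mean per-pixel cross-entropy between the segmentor's predicted class probabilities and the landmark-based class labels, a common choice for semantic segmentation. The function name segmentation_loss and the eps guard are hypothetical.

```python
import math
from typing import Sequence


def segmentation_loss(
    probs: Sequence[Sequence[Sequence[float]]],  # H x W x K predicted class probabilities
    target: Sequence[Sequence[int]],             # H x W landmark-based class indices
    eps: float = 1e-12,
) -> float:
    """Mean per-pixel cross-entropy between predicted probabilities and target labels."""
    total, count = 0.0, 0
    for prob_row, target_row in zip(probs, target):
        for p, t in zip(prob_row, target_row):
            total -= math.log(max(p[t], eps))  # eps guards against log(0)
            count += 1
    if count == 0:
        raise ValueError("empty segmentation")
    return total / count
```

The value is zero when the segmentor assigns probability 1 to every landmark-based label and grows as the two segmentations disagree. Varying the segmentor's weights to reduce it, for example by gradient descent in a deep learning framework, would correspond to the optimization Fu describes; that framework-specific training loop is not shown.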
However, Dai teaches semantic inputs indicating one or more regions of one or more images (Par. [0002-3]: technologies for training networks for semantic segmentation. Such techniques can be useful for increasing the accuracy of object identification in an image. Through a training process, images inputted into a network may have an increased level of semantic segmentation over similar but untrained networks… a system can include a trainable neural network. The neural network can receive a training image as an input. The system can generate several candidate segment masks based on the training image. The candidate segment masks can be ranked from a relatively higher degree of accuracy to a relatively lower degree of accuracy to generate a ranked set of candidate segment masks. One or more masks of the ranked set of candidate segment masks are selected. One of the selected ranked set of candidate segment masks can be input into the neural network to train the neural network. The training process may continue for a desired number of times until the neural network can be trained to a desired level… using a ground-truth bounding box as an input and generated candidate segment masks to train the neural network can reduce the workload of annotation training images for semantic segmentation… spotting the ground-truth bounding box for the candidate segment mask generation can involve less computing resources compared to other technologies… the cost of training a neural network to perform semantic segmentation can be reduced, as the reliance upon human-generated data can be reduced; Par. [0051-53]: the semantic segmentation framework is not limited to any particular type, size, or style of image… the training image 232 can be labeled with one or more ground-truth bounding boxes of objects (e.g. “person,” “car,” “boat”). 
A ground-truth bounding box may be provided by a human or other system considered to have a relatively high degree of accuracy in labeling images… The training supervisor 118 invokes the mask generator 224 to generate candidate segments masks 234A-N… the candidate segment masks 234A-N can be generated based on a criteria of relevance to the training image 232… The training supervisor 118 ranks the candidate segment masks 234A-N to generate the ranked candidate segment masks 236A-N… the ranked candidate segment masks 236A-N can be ranked according to a measure of how close the masks resemble the ground-truth bounding boxes in the training image 232… From the ranked candidate segment masks 236, the training supervisor 118 selects a set 238 of the ranked candidate segment masks 236, illustrated in FIG. 3 as ranked candidate segment masks 236 A, 236 B, 236 C, 236 D, and 236 E. The set 238 can be generated using various technologies… One of the ranked candidate masks 236A-E can be selected and used to train the neural network 116. The neural network updater 228 receives the selected candidate segment mask 240 to train the neural network 116. The selected candidate segment mask 240 can be used as an input to the training supervisor 118 to rank the candidate segment masks 234A-N, allowing repetition of the process; Par. [0062-63]: an image with ground-truth bounding boxes can be used as the initial input to generate candidate segment masks, while a selected candidate segment mask can be used to train the neural network, both in the initial training evolution and subsequent training evolutions. In this manner, the accuracy of ground-truth bounding boxes can be used in conjunction with a relatively low-cost automated approach… the mask generator 224 generates candidate segment masks based on the received training image. The candidate segment masks may be several segment masks that are generated that approximate the segmentation of the objects in the received training image. 
As noted above, the received training image can be segmented using ground-truth bounding boxes. In semantic segmentation, the objects in an image are segmented not at the bounding box level, but rather, at the pixel level. The image can be segmented into regions comprising the various objects defined by the class labels. The candidate segment masks represent various estimations of semantic segmentation using the ground-truth bounding box as the input for the first evolution of training; semantic inputs indicating one or more regions of one or more images (e.g. the semantic segmentation process includes images (i.e. one or more images) inputted into a network for training, including a neural network that receives each training image as an input, with segment masks (i.e. one or more regions of one or more images) generated based on the training image and one of the ranked candidate masks selected and used as input to train the neural network (i.e. semantic inputs indicating one or more regions of one or more images), as indicated above), for example).
The same motivation to combine the above-mentioned teachings applies, as previously indicated with respect to claim 1.

Regarding claim 80, claim 79 is incorporated, and the steps further recited in claim 80, when executed, recite a concept similar to that of claim 2; claim 80 is therefore rejected as applied to computer readable medium claim 2 above.

Regarding claim 81, claim 80 is incorporated, and the steps further recited in claim 81, when executed, recite a concept similar to that of claim 3; claim 81 is therefore rejected as applied to computer readable medium claim 3 above.

Regarding claim 82, claim 81 is incorporated, and the steps further recited in claim 82, when executed, recite a concept similar to that of claim 4; claim 82 is therefore rejected as applied to computer readable medium claim 4 above.

Regarding claim 85, Fu discloses a processor (Par. [0004]: a system for training an image generator… the system comprises a processor and a memory with computer code instructions stored thereon, wherein the processor and the memory, with the computer code instructions, are configured to cause the system to provide a generator, discriminator, and segmentor) comprising: 
one or more circuits to receive one or more semantic inputs, wherein the one or more circuits are to cause the one or more semantic inputs to be provided to one or more neural networks to generate a photorealistic image based, at least in part, on the one or more semantic inputs (Par. [0034-39]: Segmentation Guided Generative Adversarial Network (SGGAN), which leverages semantic segmentation to improve image generation performance further and provide spatial mapping… a segmentor implemented with a neural network that is designed to impose semantic information on the generated images… generative adversarial networks (GAN) [Goodfellow et al., Generative adversarial nets, In NIPS, 2014] have emerged as a powerful tool for generative tasks, and significantly thrive in the field of deep generative models. Because GAN has the potential to provide realistic image generation results and alleviate the deficiency of training data… Segmentation Guided Generative Adversarial Network (SGGAN), which fully leverages semantic segmentation information to guide the image generation (e.g., translation) process… embodiments explicitly guide the generator with pixel-level semantic segmentations and, thus, further boost the quality of generated images. Further, the target segmentation employed in embodiments works as a strong prior, i.e., provides knowledge that stems from previous experience, for the image generator, which is able to use this prior knowledge to edit the spatial content and align the face image to the target segmentation; Par. 
[0040-44]: given the input image 100 and target segmentation 101, the proposed SGGAN translates the input image 100 to various combinations of various attributes shown… the proposed SGGAN framework comprises three networks… respectively, a generator network 220, a discriminator network 240, and a segmentor network 260… The generator 240 takes as inputs, a target segmentation 227, a given image 226, and a vector 228 indicating desired attributes of the image to be generated. The generator 220 implemented with the blocks 221-225 is configured to receive the inputs 226, 227, 228 and generate a target image 229 that is based on, i.e., a translated version of, the input image 226 and consistent with the input segmentation 227 and attributes 228… segmentor network 260 implemented with the blocks 261-265 is configured to receive an input image 266a and/or 266b and generate a corresponding segmentation 267a and/or 267b, respectively, indicating features of the input images 266a and/or 266b. The segmentor 260 imposes semantic information on the image generation process… Based on the segmentor network S, the proposed SGGAN, e.g., the network depicted in FIG. 2D, comprises three networks, a segmentor, a generator, and a discriminator. The proposed SGGAN utilizes semantic segmentations as strong regulations and control signals in multi-domain image-to-image translation… a training procedure 270 for training the segmentor 260, generator 220, and discriminator 240. During training, estimated segmentations from the segmentor 260 are compared with their ground-truth values, which provides gradient information to optimize the generator 220. This optimization tends to teach the generator 220 to impose the spatial constraints indicated in an input segmentation 271 on the translated images, e.g., 274. During the training 270, the segmentor 260 provides spatial guidance to the generator 220 to ensure the generated images, e.g., 274, comply with input segmentations, e.g., 271. 
The discriminator 240 aims to ensure the translated images, e.g., 274, are as realistic as the real images; Par. [0048-51]: image generation using the training methods described herein and resulting trained generator can spatially control the generation process, and provide interpretable results… Segmentation Guided Generative Adversarial Networks (SGGAN) model which leverages semantic segmentation information to provide spatial constraints for the image translation task…  a segmentor network that is particularly designed to impose the target spatial guidance on the generator… a process of training an image generator, i.e., an image generator system, comprising a generator 301, discriminator 302, and segmentor 303… The generator 301 is configured to receive three inputs, an input image (source image) 304, a target segmentation 305, and a vector of target attributes 306. A goal of the training process is to configure the generator 301 to translate the input image 304 into a generated image (fake image) 307, which complies with the target segmentation 305 and attribute labels 306… During the training process, these three inputs (target segmentation 305, target attributes 306, and input image 304) are fed into the generator 301 to obtain the generated image 307; Par. [0063-77]: In order to guide the generator by the target segmentation information, an additional network is built which takes an image as input and generates the image's corresponding semantic segmentation. This network is referred to as the segmentor network S which is trained together with the GAN framework. 
When training with the real data pairs (x,s), S learns to estimate segmentation correctly… such an embodiment leverages semantic segmentation information in GAN based image translation tasks and also builds a segmentor which is trained together with the GAN framework to provide guidance in image translation… images and resulting segmentations that may be used in embodiments and/or generated by the segmentor S. In FIG. 4, the image 440 is depicted with the dotted lines 441 showing facial landmarks extracted from the image 440. The segmentation 442 is a landmark-based semantic segmentation. The image 443 is a real image sample that may be provided to a segmentor S implemented according to an embodiment to generate the segmentation 444… extracted landmarks 441 are processed to generate a pixel-wised semantic segmentation 442 where each pixel in the input image 440 is automatically classified into classes of eyes, eyebrow, nose, lips, skin and background according to landmark information. During training of the segmentor S, S takes a real image sample 443 as input and generates an estimated segmentation 444… the segmentor S is optimized by minimizing the difference between the landmark base segmentation 442 and segmentor generated segmentation 444. For instance, based upon differences between the landmark-based segmentation 442 and segmentor S generated segmentation 444, weights of the network implementing the segmentor S may be varied. As shown in FIG. 4, the similarity between the landmark-based segmentation 442 and segmentor generated segmentation 444 reveals that a segmentor network, implemented according to the embodiments described herein, can successfully capture the semantic information from an input image; Par. 
[0191]: Embodiments or aspects thereof may be implemented in the form of hardware, firmware, or software… the software may be stored on any non-transient computer readable medium that is configured to enable a processor to load the software or subsets of instructions thereof. The processor then executes the instructions and is configured to operate or cause an apparatus to operate in a manner as described herein; one or more circuits (e.g. embodiments or aspects thereof are implemented in the form of hardware (i.e. one or more circuits), firmware, and software, as indicated above, for example) to receive one or more semantic inputs (e.g. receive one or more semantic inputs, including generated corresponding semantic segmentation information of an input image, by receiving input images and generating a corresponding segmentation indicating features (i.e. attributes, characteristics, etc.) of the input images (i.e. one or more semantic inputs), as indicated above, for example); wherein the one or more circuits (e.g. implementations in the form of hardware (i.e. the one or more circuits), firmware, and software, as indicated above, for example) are to cause the one or more semantic inputs to be provided to one or more neural networks; and cause the one or more neural networks to generate a photorealistic image based, at least in part, on the one or more semantic inputs (e.g. receive input images and generate corresponding semantic segmentation indicating features (i.e. attributes, characteristics, etc.) of the input images (i.e. the one or more semantic inputs), as indicated above, for example, to impose semantic information on the image generation process, including a generator, a discriminator, and a segmentor implemented with neural networks (i.e. cause the one or more semantic inputs to be provided to one or more neural networks) designed to impose semantic information (i.e. 
the one or more semantic inputs) on the generated images to provide realistic image generation results (i.e. generate a photorealistic image based, at least in part, on the one or more semantic inputs) by ensuring that the generated/translated images are as realistic as the real images, as indicated above), for example), but fails to teach the following as further recited in claim 85.
However, Dai teaches semantic inputs indicating one or more regions of one or more images (Par. [0002-3]: technologies for training networks for semantic segmentation. Such techniques can be useful for increasing the accuracy of object identification in an image. Through a training process, images inputted into a network may have an increased level of semantic segmentation over similar but untrained networks… a system can include a trainable neural network. The neural network can receive a training image as an input. The system can generate several candidate segment masks based on the training image. The candidate segment masks can be ranked from a relatively higher degree of accuracy to a relatively lower degree of accuracy to generate a ranked set of candidate segment masks. One or more masks of the ranked set of candidate segment masks are selected. One of the selected ranked set of candidate segment masks can be input into the neural network to train the neural network. The training process may continue for a desired number of times until the neural network can be trained to a desired level… using a ground-truth bounding box as an input and generated candidate segment masks to train the neural network can reduce the workload of annotation training images for semantic segmentation… spotting the ground-truth bounding box for the candidate segment mask generation can involve less computing resources compared to other technologies… the cost of training a neural network to perform semantic segmentation can be reduced, as the reliance upon human-generated data can be reduced; Par. [0051-53]: the semantic segmentation framework is not limited to any particular type, size, or style of image… the training image 232 can be labeled with one or more ground-truth bounding boxes of objects (e.g. “person,” “car,” “boat”). 
A ground-truth bounding box may be provided by a human or other system considered to have a relatively high degree of accuracy in labeling images… The training supervisor 118 invokes the mask generator 224 to generate candidate segments masks 234A-N… the candidate segment masks 234A-N can be generated based on a criteria of relevance to the training image 232… The training supervisor 118 ranks the candidate segment masks 234A-N to generate the ranked candidate segment masks 236A-N… the ranked candidate segment masks 236A-N can be ranked according to a measure of how close the masks resemble the ground-truth bounding boxes in the training image 232… From the ranked candidate segment masks 236, the training supervisor 118 selects a set 238 of the ranked candidate segment masks 236, illustrated in FIG. 3 as ranked candidate segment masks 236 A, 236 B, 236 C, 236 D, and 236 E. The set 238 can be generated using various technologies… One of the ranked candidate masks 236A-E can be selected and used to train the neural network 116. The neural network updater 228 receives the selected candidate segment mask 240 to train the neural network 116. The selected candidate segment mask 240 can be used as an input to the training supervisor 118 to rank the candidate segment masks 234A-N, allowing repetition of the process; Par. [0062-63]: an image with ground-truth bounding boxes can be used as the initial input to generate candidate segment masks, while a selected candidate segment mask can be used to train the neural network, both in the initial training evolution and subsequent training evolutions. In this manner, the accuracy of ground-truth bounding boxes can be used in conjunction with a relatively low-cost automated approach… the mask generator 224 generates candidate segment masks based on the received training image. The candidate segment masks may be several segment masks that are generated that approximate the segmentation of the objects in the received training image. 
As noted above, the received training image can be segmented using ground-truth bounding boxes. In semantic segmentation, the objects in an image are segmented not at the bounding box level, but rather, at the pixel level. The image can be segmented into regions comprising the various objects defined by the class labels. The candidate segment masks represent various estimations of semantic segmentation using the ground-truth bounding box as the input for the first evolution of training; semantic inputs indicating one or more regions of one or more images (e.g. semantic segmentation process includes images (i.e. one or more images) inputted into a network for training, including a neural network which receives each training image as an input to generate segment masks (i.e. one or more regions of one or more images) based on the training image and one of the ranked candidate masks is selected and used as input to train the neural network (i.e. semantic inputs indicating one or more regions of one or more images), as indicated above), for example).
The same motivation to combine the above-mentioned teachings applies, as previously indicated with respect to claim 1.

Regarding claim 86, claim 85 is incorporated; the steps further recited in claim 86 recite a similar concept, corresponding to claim 2 when executed, and are rejected as applied to computer readable medium claim 2 above.

Regarding claim 87, claim 86 is incorporated; the steps further recited in claim 87 recite a similar concept, corresponding to claim 3 when executed, and are rejected as applied to computer readable medium claim 3 above.

Regarding claim 88, claim 87 is incorporated; the steps further recited in claim 88 recite a similar concept, corresponding to claim 4 when executed, and are rejected as applied to computer readable medium claim 4 above.

Regarding claim 97, Fu discloses a [non-transitory] machine-readable medium to store information representing one or more photorealistic images generated by a process (Par. [0004]: a system for training an image generator… the system comprises a processor and a memory with computer code instructions stored thereon, wherein the processor and the memory, with the computer code instructions, are configured to cause the system to provide a generator, discriminator, and segmentor; Par. [0191]: Embodiments or aspects thereof may be implemented in the form of hardware, firmware, or software… the software may be stored on any non-transient computer readable medium that is configured to enable a processor to load the software or subsets of instructions thereof. The processor then executes the instructions and is configured to operate or cause an apparatus to operate in a manner as described herein) comprising: 
receiving one or more semantic inputs; 
causing the one or more semantic inputs to be provided to one or more neural networks; and 
causing the one or more neural networks to generate the information representing the one or more substantially photorealistic images using the one or more semantic inputs (Par. [0034-39]: Segmentation Guided Generative Adversarial Network (SGGAN), which leverages semantic segmentation to improve image generation performance further and provide spatial mapping… a segmentor implemented with a neural network that is designed to impose semantic information on the generated images… generative adversarial networks (GAN) [Goodfellow et al., Generative adversarial nets, In NIPS, 2014] have emerged as a powerful tool for generative tasks, and significantly thrive in the field of deep generative models. Because GAN has the potential to provide realistic image generation results and alleviate the deficiency of training data… Segmentation Guided Generative Adversarial Network (SGGAN), which fully leverages semantic segmentation information to guide the image generation (e.g., translation) process… embodiments explicitly guide the generator with pixel-level semantic segmentations and, thus, further boost the quality of generated images. Further, the target segmentation employed in embodiments works as a strong prior, i.e., provides knowledge that stems from previous experience, for the image generator, which is able to use this prior knowledge to edit the spatial content and align the face image to the target segmentation; Par. [0040-44]: given the input image 100 and target segmentation 101, the proposed SGGAN translates the input image 100 to various combinations of various attributes shown… the proposed SGGAN framework comprises three networks… respectively, a generator network 220, a discriminator network 240, and a segmentor network 260… The generator 240 takes as inputs, a target segmentation 227, a given image 226, and a vector 228 indicating desired attributes of the image to be generated. 
The generator 220 implemented with the blocks 221-225 is configured to receive the inputs 226, 227, 228 and generate a target image 229 that is based on, i.e., a translated version of, the input image 226 and consistent with the input segmentation 227 and attributes 228… segmentor network 260 implemented with the blocks 261-265 is configured to receive an input image 266a and/or 266b and generate a corresponding segmentation 267a and/or 267b, respectively, indicating features of the input images 266a and/or 266b. The segmentor 260 imposes semantic information on the image generation process… Based on the segmentor network S, the proposed SGGAN, e.g., the network depicted in FIG. 2D, comprises three networks, a segmentor, a generator, and a discriminator. The proposed SGGAN utilizes semantic segmentations as strong regulations and control signals in multi-domain image-to-image translation… a training procedure 270 for training the segmentor 260, generator 220, and discriminator 240. During training, estimated segmentations from the segmentor 260 are compared with their ground-truth values, which provides gradient information to optimize the generator 220. This optimization tends to teach the generator 220 to impose the spatial constraints indicated in an input segmentation 271 on the translated images, e.g., 274. During the training 270, the segmentor 260 provides spatial guidance to the generator 220 to ensure the generated images, e.g., 274, comply with input segmentations, e.g., 271. The discriminator 240 aims to ensure the translated images, e.g., 274, are as realistic as the real images; Par. 
[0048-51]: image generation using the training methods described herein and resulting trained generator can spatially control the generation process, and provide interpretable results… Segmentation Guided Generative Adversarial Networks (SGGAN) model which leverages semantic segmentation information to provide spatial constraints for the image translation task…  a segmentor network that is particularly designed to impose the target spatial guidance on the generator… a process of training an image generator, i.e., an image generator system, comprising a generator 301, discriminator 302, and segmentor 303… The generator 301 is configured to receive three inputs, an input image (source image) 304, a target segmentation 305, and a vector of target attributes 306. A goal of the training process is to configure the generator 301 to translate the input image 304 into a generated image (fake image) 307, which complies with the target segmentation 305 and attribute labels 306… During the training process, these three inputs (target segmentation 305, target attributes 306, and input image 304) are fed into the generator 301 to obtain the generated image 307; Par. [0063-77]: In order to guide the generator by the target segmentation information, an additional network is built which takes an image as input and generates the image's corresponding semantic segmentation. This network is referred to as the segmentor network S which is trained together with the GAN framework. When training with the real data pairs (x,s), S learns to estimate segmentation correctly… such an embodiment leverages semantic segmentation information in GAN based image translation tasks and also builds a segmentor which is trained together with the GAN framework to provide guidance in image translation… images and resulting segmentations that may be used in embodiments and/or generated by the segmentor S. In FIG. 
4, the image 440 is depicted with the dotted lines 441 showing facial landmarks extracted from the image 440. The segmentation 442 is a landmark-based semantic segmentation. The image 443 is a real image sample that may be provided to a segmentor S implemented according to an embodiment to generate the segmentation 444… extracted landmarks 441 are processed to generate a pixel-wised semantic segmentation 442 where each pixel in the input image 440 is automatically classified into classes of eyes, eyebrow, nose, lips, skin and background according to landmark information. During training of the segmentor S, S takes a real image sample 443 as input and generates an estimated segmentation 444… the segmentor S is optimized by minimizing the difference between the landmark base segmentation 442 and segmentor generated segmentation 444. For instance, based upon differences between the landmark-based segmentation 442 and segmentor S generated segmentation 444, weights of the network implementing the segmentor S may be varied. As shown in FIG. 4, the similarity between the landmark-based segmentation 442 and segmentor generated segmentation 444 reveals that a segmentor network, implemented according to the embodiments described herein, can successfully capture the semantic information from an input image; receiving one or more semantic inputs (e.g. receive one or more semantic inputs, including generated corresponding semantic segmentation information of an input image, by receiving input images and generating a corresponding segmentation indicating features (i.e. attributes, characteristics, etc.) of the input images (i.e. one or more semantic inputs), as indicated above, for example); causing the one or more semantic inputs to be provided to one or more neural networks; and causing the one or more neural networks to generate the information representing the one or more photorealistic images using the one or more semantic inputs (e.g. 
receive input images and generate corresponding semantic segmentation indicating features (i.e. information, attributes, characteristics, etc.) of the input images (i.e. information representing the one or more photorealistic images using the one or more semantic inputs), as indicated above, for example, to impose semantic information on the image generation process, including a generator, a discriminator, and a segmentor implemented with neural networks (i.e. cause the one or more semantic inputs to be provided to one or more neural networks) designed to impose semantic information (i.e. the one or more semantic inputs) on the generated images to provide realistic image generation results (i.e. generate information representing the one or more photorealistic images using the one or more semantic inputs) by ensuring that the generated/translated images are as realistic as the real images, as indicated above), for example), but fails to teach the following as further recited in claim 97.
However, Dai teaches semantic inputs indicating one or more regions of one or more images (Par. [0002-3]: technologies for training networks for semantic segmentation. Such techniques can be useful for increasing the accuracy of object identification in an image. Through a training process, images inputted into a network may have an increased level of semantic segmentation over similar but untrained networks… a system can include a trainable neural network. The neural network can receive a training image as an input. The system can generate several candidate segment masks based on the training image. The candidate segment masks can be ranked from a relatively higher degree of accuracy to a relatively lower degree of accuracy to generate a ranked set of candidate segment masks. One or more masks of the ranked set of candidate segment masks are selected. One of the selected ranked set of candidate segment masks can be input into the neural network to train the neural network. The training process may continue for a desired number of times until the neural network can be trained to a desired level… using a ground-truth bounding box as an input and generated candidate segment masks to train the neural network can reduce the workload of annotation training images for semantic segmentation… spotting the ground-truth bounding box for the candidate segment mask generation can involve less computing resources compared to other technologies… the cost of training a neural network to perform semantic segmentation can be reduced, as the reliance upon human-generated data can be reduced; Par. [0051-53]: the semantic segmentation framework is not limited to any particular type, size, or style of image… the training image 232 can be labeled with one or more ground-truth bounding boxes of objects (e.g. “person,” “car,” “boat”). 
A ground-truth bounding box may be provided by a human or other system considered to have a relatively high degree of accuracy in labeling images… The training supervisor 118 invokes the mask generator 224 to generate candidate segments masks 234A-N… the candidate segment masks 234A-N can be generated based on a criteria of relevance to the training image 232… The training supervisor 118 ranks the candidate segment masks 234A-N to generate the ranked candidate segment masks 236A-N… the ranked candidate segment masks 236A-N can be ranked according to a measure of how close the masks resemble the ground-truth bounding boxes in the training image 232… From the ranked candidate segment masks 236, the training supervisor 118 selects a set 238 of the ranked candidate segment masks 236, illustrated in FIG. 3 as ranked candidate segment masks 236 A, 236 B, 236 C, 236 D, and 236 E. The set 238 can be generated using various technologies… One of the ranked candidate masks 236A-E can be selected and used to train the neural network 116. The neural network updater 228 receives the selected candidate segment mask 240 to train the neural network 116. The selected candidate segment mask 240 can be used as an input to the training supervisor 118 to rank the candidate segment masks 234A-N, allowing repetition of the process; Par. [0062-63]: an image with ground-truth bounding boxes can be used as the initial input to generate candidate segment masks, while a selected candidate segment mask can be used to train the neural network, both in the initial training evolution and subsequent training evolutions. In this manner, the accuracy of ground-truth bounding boxes can be used in conjunction with a relatively low-cost automated approach… the mask generator 224 generates candidate segment masks based on the received training image. The candidate segment masks may be several segment masks that are generated that approximate the segmentation of the objects in the received training image. 
As noted above, the received training image can be segmented using ground-truth bounding boxes. In semantic segmentation, the objects in an image are segmented not at the bounding box level, but rather, at the pixel level. The image can be segmented into regions comprising the various objects defined by the class labels. The candidate segment masks represent various estimations of semantic segmentation using the ground-truth bounding box as the input for the first evolution of training; semantic inputs indicating one or more regions of one or more images (e.g. semantic segmentation process includes images (i.e. one or more images) inputted into a network for training, including a neural network which receives each training image as an input to generate segment masks (i.e. one or more regions of one or more images) based on the training image and one of the ranked candidate masks is selected and used as input to train the neural network (i.e. semantic inputs indicating one or more regions of one or more images), as indicated above), for example).
The same motivation to combine the above-mentioned teachings applies, as previously indicated with respect to claim 1.

Regarding claim 98, claim 97 is incorporated; the steps of the program further recited in claim 98 recite a similar concept, corresponding to claim 2 when executed, and are rejected as applied to computer readable medium claim 2 above.

Regarding claim 99, claim 98 is incorporated; the steps of the program further recited in claim 99 recite a similar concept, corresponding to claim 3 when executed, and are rejected as applied to computer readable medium claim 3 above.

Regarding claim 100, claim 99 is incorporated; the steps of the program further recited in claim 100 recite a similar concept, corresponding to claim 4 when executed, and are rejected as applied to computer readable medium claim 4 above.

Claims 91-95 are rejected under 35 U.S.C. 103 as being unpatentable over Fu, in view of Suzuki et al. (“Collaging on Internal Representations: An Intuitive Approach for Semantic Transfiguration”), referred to as Suzuki, prior art cited by Applicant.
Regarding claim 91, Fu discloses a system (Par. [0004]: a system for training an image generator… the system comprises a processor and a memory with computer code instructions stored thereon, wherein the processor and the memory, with the computer code instructions, are configured to cause the system to provide a generator, discriminator, and segmentor) comprising: 
one or more processors to (Par. [0004]: a system for training an image generator… the system comprises a processor and a memory with computer code instructions stored thereon, wherein the processor and the memory, with the computer code instructions, are configured to cause the system to provide a generator, discriminator, and segmentor; Par. [0191]: Embodiments or aspects thereof may be implemented in the form of hardware, firmware, or software… the software may be stored on any non-transient computer readable medium that is configured to enable a processor to load the software or subsets of instructions thereof. The processor then executes the instructions and is configured to operate or cause an apparatus to operate in a manner as described herein) determine a type of one or more inputs and to cause a photorealistic image to be generated based, at least in part, on the type of the one or more inputs (Par. [0034-39]: Segmentation Guided Generative Adversarial Network (SGGAN), which leverages semantic segmentation to improve image generation performance further and provide spatial mapping… a segmentor implemented with a neural network that is designed to impose semantic information on the generated images… generative adversarial networks (GAN) [Goodfellow et al., Generative adversarial nets, In NIPS, 2014] have emerged as a powerful tool for generative tasks, and significantly thrive in the field of deep generative models. Because GAN has the potential to provide realistic image generation results and alleviate the deficiency of training data… Segmentation Guided Generative Adversarial Network (SGGAN), which fully leverages semantic segmentation information to guide the image generation (e.g., translation) process… embodiments explicitly guide the generator with pixel-level semantic segmentations and, thus, further boost the quality of generated images. 
Further, the target segmentation employed in embodiments works as a strong prior, i.e., provides knowledge that stems from previous experience, for the image generator, which is able to use this prior knowledge to edit the spatial content and align the face image to the target segmentation; Par. [0040-44]: given the input image 100 and target segmentation 101, the proposed SGGAN translates the input image 100 to various combinations of various attributes shown… the proposed SGGAN framework comprises three networks… respectively, a generator network 220, a discriminator network 240, and a segmentor network 260… The generator 240 takes as inputs, a target segmentation 227, a given image 226, and a vector 228 indicating desired attributes of the image to be generated. The generator 220 implemented with the blocks 221-225 is configured to receive the inputs 226, 227, 228 and generate a target image 229 that is based on, i.e., a translated version of, the input image 226 and consistent with the input segmentation 227 and attributes 228… segmentor network 260 implemented with the blocks 261-265 is configured to receive an input image 266a and/or 266b and generate a corresponding segmentation 267a and/or 267b, respectively, indicating features of the input images 266a and/or 266b. The segmentor 260 imposes semantic information on the image generation process… Based on the segmentor network S, the proposed SGGAN, e.g., the network depicted in FIG. 2D, comprises three networks, a segmentor, a generator, and a discriminator. The proposed SGGAN utilizes semantic segmentations as strong regulations and control signals in multi-domain image-to-image translation… a training procedure 270 for training the segmentor 260, generator 220, and discriminator 240. During training, estimated segmentations from the segmentor 260 are compared with their ground-truth values, which provides gradient information to optimize the generator 220. 
This optimization tends to teach the generator 220 to impose the spatial constraints indicated in an input segmentation 271 on the translated images, e.g., 274. During the training 270, the segmentor 260 provides spatial guidance to the generator 220 to ensure the generated images, e.g., 274, comply with input segmentations, e.g., 271. The discriminator 240 aims to ensure the translated images, e.g., 274, are as realistic as the real images; Par. [0048-51]: image generation using the training methods described herein and resulting trained generator can spatially control the generation process, and provide interpretable results… Segmentation Guided Generative Adversarial Networks (SGGAN) model which leverages semantic segmentation information to provide spatial constraints for the image translation task…  a segmentor network that is particularly designed to impose the target spatial guidance on the generator… a process of training an image generator, i.e., an image generator system, comprising a generator 301, discriminator 302, and segmentor 303… The generator 301 is configured to receive three inputs, an input image (source image) 304, a target segmentation 305, and a vector of target attributes 306. A goal of the training process is to configure the generator 301 to translate the input image 304 into a generated image (fake image) 307, which complies with the target segmentation 305 and attribute labels 306… During the training process, these three inputs (target segmentation 305, target attributes 306, and input image 304) are fed into the generator 301 to obtain the generated image 307; Par. [0063-77]: In order to guide the generator by the target segmentation information, an additional network is built which takes an image as input and generates the image's corresponding semantic segmentation. This network is referred to as the segmentor network S which is trained together with the GAN framework. 
When training with the real data pairs (x,s), S learns to estimate segmentation correctly… such an embodiment leverages semantic segmentation information in GAN based image translation tasks and also builds a segmentor which is trained together with the GAN framework to provide guidance in image translation… images and resulting segmentations that may be used in embodiments and/or generated by the segmentor S. In FIG. 4, the image 440 is depicted with the dotted lines 441 showing facial landmarks extracted from the image 440. The segmentation 442 is a landmark-based semantic segmentation. The image 443 is a real image sample that may be provided to a segmentor S implemented according to an embodiment to generate the segmentation 444… extracted landmarks 441 are processed to generate a pixel-wised semantic segmentation 442 where each pixel in the input image 440 is automatically classified into classes of eyes, eyebrow, nose, lips, skin and background according to landmark information. During training of the segmentor S, S takes a real image sample 443 as input and generates an estimated segmentation 444… the segmentor S is optimized by minimizing the difference between the landmark base segmentation 442 and segmentor generated segmentation 444. For instance, based upon differences between the landmark-based segmentation 442 and segmentor S generated segmentation 444, weights of the network implementing the segmentor S may be varied. As shown in FIG. 4, the similarity between the landmark-based segmentation 442 and segmentor generated segmentation 444 reveals that a segmentor network, implemented according to the embodiments described herein, can successfully capture the semantic information from an input image; determine a type of one or more inputs and to cause a photorealistic image to be generated based, at least in part, on the type of the one or more inputs (e.g. 
receive input images and generate corresponding semantic segmentations indicating types of features (i.e. attributes, characteristics, etc.) of the input images (i.e. determine a type of one or more inputs), as indicated above, for example, to impose semantic information on the image generation process, including a generator, a discriminator, and a segmentor implemented with neural networks designed to impose semantic information (i.e. the one or more inputs) on the generated images to provide realistic image generation results (i.e. generated based, at least in part, on the type of the one or more inputs) by ensuring that the generated/translated images are as realistic as the real images, as indicated above), for example). Fu does not explicitly disclose one or more semantic inputs from one or more users.
However, Suzuki teaches one or more semantic inputs from one or more users (Pg. 2: in practical image editing tasks like those done with Photoshop, users may want to retain spatial freedom during the transformation; that is, users want fine control over the region of transformation… perform arbitrary, user-selected partial transformations (like those described above) over an arbitrary region of a user’s choice… a method that features two types of image transformation: (1) spatial class-translation that translates a class category of a region of interest, and (2) semantic transplantation that transplants a semantic feature of an user-selected region in an arbitrary image to a region of interest in the target image. To facilitate this editing process, we also propose an efficient optimization method to project images onto the latent space of generator; Pg. 3: Given the input image of interest and the conditional generator G, the process begins by prompting the user to specify the region of an image to be edited, or the region of the image containing the object that the user wants to transform. The user can then apply spatial class translation and semantic transplantation on the selected region… With our spatial class translation, the user can change the class of the object in the user-selected region of interest (ROI). The user can change the class of a part of the target objects in intuitive fashion… the strength of the user-selected class features are continuously increasing with the morphing strength… With our semantic transplantation, the user can transplant a semantic feature of the user-selected object in the reference image to an object in the target image to be transformed; Pg. 4: algorithm also looks for the latent variable z that reconstructs the clip… from the reference image. 
Let R be the user-selected subregion of x on which to conduct the transformation, and let R0 be the region… from which the user wants to transplant the semantic information; one or more semantic inputs from one or more users (e.g. receive one or more user-selected features, including users wanting to retain spatial freedom during the transformation of images (i.e. receive one or more semantic inputs from one or more users), as indicated above), for example).
Fu and Suzuki are considered to be analogous art because they pertain to image processing applications based on neural networks. Therefore, it would have been obvious to someone of ordinary skill in the art before the effective filing date of the claimed invention to modify the method for image generation through use of adversarial networks (as disclosed by Fu) with one or more semantic inputs from one or more users (as taught by Suzuki, Pg. 2-4) to produce customized photorealistic images based on a set of photorealistic transformations (Suzuki, Abstract, Pg. 1-2 and 8).

Regarding claim 92, claim 91 is incorporated, and the steps further recited in claim 92 recite a similar concept that corresponds to claim 2 when executed and are rejected as applied to computer-readable medium claim 2 above.

Regarding claim 93, claim 92 is incorporated, and the steps further recited in claim 93 recite a similar concept that corresponds to claim 3 when executed and are rejected as applied to computer-readable medium claim 3 above.

Regarding claim 94, claim 93 is incorporated, and the steps further recited in claim 94 recite a similar concept that corresponds to claim 4 when executed and are rejected as applied to computer-readable medium claim 4 above.

Regarding claim 95, claim 94 is incorporated, and the steps further recited in claim 95 recite a similar concept that corresponds to claim 5 when executed and are rejected as applied to computer-readable medium claim 5 above.

Claims 5, 11, 17, 23, 29, 35, 37-41, 43-47, 53, 55-59, 65, 71, 77, 83, 89, and 101 are rejected under 35 U.S.C. 103 as being unpatentable over Fu in view of Dai, as applied to claim 1 above, and in further view of Suzuki.

Regarding claim 5, claim 4 is incorporated, and the combination of Fu and Dai, as a whole, teaches the non-transitory computer-readable medium (Fu, Par. [0004 and 0191]), wherein the GAN has at least one layer configured to propagate semantic information throughout other layers of the one or more neural networks (Fu, Par. [0034]: Segmentation Guided Generative Adversarial Network (SGGAN), which leverages semantic segmentation to improve image generation performance further and provide spatial mapping… a segmentor implemented with a neural network that is designed to impose semantic information on the generated images; Par. [0054-56]: optimizer 310 sums up the loses 313, 309, 314, and 318 with weights, i.e., weights the losses differently, to determine a generator loss, which is used by the optimizer 310 to do back-propagation and update the parameters in a neural network implementing the generator 301… optimizing the discriminator 302 includes performing a back-propagation and updating the parameters, e.g., weights, in a neural network implementing the discriminator 302… optimizer 326 utilizes this loss 325 to as 303; Par. [0133-139]: generator network G 1340… is used to match a target mapping function. The generator 1340 takes three inputs which are z (latent vector 1351), c (attribute label 1352), and s (segmentation 1350). As shown in FIG. 13, the inputs 1350, 1351, and 1352 are fed into the generator 1340 one by one. First, the generator G 1340 takes s 1350 as input and extracts spatial information contained in s by several down-sampling convolutional layers (depicted as the block 1341). Next, the convolution result is concatenated (by the block 1342) with the latent code z 1351 after the latent code passes through the fully-connected neural network layer (FC) block 1353. 
In turn, the concatenation result is passed through residual up-sampling blocks 1343 and 1344 to construct the basic structure of the output image and attribute label c 1352 is fed into the generator 1340 through the expand block 1354 to guide the generator 1340 to generate attribute-specific images which share the similar basic image contents generated… During the training process, the three inputs (target segmentation 1406, target attributes 1405, and latent vector 1404) are fed into the generator 1401 to obtain a generated image 1407… fake adversarial loss term 1413, the fake classification loss 1414, and the fake segmentation loss 1409 are all provided to the optimizer 1410 to optimize the generator 1401. In an embodiment, the loses 1410, 1413, and 1414 are summed up with weights as the generator loss and the generator loss is used by the optimizer 1410 to do back-propagation and update parameters in a neural network implementing the generator 1401… the losses 1413, 1418, and 1420 are summed up as the discriminator loss and used by the optimizer 1421 to do back-propagation and update parameters in a neural network implementing the discriminator 1402… the optimizer 1425 performs a back-propagation and updates parameters in a neural network implementing the segmentor 1403; Par. 
[0048-51]: image generation using the training methods described herein and resulting trained generator can spatially control the generation process, and provide interpretable results… Segmentation Guided Generative Adversarial Networks (SGGAN) model which leverages semantic segmentation information to provide spatial constraints for the image translation task…  a segmentor network that is particularly designed to impose the target spatial guidance on the generator… a process of training an image generator, i.e., an image generator system, comprising a generator 301, discriminator 302, and segmentor 303… The generator 301 is configured to receive three inputs, an input image (source image) 304, a target segmentation 305, and a vector of target attributes 306. A goal of the training process is to configure the generator 301 to translate the input image 304 into a generated image (fake image) 307, which complies with the target segmentation 305 and attribute labels 306… During the training process, these three inputs (target segmentation 305, target attributes 306, and input image 304) are fed into the generator 301 to obtain the generated image 307; Par. [0063-77]: In order to guide the generator by the target segmentation information, an additional network is built which takes an image as input and generates the image's corresponding semantic segmentation. This network is referred to as the segmentor network S which is trained together with the GAN framework. When training with the real data pairs (x,s), S learns to estimate segmentation correctly… such an embodiment leverages semantic segmentation information in GAN based image translation tasks and also builds a segmentor which is trained together with the GAN framework to provide guidance in image translation… images and resulting segmentations that may be used in embodiments and/or generated by the segmentor S. In FIG. 
4, the image 440 is depicted with the dotted lines 441 showing facial landmarks extracted from the image 440. The segmentation 442 is a landmark-based semantic segmentation. The image 443 is a real image sample that may be provided to a segmentor S implemented according to an embodiment to generate the segmentation 444… extracted landmarks 441 are processed to generate a pixel-wised semantic segmentation 442 where each pixel in the input image 440 is automatically classified into classes of eyes, eyebrow, nose, lips, skin and background according to landmark information. During training of the segmentor S, S takes a real image sample 443 as input and generates an estimated segmentation 444… the segmentor S is optimized by minimizing the difference between the landmark base segmentation 442 and segmentor generated segmentation 444. For instance, based upon differences between the landmark-based segmentation 442 and segmentor S generated segmentation 444, weights of the network implementing the segmentor S may be varied. As shown in FIG. 4, the similarity between the landmark-based segmentation 442 and segmentor generated segmentation 444 reveals that a segmentor network, implemented according to the embodiments described herein, can successfully capture the semantic information from an input image; wherein the GAN has at least one layer configured to propagate semantic information throughout other layers of the one or more neural networks (e.g. Segmentation Guided Generative Adversarial Networks (SGGAN) model leverages semantic segmentation information to provide spatial constraints for the image translation task, including a generator, a discriminator, and a segmentor implemented with neural networks, including respective network layers, designed to impose semantic information on the generated images (i.e. 
propagate semantic information throughout the one or more neural networks) to provide realistic image generation results, including determining a generator loss, which is used by the optimizer to do back-propagation (i.e. communication, passing, feeding, transferring, etc.) and update the parameters in a neural network implementing the generator, performing a back-propagation and updating the parameters, e.g., weights, in a neural network implementing the discriminator, and performing back-propagation and updating the parameters in a neural network implementing the segmentor (i.e. the GAN has at least one layer configured to propagate semantic information throughout other layers of the one or more neural networks), as indicated above), for example). Additionally, Fu further discloses that the neural network architecture comprises instance normalization (IN) and batch normalization processes (Par. [0094]: Table 1 below illustrates the network architecture for the embodiments of the present invention implemented… In Table 1… IN refers to instance normalization; Par. [0154]: synthesis provides… images associated with attribute labels, caption, and semantic segmentation… Batch normalization [Ioffe et al., "Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift," In International Conference on Machine Learning, 448-456 (2015)] in both the generator and the segmentor was replaced with instance normalization [Ulyanov et al., "Instance Normalization: The Missing Ingredient for Fast Stylization"…]), but fails to teach the following as further recited in claim 5.
However, Suzuki teaches the GAN has at least one spatially-adaptive normalization layer configured to propagate semantic information throughout other layers of the one or more neural networks (Pg. 1, Abstract: CNN-based image editing method that allows the user to change the semantic information of an image over a user-specified region. Our method makes this possible by combining the idea of manifold projection with spatial conditional batch normalization (sCBN), a version of conditional batch normalization with user-specifiable spatial weight maps. With sCBN and manifold projection, our method lets the user perform (1) spatial class translation that changes the class of an object over an arbitrary region of user’s choice, and (2) semantic transplantation that transplants semantic information contained in an arbitrary region of the reference image to an arbitrary region in the target image; Pg. 1, Par. 1-2: deep generative models like generative adversarial networks (GANs) [10] and variational autoencoders (VAEs) [20] make possible the unsupervised learning of rich latent semantic information from images… Image conditional GANs [24, 40, 17] based on encoder-decoder architectures have been popular both for their convenient implemention in end-to-end differentiable ML frameworks, and their uncanny ability to produce photo-realistic images; Pg. 2, Par. 3: CNN-based image editing method that grants the user this very freedom. With our method, the user can transform a user-chosen part of image in a copy-paste fashion–and the user can do this all the while preserving semantic consistency. More precisely, we present a method that features two types of image transformation: (1) spatial class-translation that translates a class category of a region of interest, and (2) semantic transplantation that transplants a semantic feature of an user-selected region in an arbitrary image to a region of interest in the target image. 
To facilitate this editing process, we also propose an efficient optimization method to project images onto the latent space of generator; Pg. 2, Par. 7-8: Class-conditional GAN [29, 26, 43, 2] is a framework designed to learn an invariant latent representation among various classes, and it is capable of generating diverse images from a same latent code z by changing class embedding (Figure 2). The work of [26, 2], in particular, succeeded in producing an impressive results by interpolating the parameters of conditional batch normalization layer, which was first introduced in [31, 5]. Conditional batch normalization (CBN) is mechanism that learns conditional information by separately learning condition-specific scaling parameter and shifting parameter for batch normalization. Our method extends the technique used in [26] by restricting the region of interpolation to a region that corresponds to the region of interest in the pixel space. We will refer to our approach spatial conditional batch normalization (sCBN). Unlike the manipulation done in style transfer [12], we introduce the conditional information at multiple levels in the network, depending on the style preference of the user. As we will show, sCBN in the lower layers transforms global features, and sCBN at upper layers transforms local features… Semantic transformation. In order to grant the user with wide freedom of semantic transformation, there has to be some mechanism to finely adjust the user-suggested transformation so that the final product becomes natural; Pg. 3, Par. 3-4: Spatial Class-translation With our spatial class translation, the user can change the class of the object in the user-selected region of interest (ROI). 
The user can change the class of a part of the target objects in intuitive fashion… Spatial Semantic Transplantation With our semantic transplantation, the user can transplant a semantic feature of the user-selected object in the reference image to an object in the target image to be transformed. Our method first prompts the user to specify the region in the target image containing the object of interest, along with the reference image of equal size. The user will be also asked to specify the region in the reference image that contains the semantic information to be transplanted. The method then automatically transplants the semantic information of the specified region of the reference image into the target image; Pg. 4, Par. 2-5 and Pg. 5: spatial class translation Our method functions on a trained conditional generator G, paired with the discriminator D with which G was trained. Upon receiving the region of interest x clipped from the target image and the class c of the target object contained in x, the algorithm begins by looking for a latent variable z such that G(z; c) will be close to x in the feature space of D (Manifold Projection step). The class c can either be specified by the user or by a pre-trained classifier. Suppose that the user wants to partially translate a region R in x to a class c′, and let Vℓ be the set of features in ℓ-th conditional batch normalization (CBN) layers that correspond to R in the pixel space. Our method then simply substitutes the parameters governing the shift and mean parameters of Vℓ with those of c′ (Figure 6). This will result in a modification of G… in which the CBN parameters of Vℓ exclusively carry the style information of the class c′. A transformed image can be constructed by applying this modified G… to z… our spatial editing method is applicable to any generative model (e.g., GAN, VAE) that is equipped with a machanism to iteratively incorporate class information during its image generation process. 
We will next elaborate on the design of our sCBN and spatial semantic implantation, along with the other details omitted in the brief description above… Spatial conditional batch normalization (sCBN) is the core of our spatial class translation. As can be inferred from our naming, sCBN is based on batch normalization (BN) [16], a technique developed for the purpose of reducing the internal covariance shift to accelerate the training of neural network. More precisely, we will borrow our idea from conditional batch normalization (CBN) [8, 5], a variant of BN that incorporates the class specific semantic information in the parameters for BN. Given a set of batches sampled each from a single class, the conditional batch normalization [8, 5] works by modulating the set of intermediate features produced from each batch of inputs so that it follow a normal distribution with mean and variance that are specific to the corresponding class. Let us fix the layer ℓ, and let F_{k,h,w} represent the feature of ℓ-th layer at channel k, height location h, and width location w. Given a batch {F_{i,k,h,w}} of F_{k,h,w}'s generated from class c, the CBN at layer ℓ then transforms F_{i,k,h,w}… In our implementation, we replaced CBN at each layer with sCBN… After training the encoder, one can produce the reconstruction of x by applying G to z = E(x). In the reconstructed image, however, semantically independent objects are often dis-aligned. We therefore calibrate z by backpropagating the loss L. After some rounds of calibration, we can use the resulting z for the image transformation… instead of calibrating the latent variable z by backpropagating L through G, we will calibrate ζ by backpropagating L through G and B; Pg. 6, Par. 1-3: A and B are updated through the backpropagation from ζ_j. The second term makes sure that z can be reconstructed from ζ … generator used in our study is a ResNet-based generator trained as part of a conditional DCGAN. 
Each residual block in our generator contains the conditional batch normalization (CBN) layer. At the time of inference, these CBN layers are replaced by the aforementioned sCBN layers that are tailored to the user’s preference. We base our architectures on those used in previous work [25, 26], and used the pre-trained model from [26]; Pg. 8, Par. 6: image transformation method that allows the user to translate the class of an object and transplant semantic features over a user-specified pixel region of the image. Indeed, there is still much room left for the exploration of the semantic information contained in the intermediate feature spaces of CNNs. We were, however, able to show that we can manipulate this information in a somewhat intuitive manner and produce customized photorealistic images; Pg. 11, Par. 9: we conducted a set of automatic spatial class translations. For each one the selected images, we (1) used a pre-trained model to extract the region of the object to be transformed (dog/cat), (2) conducted the manifold projection to obtain the z, (3) passed z to the generator with the class map corresponding to the segmented region, and (4) conducted a post-processing over the segmented region. For the semantic segmentation, we used a TensorFlow implementation of DeepLab v3 Xception model trained on MS COCO dataset; the GAN has at least one spatially-adaptive normalization layer configured to propagate semantic information throughout other layers of the one or more neural networks (e.g. an image generative model which uses a neural network, such as a Generative Adversarial Network (GAN), including spatial conditional (i.e. adaptive, instant, etc.) batch normalization (sCBN) layers (i.e. at least one spatially-adaptive normalization layer), that is equipped with a mechanism to iteratively incorporate (i.e. propagate, communicate, pass, feed, transfer, etc.) class (i.e. feature, attribute, label, etc.) 
information during its image generation, including calculating a loss (i.e. error, difference, etc.) between semantically independent objects, for example, and backpropagating the loss throughout other layers of the one or more neural networks to update the network through backpropagation, including semantic features (i.e. class, attribute, label, etc.) of selected object regions (i.e. propagate semantic information), which are segmented/extracted in a reference (i.e. source, input, etc.) image, corresponding to object(s) in a target image to be transformed (i.e. an image to be generated), based on the set of features, Vℓ, in the ℓ-th (first, second, third… Nth) conditional batch normalization (CBN) layers that correspond to a region R in the pixel space, including a number of spatial conditional batch normalization (sCBN) layers (i.e. a spatially-adaptive normalization layer configured to propagate semantic information from the semantic layout throughout other layers of the neural network), as indicated above), for example).
Fu, Dai, and Suzuki are considered to be analogous art because they pertain to image processing applications based on neural networks. Therefore, the combined teachings of Fu, Dai, and Suzuki, as a whole, would have rendered obvious the invention recited in claim 5 with a reasonable expectation of success in order to modify the method for image generation through use of adversarial networks (as disclosed by Fu) such that the GAN has at least one spatially-adaptive normalization layer configured to propagate semantic information throughout other layers of the one or more neural networks (as taught by Suzuki, Abstract, Pg. 1-6 and 11) to produce customized photorealistic images based on a set of photorealistic transformations (Suzuki, Abstract, Pg. 1-2 and 8).

Regarding claim 11, claim 10 is incorporated, and claim 11 is a corresponding apparatus claim rejected as applied to computer-readable medium claim 5 above.

Regarding claim 17, claim 16 is incorporated, and the steps of the program further recited in claim 17 recite a similar concept that corresponds to claim 5 when executed and are rejected as applied to computer-readable medium claim 5 above.

Regarding claim 23, claim 22 is incorporated, and claim 23 is a corresponding apparatus claim rejected as applied to computer-readable medium claim 17 above.

Regarding claim 29, claim 28 is incorporated, and the steps of the program further recited in claim 29 recite a similar concept that corresponds to claim 5 when executed and are rejected as applied to computer-readable medium claim 5 above.

Regarding claim 35, claim 34 is incorporated, and claim 35 is a corresponding apparatus claim rejected as applied to computer-readable medium claim 29 above.

Regarding claim 37, Fu discloses a non-transitory machine-readable medium having stored thereon a set of instructions, which, if performed by one or more processors, cause the one or more processors to (Par. [0004]: a system for training an image generator… the system comprises a processor and a memory with computer code instructions stored thereon, wherein the processor and the memory, with the computer code instructions, are configured to cause the system to provide a generator, discriminator, and segmentor; Par. [0191]: Embodiments or aspects thereof may be implemented in the form of hardware, firmware, or software… the software may be stored on any non-transient computer readable medium that is configured to enable a processor to load the software or subsets of instructions thereof. The processor then executes the instructions and is configured to operate or cause an apparatus to operate in a manner as described herein) at least: 
receive one or more features; and 
generate one or more substantially photorealistic images [one or more photorealistic images] based, at least in part, on the one or more features using one or more neural networks (Par. [0034-39]: Segmentation Guided Generative Adversarial Network (SGGAN), which leverages semantic segmentation to improve image generation performance further and provide spatial mapping… a segmentor implemented with a neural network that is designed to impose semantic information on the generated images… generative adversarial networks (GAN) [Goodfellow et al., Generative adversarial nets, In NIPS, 2014] have emerged as a powerful tool for generative tasks, and significantly thrive in the field of deep generative models. Because GAN has the potential to provide realistic image generation results and alleviate the deficiency of training data… Segmentation Guided Generative Adversarial Network (SGGAN), which fully leverages semantic segmentation information to guide the image generation (e.g., translation) process… embodiments explicitly guide the generator with pixel-level semantic segmentations and, thus, further boost the quality of generated images. Further, the target segmentation employed in embodiments works as a strong prior, i.e., provides knowledge that stems from previous experience, for the image generator, which is able to use this prior knowledge to edit the spatial content and align the face image to the target segmentation; Par. [0040-44]: given the input image 100 and target segmentation 101, the proposed SGGAN translates the input image 100 to various combinations of various attributes shown… the proposed SGGAN framework comprises three networks… respectively, a generator network 220, a discriminator network 240, and a segmentor network 260… The generator 240 takes as inputs, a target segmentation 227, a given image 226, and a vector 228 indicating desired attributes of the image to be generated. 
The generator 220 implemented with the blocks 221-225 is configured to receive the inputs 226, 227, 228 and generate a target image 229 that is based on, i.e., a translated version of, the input image 226 and consistent with the input segmentation 227 and attributes 228… segmentor network 260 implemented with the blocks 261-265 is configured to receive an input image 266a and/or 266b and generate a corresponding segmentation 267a and/or 267b, respectively, indicating features of the input images 266a and/or 266b. The segmentor 260 imposes semantic information on the image generation process… Based on the segmentor network S, the proposed SGGAN, e.g., the network depicted in FIG. 2D, comprises three networks, a segmentor, a generator, and a discriminator. The proposed SGGAN utilizes semantic segmentations as strong regulations and control signals in multi-domain image-to-image translation… a training procedure 270 for training the segmentor 260, generator 220, and discriminator 240. During training, estimated segmentations from the segmentor 260 are compared with their ground-truth values, which provides gradient information to optimize the generator 220. This optimization tends to teach the generator 220 to impose the spatial constraints indicated in an input segmentation 271 on the translated images, e.g., 274. During the training 270, the segmentor 260 provides spatial guidance to the generator 220 to ensure the generated images, e.g., 274, comply with input segmentations, e.g., 271. The discriminator 240 aims to ensure the translated images, e.g., 274, are as realistic as the real images; Par. 
[0048-51]: image generation using the training methods described herein and resulting trained generator can spatially control the generation process, and provide interpretable results… Segmentation Guided Generative Adversarial Networks (SGGAN) model which leverages semantic segmentation information to provide spatial constraints for the image translation task…  a segmentor network that is particularly designed to impose the target spatial guidance on the generator… a process of training an image generator, i.e., an image generator system, comprising a generator 301, discriminator 302, and segmentor 303… The generator 301 is configured to receive three inputs, an input image (source image) 304, a target segmentation 305, and a vector of target attributes 306. A goal of the training process is to configure the generator 301 to translate the input image 304 into a generated image (fake image) 307, which complies with the target segmentation 305 and attribute labels 306… During the training process, these three inputs (target segmentation 305, target attributes 306, and input image 304) are fed into the generator 301 to obtain the generated image 307; Par. [0063-77]: In order to guide the generator by the target segmentation information, an additional network is built which takes an image as input and generates the image's corresponding semantic segmentation. This network is referred to as the segmentor network S which is trained together with the GAN framework. When training with the real data pairs (x,s), S learns to estimate segmentation correctly… such an embodiment leverages semantic segmentation information in GAN based image translation tasks and also builds a segmentor which is trained together with the GAN framework to provide guidance in image translation… images and resulting segmentations that may be used in embodiments and/or generated by the segmentor S. In FIG. 
4, the image 440 is depicted with the dotted lines 441 showing facial landmarks extracted from the image 440. The segmentation 442 is a landmark-based semantic segmentation. The image 443 is a real image sample that may be provided to a segmentor S implemented according to an embodiment to generate the segmentation 444… extracted landmarks 441 are processed to generate a pixel-wised semantic segmentation 442 where each pixel in the input image 440 is automatically classified into classes of eyes, eyebrow, nose, lips, skin and background according to landmark information. During training of the segmentor S, S takes a real image sample 443 as input and generates an estimated segmentation 444… the segmentor S is optimized by minimizing the difference between the landmark base segmentation 442 and segmentor generated segmentation 444. For instance, based upon differences between the landmark-based segmentation 442 and segmentor S generated segmentation 444, weights of the network implementing the segmentor S may be varied. As shown in FIG. 4, the similarity between the landmark-based segmentation 442 and segmentor generated segmentation 444 reveals that a segmentor network, implemented according to the embodiments described herein, can successfully capture the semantic information from an input image; receive one or more features (e.g. receive one or more semantic inputs, including generated corresponding semantic segmentation information (i.e. features, attributes, characteristics, etc.) of an input image, by receiving input images and generating a corresponding segmentation indicating features (i.e. attributes, characteristics, etc.) of the input images (i.e. receive one or more features), as indicated above); and generate one or more photorealistic images based, at least in part, on the one or more features using one or more neural networks (e.g. receive input images and generate corresponding semantic segmentation indicating features (i.e. attributes, characteristics, etc.) of the input images (i.e. the one or more features), as indicated above, for example, to impose semantic information on the image generation (i.e. synthesis, construction, etc.) process, including a generator, a discriminator, and a segmentor implemented with neural networks (i.e. one or more neural networks) designed to impose semantic information on the generated (i.e. synthesized, constructed, etc.) images to provide realistic image generation results (i.e. generate one or more photorealistic images based, at least in part, on the one or more features using one or more neural networks) by ensuring that the generated/translated images are as realistic as the real images, as indicated above), for example), but fails to teach the following as further recited in claim 37.
However, Dai teaches semantic inputs indicating one or more regions of one or more images (Par. [0002-3]: technologies for training networks for semantic segmentation. Such techniques can be useful for increasing the accuracy of object identification in an image. Through a training process, images inputted inputted into a network may have an increased level of semantic segmentation over similar but untrained networks… a system can include a trainable neural network. The neural network can receive a training image as an input. The system can generate several candidate segment masks based on the training image. The candidate segment masks can be ranked from a relatively higher degree of accuracy to a relatively lower degree of accuracy to generate a ranked set of candidate segment masks. One or more masks of the ranked set of candidate segment masks are selected. One of the selected ranked set of candidate segment masks can be input into the neural network to train the neural network. The training process may continue for a desired number of times until the neural network can be trained to a desired level… using a ground-truth bounding box as an input and generated candidate segment masks to train the neural network can reduce the workload of annotation training images for semantic segmentation… spotting the ground-truth bounding box for the candidate segment mask generation can involve less computing resources compared to other technologies… the cost of training a neural network to perform semantic segmentation can be reduced, as the reliance upon human-generated data can be reduced; Par. [0051-53]: the semantic segmentation framework is not limited to any particular type, size, or style of image… the training image 232 can be labeled with one or more ground-truth bounding boxes of objects (e.g. “person,” “car,” “boat”). 
A ground-truth bounding box may be provided by a human or other system considered to have a relatively high degree of accuracy in labeling images… The training supervisor 118 invokes the mask generator 224 to generate candidate segments masks 234A-N… the candidate segment masks 234A-N can be generated based on a criteria of relevance to the training image 232… The training supervisor 118 ranks the candidate segment masks 234A-N to generate the ranked candidate segment masks 236A-N… the ranked candidate segment masks 236A-N can be ranked according to a measure of how close the masks resemble the ground-truth bounding boxes in the training image 232… From the ranked candidate segment masks 236, the training supervisor 118 selects a set 238 of the ranked candidate segment masks 236, illustrated in FIG. 3 as ranked candidate segment masks 236 A, 236 B, 236 C, 236 D, and 236 E. The set 238 can be generated using various technologies… One of the ranked candidate masks 236A-E can be selected and used to train the neural network 116. The neural network updater 228 receives the selected candidate segment mask 240 to train the neural network 116. The selected candidate segment mask 240 can be used as an input to the training supervisor 118 to rank the candidate segment masks 234A-N, allowing repetition of the process; Par. [0062-63]: an image with ground-truth bounding boxes can be used as the initial input to generate candidate segment masks, while a selected candidate segment mask can be used to train the neural network, both in the initial training evolution and subsequent training evolutions. In this manner, the accuracy of ground-truth bounding boxes can be used in conjunction with a relatively low-cost automated approach… the mask generator 224 generates candidate segment masks based on the received training image. The candidate segment masks may be several segment masks that are generated that approximate the segmentation of the objects in the received training image. 
As noted above, the received training image can be segmented using ground-truth bounding boxes. In semantic segmentation, the objects in an image are segmented not at the bounding box level, but rather, at the pixel level. The image can be segmented into regions comprising the various objects defined by the class labels. The candidate segment masks represent various estimations of semantic segmentation using the ground-truth bounding box as the input for the first evolution of training; semantic inputs indicating one or more regions of one or more images (e.g. the semantic segmentation process includes images (i.e. one or more images) inputted into a network for training, including a neural network which receives each training image as an input to generate segment masks (i.e. one or more regions of one or more images) based on the training image, and one of the ranked candidate masks is selected and used as input to train the neural network (i.e. semantic inputs indicating one or more regions of one or more images), as indicated above), for example).
The same motivation to combine the above-mentioned teachings applies, as previously indicated with respect to claim 1.
The combined teachings above fail to teach the following as further recited in claim 37.
However, Suzuki teaches receive one or more user-selected features (Pg. 2: in practical image editing tasks like those done with Photoshop, users may want to retain spatial freedom during the transformation; that is, users want fine control over the region of transformation… perform arbitrary, user-selected partial transformations (like those described above) over an arbitrary region of a user’s choice… a method that features two types of image transformation: (1) spatial class-translation that translates a class category of a region of interest, and (2) semantic transplantation that transplants a semantic feature of an user-selected region in an arbitrary image to a region of interest in the target image. To facilitate this editing process, we also propose an efficient optimization method to project images onto the latent space of generator; Pg. 3: Given the input image of interest and the conditional generator G, the process begins by prompting the user to specify the region of an image to be edited, or the region of the image containing the object that the user wants to transform. The user can then apply spatial class translation and semantic transplantation on the selected region… With our spatial class translation, the user can change the class of the object in the user-selected region of interest (ROI). The user can change the class of a part of the target objects in intuitive fashion… the strength of the user-selected class features are continuously increasing with the morphing strength… With our semantic transplantation, the user can transplant a semantic feature of the user-selected object in the reference image to an object in the target image to be transformed; Pg. 4: algorithm also looks for the latent variable z that reconstructs the clip… from the reference image. 
Let R be the user-selected subregion of x on which to conduct the transformation, and let R0 be the region… from which the user wants to transplant the semantic information; receive one or more user-selected features (e.g. receive one or more user-selected features, including user-selected regions, such as a user-selected region of interest (ROI), a user-selected subregion, etc., on which to conduct image transformation, including semantic transplantation that transplants a semantic feature of a user-selected region in an arbitrary image to a region of interest in the target image, as indicated above), for example).
Fu, Dai, and Suzuki are considered to be analogous art because they pertain to image processing applications based on neural networks. Therefore, the combined teachings of Fu, Dai, and Suzuki, as a whole, would have rendered obvious the invention recited in claim 37 with a reasonable expectation of success in order to modify the method for image generation through use of adversarial networks (as disclosed by Fu) with receiving one or more user-selected features (as taught by Suzuki, Abstract, Pg. 2-4) to produce customized photorealistic images based on a set of photorealistic transformations (Suzuki, Abstract, Pg. 1-2 and 8).

Regarding claim 38, claim 37 is incorporated, and the steps of the program further recited in claim 38, when executed, recite a concept similar to that of claim 2 and are rejected as applied to computer readable medium claim 2 above.

Regarding claim 39, claim 38 is incorporated, and the steps of the program further recited in claim 39, when executed, recite a concept similar to that of claim 3 and are rejected as applied to computer readable medium claim 3 above.

Regarding claim 40, claim 39 is incorporated, and the steps of the program further recited in claim 40, when executed, recite a concept similar to that of claim 4 and are rejected as applied to computer readable medium claim 4 above.

Regarding claim 41, claim 40 is incorporated, and the steps of the program further recited in claim 41, when executed, recite a concept similar to that of claim 5 and are rejected as applied to computer readable medium claim 5 above.

Regarding claim 43, claim 43 is a corresponding apparatus claim rejected as applied to the computer readable medium claim 37 above.

Regarding claim 44, claim 43 is incorporated, and claim 44 is a corresponding apparatus claim rejected as applied to the computer readable medium claim 38 above.

Regarding claim 45, claim 44 is incorporated, and claim 45 is a corresponding apparatus claim rejected as applied to the computer readable medium claim 39 above.

Regarding claim 46, claim 45 is incorporated, and claim 46 is a corresponding apparatus claim rejected as applied to the computer readable medium claim 40 above.

Regarding claim 47, claim 46 is incorporated, and claim 47 is a corresponding apparatus claim rejected as applied to the computer readable medium claim 41 above.

Regarding claim 53, claim 52 is incorporated, and the steps further recited in claim 53 recite a concept similar to that of claim 5 and are rejected as applied to computer readable medium claim 5 above.

Regarding claim 55, Fu discloses a device (Par. [0004]: a system for training an image generator… the system comprises a processor and a memory with computer code instructions stored thereon, wherein the processor and the memory, with the computer code instructions, are configured to cause the system to provide a generator, discriminator, and segmentor) comprising: 
one or more processors to (Par. [0004]: a system for training an image generator… the system comprises a processor and a memory with computer code instructions stored thereon, wherein the processor and the memory, with the computer code instructions, are configured to cause the system to provide a generator, discriminator, and segmentor; Par. [0191]: Embodiments or aspects thereof may be implemented in the form of hardware, firmware, or software… the software may be stored on any non-transient computer readable medium that is configured to enable a processor to load the software or subsets of instructions thereof. The processor then executes the instructions and is configured to operate or cause an apparatus to operate in a manner as described herein) receive one or more semantic inputs and to cause one or more substantially photorealistic images [one or more photorealistic images] to be generated using one or more neural networks and the one or more semantic inputs, and further to cause the one or more substantially photorealistic images to be displayed on the device (Par. [0034-39]: Segmentation Guided Generative Adversarial Network (SGGAN), which leverages semantic segmentation to improve image generation performance further and provide spatial mapping… a segmentor implemented with a neural network that is designed to impose semantic information on the generated images… generative adversarial networks (GAN) [Goodfellow et al., Generative adversarial nets, In NIPS, 2014] have emerged as a powerful tool for generative tasks, and significantly thrive in the field of deep generative models. 
Because GAN has the potential to provide realistic image generation results and alleviate the deficiency of training data… Segmentation Guided Generative Adversarial Network (SGGAN), which fully leverages semantic segmentation information to guide the image generation (e.g., translation) process… embodiments explicitly guide the generator with pixel-level semantic segmentations and, thus, further boost the quality of generated images. Further, the target segmentation employed in embodiments works as a strong prior, i.e., provides knowledge that stems from previous experience, for the image generator, which is able to use this prior knowledge to edit the spatial content and align the face image to the target segmentation; Par. [0040-44]: given the input image 100 and target segmentation 101, the proposed SGGAN translates the input image 100 to various combinations of various attributes shown… the proposed SGGAN framework comprises three networks… respectively, a generator network 220, a discriminator network 240, and a segmentor network 260… The generator 240 takes as inputs, a target segmentation 227, a given image 226, and a vector 228 indicating desired attributes of the image to be generated. The generator 220 implemented with the blocks 221-225 is configured to receive the inputs 226, 227, 228 and generate a target image 229 that is based on, i.e., a translated version of, the input image 226 and consistent with the input segmentation 227 and attributes 228… segmentor network 260 implemented with the blocks 261-265 is configured to receive an input image 266a and/or 266b and generate a corresponding segmentation 267a and/or 267b, respectively, indicating features of the input images 266a and/or 266b. The segmentor 260 imposes semantic information on the image generation process… Based on the segmentor network S, the proposed SGGAN, e.g., the network depicted in FIG. 2D, comprises three networks, a segmentor, a generator, and a discriminator. 
The proposed SGGAN utilizes semantic segmentations as strong regulations and control signals in multi-domain image-to-image translation… a training procedure 270 for training the segmentor 260, generator 220, and discriminator 240. During training, estimated segmentations from the segmentor 260 are compared with their ground-truth values, which provides gradient information to optimize the generator 220. This optimization tends to teach the generator 220 to impose the spatial constraints indicated in an input segmentation 271 on the translated images, e.g., 274. During the training 270, the segmentor 260 provides spatial guidance to the generator 220 to ensure the generated images, e.g., 274, comply with input segmentations, e.g., 271. The discriminator 240 aims to ensure the translated images, e.g., 274, are as realistic as the real images; Par. [0048-51]: image generation using the training methods described herein and resulting trained generator can spatially control the generation process, and provide interpretable results… Segmentation Guided Generative Adversarial Networks (SGGAN) model which leverages semantic segmentation information to provide spatial constraints for the image translation task…  a segmentor network that is particularly designed to impose the target spatial guidance on the generator… a process of training an image generator, i.e., an image generator system, comprising a generator 301, discriminator 302, and segmentor 303… The generator 301 is configured to receive three inputs, an input image (source image) 304, a target segmentation 305, and a vector of target attributes 306. 
A goal of the training process is to configure the generator 301 to translate the input image 304 into a generated image (fake image) 307, which complies with the target segmentation 305 and attribute labels 306… During the training process, these three inputs (target segmentation 305, target attributes 306, and input image 304) are fed into the generator 301 to obtain the generated image 307; Par. [0063-77]: In order to guide the generator by the target segmentation information, an additional network is built which takes an image as input and generates the image's corresponding semantic segmentation. This network is referred to as the segmentor network S which is trained together with the GAN framework. When training with the real data pairs (x,s), S learns to estimate segmentation correctly… such an embodiment leverages semantic segmentation information in GAN based image translation tasks and also builds a segmentor which is trained together with the GAN framework to provide guidance in image translation… images and resulting segmentations that may be used in embodiments and/or generated by the segmentor S. In FIG. 4, the image 440 is depicted with the dotted lines 441 showing facial landmarks extracted from the image 440. The segmentation 442 is a landmark-based semantic segmentation. The image 443 is a real image sample that may be provided to a segmentor S implemented according to an embodiment to generate the segmentation 444… extracted landmarks 441 are processed to generate a pixel-wised semantic segmentation 442 where each pixel in the input image 440 is automatically classified into classes of eyes, eyebrow, nose, lips, skin and background according to landmark information. During training of the segmentor S, S takes a real image sample 443 as input and generates an estimated segmentation 444… the segmentor S is optimized by minimizing the difference between the landmark base segmentation 442 and segmentor generated segmentation 444. 
For instance, based upon differences between the landmark-based segmentation 442 and segmentor S generated segmentation 444, weights of the network implementing the segmentor S may be varied. As shown in FIG. 4, the similarity between the landmark-based segmentation 442 and segmentor generated segmentation 444 reveals that a segmentor network, implemented according to the embodiments described herein, can successfully capture the semantic information from an input image; Par. [0188-190]: FIG. 22 is a simplified block diagram of a computer-based system 2220 that may be used to implement any variety of the embodiments of the present invention described herein. The system 2220 comprises a bus 2223. The bus 2223 serves as an interconnect between the various components of the system 2220. Connected to the bus 2223 is an input/output device interface 2226 for connecting various input and output devices such as a keyboard, mouse, display, speakers, etc. to the system 2220. A central processing unit (CPU) 2222 is connected to the bus 2223 and provides for the execution of computer instructions implementing embodiments… FIG. 23 illustrates a computer network environment 2330 in which an embodiment of the present invention may be implemented. In the computer network environment 2330, the server 2331 is linked through the communications network 2332 to the clients 2333a-n. The environment 2330 may be used to allow the clients 2333a-n, alone or in combination with the server 2331, to execute any of the embodiments described herein. For non-limiting example, computer network environment 2330 provides cloud computing embodiments, software as a service (SAAS) embodiments; receive one or more semantic inputs and to cause one or more photorealistic images to be generated using one or more neural networks and the one or more semantic inputs, and further to cause the one or more substantially photorealistic images to be displayed on the device (e.g. 
a system for training an image generator comprising a processor (i.e. computer, server, etc.) and a memory with computer code instructions stored thereon, wherein the processor and the memory, with the computer code instructions, are configured to receive input images and generate corresponding semantic segmentation indicating features (i.e. attributes, characteristics, etc.) of the input images (i.e. one or more semantic inputs), as indicated above, for example, to impose semantic information on the image generation process, including a generator, a discriminator, and a segmentor implemented with neural networks (i.e. using one or more neural networks) designed to impose semantic information on the generated images to provide realistic image generation results (i.e. generate one or more photorealistic images based on the one or more semantic inputs using one or more neural networks) by ensuring that the generated/translated images are as realistic as the real images, and including a computer network environment in which the server is linked through the communications network to clients, which typically include a display, and the environment is used to allow the clients, alone or in combination with the server, to execute any of the embodiments, including causing generated images (i.e. the one or more photorealistic images) to be displayed on one or more client devices, as indicated above), for example), but fails to teach the following as further recited in claim 55.
However, Dai teaches semantic inputs indicating one or more regions of one or more images (Par. [0002-3]: technologies for training networks for semantic segmentation. Such techniques can be useful for increasing the accuracy of object identification in an image. Through a training process, images inputted inputted into a network may have an increased level of semantic segmentation over similar but untrained networks… a system can include a trainable neural network. The neural network can receive a training image as an input. The system can generate several candidate segment masks based on the training image. The candidate segment masks can be ranked from a relatively higher degree of accuracy to a relatively lower degree of accuracy to generate a ranked set of candidate segment masks. One or more masks of the ranked set of candidate segment masks are selected. One of the selected ranked set of candidate segment masks can be input into the neural network to train the neural network. The training process may continue for a desired number of times until the neural network can be trained to a desired level… using a ground-truth bounding box as an input and generated candidate segment masks to train the neural network can reduce the workload of annotation training images for semantic segmentation… spotting the ground-truth bounding box for the candidate segment mask generation can involve less computing resources compared to other technologies… the cost of training a neural network to perform semantic segmentation can be reduced, as the reliance upon human-generated data can be reduced; Par. [0051-53]: the semantic segmentation framework is not limited to any particular type, size, or style of image… the training image 232 can be labeled with one or more ground-truth bounding boxes of objects (e.g. “person,” “car,” “boat”). 
A ground-truth bounding box may be provided by a human or other system considered to have a relatively high degree of accuracy in labeling images… The training supervisor 118 invokes the mask generator 224 to generate candidate segment masks 234A-N… the candidate segment masks 234A-N can be generated based on a criteria of relevance to the training image 232… The training supervisor 118 ranks the candidate segment masks 234A-N to generate the ranked candidate segment masks 236A-N… the ranked candidate segment masks 236A-N can be ranked according to a measure of how close the masks resemble the ground-truth bounding boxes in the training image 232… From the ranked candidate segment masks 236, the training supervisor 118 selects a set 238 of the ranked candidate segment masks 236, illustrated in FIG. 3 as ranked candidate segment masks 236A, 236B, 236C, 236D, and 236E. The set 238 can be generated using various technologies… One of the ranked candidate masks 236A-E can be selected and used to train the neural network 116. The neural network updater 228 receives the selected candidate segment mask 240 to train the neural network 116. The selected candidate segment mask 240 can be used as an input to the training supervisor 118 to rank the candidate segment masks 234A-N, allowing repetition of the process; Par. [0062-63]: an image with ground-truth bounding boxes can be used as the initial input to generate candidate segment masks, while a selected candidate segment mask can be used to train the neural network, both in the initial training evolution and subsequent training evolutions. In this manner, the accuracy of ground-truth bounding boxes can be used in conjunction with a relatively low-cost automated approach… the mask generator 224 generates candidate segment masks based on the received training image. The candidate segment masks may be several segment masks that are generated that approximate the segmentation of the objects in the received training image. As noted above, the received training image can be segmented using ground-truth bounding boxes. In semantic segmentation, the objects in an image are segmented not at the bounding box level, but rather, at the pixel level. The image can be segmented into regions comprising the various objects defined by the class labels. The candidate segment masks represent various estimations of semantic segmentation using the ground-truth bounding box as the input for the first evolution of training; semantic inputs indicating one or more regions of one or more images (e.g. the semantic segmentation process includes images (i.e. one or more images) input into a network for training, including a neural network that receives each training image as an input to generate segment masks (i.e. one or more regions of one or more images) based on the training image, and one of the ranked candidate masks is selected and used as input to train the neural network (i.e. semantic inputs indicating one or more regions of one or more images), as indicated above), for example).
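For illustration only, the ranking step quoted above can be sketched as follows. The quoted passage says only that candidate masks are ranked by "a measure of how close the masks resemble the ground-truth bounding boxes"; this sketch assumes that measure is intersection-over-union (IoU) between each mask and the rasterized box, and the function names are hypothetical, not taken from the reference.

```python
import numpy as np


def box_to_mask(box, shape):
    """Rasterize a half-open (x0, y0, x1, y1) box into a boolean mask (hypothetical helper)."""
    x0, y0, x1, y1 = box
    mask = np.zeros(shape, dtype=bool)
    mask[y0:y1, x0:x1] = True
    return mask


def iou(a, b):
    """Intersection-over-union of two boolean masks; 0.0 if both are empty."""
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 0.0


def rank_candidate_masks(candidates, gt_box):
    """Return (index, score) pairs sorted from best to worst IoU with the box.

    The IoU criterion is an assumption standing in for the unspecified
    "measure of how close the masks resemble the ground-truth bounding boxes".
    """
    gt = box_to_mask(gt_box, candidates[0].shape)
    scores = [(i, iou(m, gt)) for i, m in enumerate(candidates)]
    return sorted(scores, key=lambda s: s[1], reverse=True)
```

In the iterative scheme described in the passage, the top-ranked mask would be the one passed on to train the network, and the retrained network's output would feed the next round of ranking.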
The same motivation to combine the above-mentioned teachings applies, as previously indicated in claim 1.
The combined teachings above fail to teach the following as further recited in claim 55.
However, Suzuki teaches receive one or more semantic inputs from one or more users (Pg. 2: in practical image editing tasks like those done with Photoshop, users may want to retain spatial freedom during the transformation; that is, users want fine control over the region of transformation… perform arbitrary, user-selected partial transformations (like those described above) over an arbitrary region of a user’s choice… a method that features two types of image transformation: (1) spatial class-translation that translates a class category of a region of interest, and (2) semantic transplantation that transplants a semantic feature of an user-selected region in an arbitrary image to a region of interest in the target image. To facilitate this editing process, we also propose an efficient optimization method to project images onto the latent space of generator; Pg. 3: Given the input image of interest and the conditional generator G, the process begins by prompting the user to specify the region of an image to be edited, or the region of the image containing the object that the user wants to transform. The user can then apply spatial class translation and semantic transplantation on the selected region… With our spatial class translation, the user can change the class of the object in the user-selected region of interest (ROI). The user can change the class of a part of the target objects in intuitive fashion… the strength of the user-selected class features are continuously increasing with the morphing strength… With our semantic transplantation, the user can transplant a semantic feature of the user-selected object in the reference image to an object in the target image to be transformed; Pg. 4: algorithm also looks for the latent variable z that reconstructs the clip… from the reference image. Let R be the user-selected subregion of x on which to conduct the transformation, and let R0 be the region… from which the user wants to transplant the semantic information; receive one or more semantic inputs from one or more users (e.g. receive one or more user-selected features, including users wanting to retain spatial freedom during the transformation of images (i.e. receive one or more semantic inputs from one or more users), as indicated above), for example).
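As an illustration of the projection step referred to above (finding a latent variable z whose generated output reconstructs a given clip), the following is a deliberately simplified sketch. It assumes a hypothetical linear stand-in generator G(z) = W @ z and a pixel-space squared error minimized by plain gradient descent; Suzuki instead measures closeness in the feature space of the discriminator D and initializes z with an encoder, so this is not Suzuki's procedure.

```python
import numpy as np


def project_to_latent(W, x, steps=500):
    """Find z minimizing ||W @ z - x||^2 by gradient descent.

    W is a hypothetical linear stand-in for a generator, G(z) = W @ z.
    """
    z = np.zeros(W.shape[1])
    # Step size 0.5 / sigma_max^2 makes every eigen-direction contract.
    lr = 0.5 / np.linalg.norm(W, 2) ** 2
    for _ in range(steps):
        residual = W @ z - x          # reconstruction error in pixel space
        grad = 2.0 * W.T @ residual   # gradient of the squared error w.r.t. z
        z -= lr * grad
    return z
```

When the target lies in the range of W, the recovered z reproduces it. Otherwise z converges toward the least-squares fit, which corresponds to the "closest point on the generator's output manifold" idea in this linear setting.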
The same motivation to combine the above-mentioned teachings applies, as previously indicated in claim 37.

Regarding claim 56, claim 55 is incorporated, and the steps of the program further recited in claim 56 recite a concept similar to that of claim 2 when executed and are rejected as applied to computer-readable medium claim 2 above.

Regarding claim 57, claim 56 is incorporated, and the steps of the program further recited in claim 57 recite a concept similar to that of claim 3 when executed and are rejected as applied to computer-readable medium claim 3 above.

Regarding claim 58, claim 57 is incorporated, and the steps of the program further recited in claim 58 recite a concept similar to that of claim 4 when executed and are rejected as applied to computer-readable medium claim 4 above.

Regarding claim 59, claim 58 is incorporated, and the steps further recited in claim 59 recite a concept similar to that of claim 5 when executed and are rejected as applied to computer-readable medium claim 5 above.

Regarding claim 65, claim 64 is incorporated, and the steps of the program further recited in claim 65 recite a concept similar to that of claim 5 when executed and are rejected as applied to computer-readable medium claim 5 above.

Regarding claim 71, claim 70 is incorporated, and claim 71 is a corresponding apparatus claim rejected as applied to computer-readable medium claim 65 above.

Regarding claim 77, claim 76 is incorporated, and the steps further recited in claim 77 recite a concept similar to that of claim 5 when executed and are rejected as applied to computer-readable medium claim 5 above.

Regarding claim 83, claim 82 is incorporated, and the steps further recited in claim 83 recite a concept similar to that of claim 5 when executed and are rejected as applied to computer-readable medium claim 5 above.

Regarding claim 89, claim 88 is incorporated, and the steps further recited in claim 89 recite a concept similar to that of claim 5 when executed and are rejected as applied to computer-readable medium claim 5 above.

Regarding claim 101, claim 100 is incorporated, and the steps of the program further recited in claim 101 recite a concept similar to that of claim 5 when executed and are rejected as applied to computer-readable medium claim 5 above.

Claim 96 is rejected under 35 U.S.C. 103 as being unpatentable over Fu in view of Suzuki, as applied to claim 91 above, and in further view of Lin et al. (PG Pub. No. 2017/0344884 A1), hereafter referred to as Lin, which is Applicant-cited prior art originally cited by the examiner during examination of the parent application.

Regarding claim 96, claim 95 is incorporated and the combination of Fu in view of Suzuki teaches the system (Fu, Par. [0004]), wherein the one or more processors are further to modulate, by the at least one spatially-adaptive normalization layer, a set of activations through a spatially-adaptive transformation in order to propagate the semantic information throughout the other layers of the one or more neural networks (Suzuki, Pg. 1, Abstract: CNN-based image editing method that allows the user to change the semantic information of an image over a user-specified region. Our method makes this possible by combining the idea of manifold projection with spatial conditional batch normalization (sCBN), a version of conditional batch normalization with user-specifiable spatial weight maps. With sCBN and manifold projection, our method lets the user perform (1) spatial class translation that changes the class of an object over an arbitrary region of user’s choice, and (2) semantic transplantation that transplants semantic information contained in an arbitrary region of the reference image to an arbitrary region in the target image; Pg. 1, Par. 1-2: deep generative models like generative adversarial networks (GANs) [10] and variational autoencoders (VAEs) [20] make possible the unsupervised learning of rich latent semantic information from images… Image conditional GANs [24, 40, 17] based on encoder-decoder architectures have been popular both for their convenient implementation in end-to-end differentiable ML frameworks, and their uncanny ability to produce photo-realistic images; Pg. 2, Par. 3: CNN-based image editing method that grants the user this very freedom. With our method, the user can transform a user-chosen part of image in a copy-paste fashion–and the user can do this all the while preserving semantic consistency. More precisely, we present a method that features two types of image transformation: (1) spatial class-translation that translates a class category of a region of interest, and (2) semantic transplantation that transplants a semantic feature of an user-selected region in an arbitrary image to a region of interest in the target image. To facilitate this editing process, we also propose an efficient optimization method to project images onto the latent space of generator; Pg. 2, Par. 7-8: Class-conditional GAN [29, 26, 43, 2] is a framework designed to learn an invariant latent representation among various classes, and it is capable of generating diverse images from a same latent code z by changing class embedding (Figure 2). The work of [26, 2], in particular, succeeded in producing an impressive results by interpolating the parameters of conditional batch normalization layer, which was first introduced in [31, 5]. Conditional batch normalization (CBN) is mechanism that learns conditional information by separately learning condition-specific scaling parameter and shifting parameter for batch normalization. Our method extends the technique used in [26] by restricting the region of interpolation to a region that corresponds to the region of interest in the pixel space. We will refer to our approach spatial conditional batch normalization (sCBN). Unlike the manipulation done in style transfer [12], we introduce the conditional information at multiple levels in the network, depending on the style preference of the user. As we will show, sCBN in the lower layers transforms global features, and sCBN at upper layers transforms local features… Semantic transformation. In order to grant the user with wide freedom of semantic transformation, there has to be some mechanism to finely adjust the user-suggested transformation so that the final product becomes natural; Pg. 3, Par. 3-4: Spatial Class-translation. With our spatial class translation, the user can change the class of the object in the user-selected region of interest (ROI). The user can change the class of a part of the target objects in intuitive fashion… Spatial Semantic Transplantation. With our semantic transplantation, the user can transplant a semantic feature of the user-selected object in the reference image to an object in the target image to be transformed. Our method first prompts the user to specify the region in the target image containing the object of interest, along with the reference image of equal size. The user will be also asked to specify the region in the reference image that contains the semantic information to be transplanted. The method then automatically transplants the semantic information of the specified region of the reference image into the target image; Pg. 4, Par. 2-5 and Pg. 5: spatial class translation. Our method functions on a trained conditional generator G, paired with the discriminator D with which G was trained. Upon receiving the region of interest x clipped from the target image and the class c of the target object contained in x, the algorithm begins by looking for a latent variable z such that G(z; c) will be close to x in the feature space of D (Manifold Projection step). The class c can either be specified by the user or by a pre-trained classifier. Suppose that the user wants to partially translate a region R in x to a class c′, and let Vℓ be the set of features in ℓ-th conditional batch normalization (CBN) layers that correspond to R in the pixel space. Our method then simply substitutes the parameters governing the shift and mean parameters of Vℓ with those of c′ (Figure 6). This will result in a modification of G… in which the CBN parameters of Vℓ exclusively carry the style information of the class c′. A transformed image can be constructed by applying this modified G… to z… our spatial editing method is applicable to any generative model (e.g., GAN, VAE) that is equipped with a mechanism to iteratively incorporate class information during its image generation process. We will next elaborate on the design of our sCBN and spatial semantic implantation, along with the other details omitted in the brief description above… Spatial conditional batch normalization (sCBN) is the core of our spatial class translation. As can be inferred from our naming, sCBN is based on batch normalization (BN) [16], a technique developed for the purpose of reducing the internal covariance shift to accelerate the training of neural network. More precisely, we will borrow our idea from conditional batch normalization (CBN) [8, 5], a variant of BN that incorporates the class specific semantic information in the parameters for BN. Given a set of batches sampled each from a single class, the conditional batch normalization [8, 5] works by modulating the set of intermediate features produced from each batch of inputs so that it follow a normal distribution with mean and variance that are specific to the corresponding class. Let us fix the layer ℓ, and let Fk,h,w represent the feature of ℓ-th layer at channel k, height location h, and width location w. Given a batch {Fi,k,h,w} of Fk,h,w s generated from class c, the CBN at layer ℓ then transforms Fi,k,h,w… In our implementation, we replaced CBN at each layer with sCBN… After training the encoder, one can produce the reconstruction of x by applying G to z = E(x). In the reconstructed image, however, semantically independent objects are often dis-aligned. We therefore calibrate z by backpropagating the loss L. After some rounds of calibration, we can use the resulting z for the image transformation… instead of calibrating the latent variable z by backpropagating L through G, we will calibrate ζ by backpropagating L through G and B; Pg. 6, Par. 3: generator used in our study is a ResNet-based generator trained as part of a conditional DCGAN. Each residual block in our generator contains the conditional batch normalization (CBN) layer. At the time of inference, these CBN layers are replaced by the aforementioned sCBN layers that are tailored to the user’s preference. We base our architectures on those used in previous work [25, 26], and used the pre-trained model from [26]; Pg. 8, Par. 6: image transformation method that allows the user to translate the class of an object and transplant semantic features over a user-specified pixel region of the image. Indeed, there is still much room left for the exploration of the semantic information contained in the intermediate feature spaces of CNNs. We were, however, able to show that we can manipulate this information in a somewhat intuitive manner and produce customized photorealistic images; Pg. 11, Par. 9: we conducted a set of automatic spatial class translations. For each one the selected images, we (1) used a pre-trained model to extract the region of the object to be transformed (dog/cat), (2) conducted the manifold projection to obtain the z, (3) passed z to the generator with the class map corresponding to the segmented region, and (4) conducted a post-processing over the segmented region. For the semantic segmentation, we used a TensorFlow implementation of DeepLab v3 Xception model trained on MS COCO dataset; modulate, by the at least one spatially-adaptive normalization layer, a set of activations through a spatially-adaptive transformation in order to propagate the semantic information throughout the other layers of the one or more neural networks (e.g. image generative model which uses a neural network, such as a Generative Adversarial Network (GAN), including spatial conditional (i.e. adaptive, instance, etc.) batch normalization (sCBN) layers (i.e. the at least one spatially-adaptive normalization layer), that is equipped with a mechanism to iteratively incorporate (i.e. propagate, communicate, pass, feed, transfer, etc.) class (i.e. feature, attribute, label, etc.) information during its image generation, including calculating a loss (i.e. error, difference, etc.) between semantically independent objects (i.e. semantic information) and backpropagating the loss throughout other layers of the one or more neural networks to update the network through backpropagation, including semantic features (i.e. class, attribute, label, etc.) of selected object regions (i.e. propagate semantic information), which are segmented/extracted in a reference (i.e. source, input, etc.) image, corresponding to object(s) in a target image to be transformed (i.e. an image to be generated), based on the set of features (i.e. a set of activations), Vℓ, in ℓ-th (first, second, third… Nth) conditional (i.e. adaptive, instance, etc.) batch normalization (CBN) layers that correspond to a region R in the pixel space (i.e. a spatially-adaptive transformation), by incorporating the class specific semantic information in the parameters for batch normalization (BN), including a number of spatial conditional batch normalization (sCBN) layers, and given a set of batches sampled each from a single class, the conditional batch normalization works by modulating the set of intermediate features (i.e. modulating a set of activations through a spatially-adaptive transformation) produced from each batch of inputs so that it follow a normal distribution with mean and variance that are specific to the corresponding class (i.e. modulate, by the at least one spatially-adaptive normalization layer, a set of activations through a spatially-adaptive transformation in order to propagate the semantic information throughout the other layers of the one or more neural networks), as indicated above), for example).
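For illustration only, the spatial conditional batch normalization described in the Suzuki passages above can be sketched as follows. The exact sCBN equation is elided in the quotation ("the CBN at layer ℓ then transforms Fi,k,h,w…"), so this sketch assumes standard batch-normalization statistics computed per channel over the batch and spatial axes, per-class scale and shift parameters (gamma, beta), and a per-pixel class weight map that sums to one across classes. The function and parameter names are hypothetical, and this is not presented as Suzuki's implementation.

```python
import numpy as np


def spatial_cbn(features, gamma, beta, class_map, eps=1e-5):
    """Spatially varying conditional batch normalization (illustrative sketch).

    features:    (N, K, H, W) activations of one layer.
    gamma, beta: (C, K) per-class scale and shift parameters.
    class_map:   (C, H, W) non-negative weights, assumed to sum to 1 over C
                 at every pixel, giving how strongly each class applies there.
    """
    # Ordinary BN statistics: per channel, over batch and spatial positions.
    mean = features.mean(axis=(0, 2, 3), keepdims=True)
    var = features.var(axis=(0, 2, 3), keepdims=True)
    normalized = (features - mean) / np.sqrt(var + eps)
    # Blend the per-class parameters pixel by pixel into (K, H, W) maps.
    g = np.einsum("ck,chw->khw", gamma, class_map)
    b = np.einsum("ck,chw->khw", beta, class_map)
    # Apply the spatially varying scale and shift to every sample in the batch.
    return normalized * g[None] + b[None]
```

With a hard 0/1 map, this reduces to using the parameters of class c′ inside the region R and those of class c elsewhere, which is the substitution the quoted passage describes. Fractional weights would give a gradual blend, in the spirit of the "morphing strength" mentioned in the reference.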
The same motivation to combine the above-mentioned teachings applies, as previously indicated in claim 91.
Although Suzuki teaches modulate, by the at least one spatially-adaptive normalization layer, a set of activations through a spatially-adaptive transformation in order to propagate the semantic information throughout the other layers of the one or more neural networks, as indicated above, Suzuki does not expressly disclose a set of “activations”.
However, Lin teaches a set of activations (Par. [0004]: machine learning techniques learn patterns of neurons by progressing through layers of a neural network. The patterns of the neurons are used to identify existence of a semantic class within an image, such as an object, feeling, and so on as described previously. As part of this, relevancies of the neurons to the semantic class are also communicated back through layers of a neural network. Through use of these relevancies, activation relevancy maps are created that describe relevancy of portions of the image associated with neurons of the neural network to the semantic class. In this way, the semantic class is localized to portions of the image. To do so, positive and not negative relevancies are communicated through the neural network. For example, communication of positive relevancies describes portions of the image that are relevant to the semantic class, whereas negative relevancies do not; Par. [0024-28]: a neural network that includes a plurality of layers. Each of the plurality of layers includes a plurality of neurons that are used as part of classification. Neurons are implemented by a computing device as a mathematical approximation of a biological neuron that accept a vector of inputs, performs transformation on the inputs, and outputs a scalar value. In order to classify an image as corresponding to a particular semantic class, for instance, the techniques involve learning patterns of neurons through successive layers of the neural network. These patterns, once learned, are then usable to determine whether subsequent images include or do not include semantic classes that corresponds to the patterns. In other words, the patterns are used to define the "what" is included in the image through classification into a corresponding semantic class… classification of the particular semantic class within the image, relevancy of the neurons to the semantic class (e.g., object) is also communicated backwards through the sequence that is used to aggregate the patterns as described above. This process is also referred to as a "back propagation" technique. For example, activation relevancy maps may be used to define relevancy of neurons at respective layers of the neural network to the semantic class. By communicating these relevancies backwards through the sequence of layers, locations within the image (as corresponding to respective nodes in the neural network) may be further refined for increasingly smaller portions of the image as being relevant to the semantic class… A neuron in one layer of the neural network, for instance, may be connected to a plurality of neurons in another layer. In other words, the plurality of neurons is considered children of the neuron when progressing backward through the sequence, i.e., one to many. This relationship is also used to aggregate patterns of neurons when progressing forward through the sequence (i.e., many to one). When progressing backwards through the sequence, probabilities of relevancy of the child neurons to the semantic class are determined, and positive relevancies are propagated whereas negative relevancies are not… this communication may use a linear function which allows efficient computation of any linear combination of relevancies (e.g., activation relevancy maps as described in the following), thereby promoting computational efficiency. Communication of relevancies may also be configured to preserve a sum of relevance values across layers of the neural network which normalizes the activation relevance maps for comparison; Par. [0042-55]: neural network 202 is also configured to support communication of relevancy back propagation 320 to progress backwards through the sequence used to aggregate activations. This is performed through the use of activation relevancy maps 322, 324, 326 that describe relevancy of the neurons to the semantic class. By progressing backwards through the sequence, this relevancy may be further refined to increasingly smaller portions of the input image 302 and thus serve to localize relevant portions within the input image 302 to identification of the semantic class. In this way, neurons that are considered relevant to the semantic class may be used to localize the semantic class within the input image 302 through definition of spatial information of the neurons and relevancy of those neurons to the semantic class. In other words, the particular outcome (e.g., the semantic class) is communicated (i.e., back propagated) through the neural network to localize how that outcome was obtained. Accordingly, this technique is applicable to any outcome that may be determined using a neural network 202 to determine which neurons and information relating to those neurons (e.g., portions of a picture) are used to achieve that outcome, such as to identify objects, emotions, and so forth… A2. An activation neuron is tuned to detect certain visual features. Its response is positively correlated to its confidence of the detection… A2 has been empirically verified by a variety of recent works. It is observed that neurons at lower layers detect simple features like edge and color while neurons at higher layers can detect complex features… Between activation neurons, a connection is "excitatory" if its weight is non-negative, and "inhibitory" otherwise… The excitation backpropagation technique passes top-down signals through excitatory connections between activation neurons; a set of activations (e.g. patterns of neurons are used to identify existence of a semantic class (i.e. semantic information) within an image, such as an object, feeling, and so on, and, as part of this, relevancies of the neurons to the semantic class are communicated (i.e. propagated, fed, passed, etc.) back through layers of a neural network, for example, and through use of these relevancies, activation relevancy maps (i.e. a set of activations) are created that describe relevancy of portions (i.e. regions) of the image associated with neurons of the neural network to the semantic class at respective layers of the neural network, in which neurons that are considered relevant to the semantic class are used to localize the semantic class within the input image through definition of spatial information of the neurons and relevancy of those neurons to the semantic class, and each activation neuron is tuned to detect certain visual features (i.e. activations, outputs, results, etc.), including communication (i.e. propagation) of positive relevancies describing portions of images that are relevant to the semantic class, which are propagated, whereas negative relevancies are not, and the communication of relevancies preserves a sum of relevance values across layers of the neural network which normalizes the activation relevance maps for comparison, as indicated above), for example).
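For illustration only, the relevance propagation described in the quoted Lin passages can be sketched for a single fully connected layer. The passages describe top-down signals passed only through excitatory (non-negative) connections, with positive relevancies propagated and the sum of relevance preserved across layers. This sketch assumes the conditional-probability rule commonly used for excitation backpropagation, P(a_j | a_i) proportional to a_j · w_ji for w_ji ≥ 0 and zero otherwise. The function and argument names are hypothetical, and the sketch is not presented as Lin's implementation.

```python
import numpy as np


def excitation_backprop(a_in, W, p_out):
    """Propagate relevance p_out from a layer's outputs back to its inputs.

    a_in:  (J,) non-negative input activations (e.g. post-ReLU).
    W:     (J, I) weights; output i receives sum_j a_in[j] * W[j, i].
    p_out: (I,) relevance of each output neuron (e.g. one-hot on a class).
    Only excitatory (non-negative) weights carry signal back.
    """
    W_pos = np.clip(W, 0.0, None)           # drop inhibitory connections
    contrib = a_in[:, None] * W_pos         # (J, I) excitatory contributions
    totals = contrib.sum(axis=0)            # per-output normalizer
    # Conditional probability P(a_j | a_i); outputs with no excitatory input pass nothing.
    cond = np.divide(contrib, totals, out=np.zeros_like(contrib), where=totals > 0)
    return cond @ p_out                     # (J,) relevance of each input neuron
```

Because each column of the conditional-probability matrix sums to one, the total relevance is preserved from layer to layer, matching the "preserve a sum of relevance values across layers" property quoted above. Applying the function layer by layer, from the class output toward the input, yields one relevance map per layer.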
Fu, Suzuki, and Lin are considered to be analogous art because they pertain to image processing applications based on neural networks. Therefore, the combined teachings of Fu, Suzuki, and Lin, as a whole, would have rendered obvious the invention recited in claim 96, with a reasonable expectation of success, by modifying the method for image generation through use of adversarial networks (as disclosed by Fu) to include a set of activations (as taught by Lin, Abstract, Par. [0004, 24-28, 42-55]), in order to use activation relevancy maps having increased amounts of resolution generated by progressing backward through the layers of neural networks (Lin, Abstract, Par. [0001-5, 27, 40, 46]).

Claims 6, 12, 18, 24, 30, 36, 42, 48, 54, 60, 66, 72, 78, 84, 90, and 102 are rejected under 35 U.S.C. 103 as being unpatentable over Fu in view of Dai, in further view of Suzuki, as applied to claim 5 above, and in further view of Lin et al. (PG Pub. No. 2017/0344884 A1), hereafter referred to as Lin, which is Applicant-cited prior art originally cited by the examiner during examination of the parent application.

Regarding claim 6, claim 5 is incorporated and the combination of Fu, Dai, and Suzuki, as a whole, teaches the non-transitory computer-readable medium (Fu, Par. [0004 and 0191]), wherein the instructions when performed further cause the one or more processors to: 
modulate, by the at least one spatially-adaptive normalization layer, a set of activations through a spatially-adaptive transformation in order to propagate the semantic information throughout the other layers of the one or more neural networks (Suzuki, Pg. 1, Abstract: CNN-based image editing method that allows the user to change the semantic information of an image over a user-specified region. Our method makes this possible by combining the idea of manifold projection with spatial conditional batch normalization (sCBN), a version of conditional batch normalization with userspecifiable spatial weight maps. With sCBN and manifold projection, our method lets the user perform (1) spatial class translation that changes the class of an object over an arbitrary region of user’s choice, and (2) semantic transplantation that transplants semantic information contained in an arbitrary region of the reference image to an arbitrary region in the target image; Pg. 1, Par. 1-2: deep generative models like generative adversarial networks (GANs) [10] and variational autoencoders (VAEs) [20] make possible the unsupervised learning of rich latent semantic information from images… Image conditional GANs [24, 40, 17] based on encoderdecoder architectures have been popular both for their convenient implemention in end-to-end differentiable ML frameworks, and their uncanny ability to produce photo- realistic images; Pg. 2, Par. 3: CNN-based image editing method that grants the user this very freedom. With our method, the user can transform a user-chosen part of image in a copy-paste fashion–and the user can do this all the while preserving semantic consistency. 
More precisely, we present a method that features two types of image transformation: (1) spatial class-translation that translates a class category of a region of interest, and (2) semantic transplantation that transplants a semantic feature of an user-selected region in an arbitrary image to a region of interest in the target image. To facilitate this editing process, we also propose an efficient optimization method to project images onto the latent space of generator; Pg. 2, Par. 7-8: Class-conditional GAN [29, 26, 43, 2] is a framework designed to learn an invariant latent representation among various classes, and it is capable of generating diverse images from a same latent code z by changing class embedding (Figure 2). The work of [26, 2], in particular, succeeded in producing an impressive results by interpolating the parameters of conditional batch normalization layer, which was first introduced in [31, 5]. Conditional batch normalization (CBN) is mechanism that learns conditional information by separately learning condition-specific scaling parameter and shifting parameter for batch normalization. Our method extends the technique used in [26] by restricting the region of interpolation to a region that corresponds to the region of interest in the pixel space. We will refer to our approach spatial conditional batch normalization (sCBN). Unlike the manipulation done in style transfer [12], we introduce the conditional information at multiple levels in the network, depending on the style preference of the user. As we will show, sCBN in the lower layers transforms global features, and sCBN at upper layers transforms local features… Semantic transformation. In order to grant the user with wide freedom of semantic transformation, there has to be some mechanism to finely adjust the user-suggested transformation so that the final product becomes natural; Pg. 3, Par. 
3-4: Spatial Class-translation With our spatial class translation, the user can change the class of the object in the user-selected region of interest (ROI). The user can change the class of a part of the target objects in intuitive fashion… Spatial Semantic Transplantation With our semantic transplantation, the user can transplant a semantic feature of the user-selected object in the reference image to an object in the target image to be transformed. Our method first prompts the user to specify the region in the target image containing the object of interest, along with the reference image of equal size. The user will be also asked to specify the region in the reference image that contains the semantic information to be transplanted. The method then automatically transplants the semantic information of the specified region of the reference image into the target image; Pg. 4, Par. 2-5 and Pg. 5: spatial class translation Our method functions on a trained conditional generator G, paired with the discriminator D with which G was trained. Upon receiving the region of interest x clipped from the target image and the class c of the target object contained in x, the algorithm begins by looking for a latent variable z such that G(z; c) will be close to x in the feature space of D (Manifold Projection step). The class c can either be specified by the user or by a pre-trained classifier. Suppose that the user wants to partially translate a region R in x to a class c′, and let Vℓ be the set of features in ℓ-th conditional batch normalization(CBN) layers that correspond to R in the pixel space. Our method then simply substitutes the parameters governing the shift and mean parameters of Vℓ with those of c′ (Figure 6). This will result in a modification of G… in which the CBN parameters of Vℓ exclusively carry the style information of the class c′. 
A transformed image can be constructed by applying this modified G… to z… our spatial editing method is applicable to any generative model (e.g., GAN, VAE) that is equipped with a mechanism to iteratively incorporate class information during its image generation process. We will next elaborate on the design of our sCBN and spatial semantic implantation, along with the other details omitted in the brief description above… Spatial conditional batch normalization (sCBN) is the core of our spatial class translation. As can be inferred from our naming, sCBN is based on batch normalization (BN) [16], a technique developed for the purpose of reducing the internal covariance shift to accelerate the training of neural network. More precisely, we will borrow our idea from conditional batch normalization (CBN) [8, 5], a variant of BN that incorporates the class specific semantic information in the parameters for BN. Given a set of batches sampled each from a single class, the conditional batch normalization [8, 5] works by modulating the set of intermediate features produced from each batch of inputs so that it follow a normal distribution with mean and variance that are specific to the corresponding class. Let us fix the layer ℓ, and let F_{k,h,w} represent the feature of ℓ-th layer at channel k, height location h, and width location w. Given a batch {F_{i,k,h,w}} of F_{k,h,w}'s generated from class c, the CBN at layer ℓ then transforms F_{i,k,h,w}… In our implementation, we replaced CBN at each layer with sCBN… After training the encoder, one can produce the reconstruction of x by applying G to z = E(x). In the reconstructed image, however, semantically independent objects are often dis-aligned. We therefore calibrate z by backpropagating the loss L. After some rounds of calibration, we can use the resulting z for the image transformation… instead of calibrating the latent variable z by backpropagating L through G, we will calibrate ζ by backpropagating L through G and B; Pg. 6, Par. 3: generator used in our study is a ResNet-based generator trained as part of a conditional DCGAN. Each residual block in our generator contains the conditional batch normalization (CBN) layer. At the time of inference, these CBN layers are replaced by the aforementioned sCBN layers that are tailored to the user’s preference. We base our architectures on those used in previous work [25, 26], and used the pre-trained model from [26]; Pg. 8, Par. 6: image transformation method that allows the user to translate the class of an object and transplant semantic features over a user-specified pixel region of the image. Indeed, there is still much room left for the exploration of the semantic information contained in the intermediate feature spaces of CNNs. We were, however, able to show that we can manipulate this information in a somewhat intuitive manner and produce customized photorealistic images; Pg. 11, Par. 9: we conducted a set of automatic spatial class translations. For each one of the selected images, we (1) used a pre-trained model to extract the region of the object to be transformed (dog/cat), (2) conducted the manifold projection to obtain the z, (3) passed z to the generator with the class map corresponding to the segmented region, and (4) conducted a post-processing over the segmented region. For the semantic segmentation, we used a TensorFlow implementation of DeepLab v3 Xception model trained on MS COCO dataset; modulate, by the at least one spatially-adaptive normalization layer, a set of activations through a spatially-adaptive transformation in order to propagate the semantic information throughout the other layers of the one or more neural networks (e.g. an image generative model which uses a neural network, such as a Generative Adversarial Network (GAN), including spatial conditional (i.e. adaptive, instance, etc.) batch normalization (sCBN) layers (i.e. the at least one spatially-adaptive normalization layer), that is equipped with a mechanism to iteratively incorporate (i.e. propagate, communicate, pass, feed, transfer, etc.) class (i.e. feature, attribute, label, etc.) information during its image generation, including calculating a loss (i.e. error, difference, etc.) between semantically independent objects (i.e. semantic information) and backpropagating the loss throughout other layers of the one or more neural networks to update the network through the backpropagation, including semantic features (i.e. class, attribute, label, etc.) of selected object regions (i.e. propagate semantic information), which are segmented/extracted in a reference (i.e. source, input, etc.) image, corresponding to object(s) in a target image to be transformed (i.e. an image to be generated), based on the set of features (i.e. a set of activations), Vℓ, in ℓ-th (first, second, third… Nth) conditional (i.e. adaptive, instance, etc.) batch normalization (CBN) layers that correspond to a region R in the pixel space (i.e. a spatially-adaptive transformation), by incorporating the class specific semantic information in the parameters for batch normalization (BN), including a number of spatial conditional batch normalization (sCBN) layers, and given a set of batches sampled each from a single class, the conditional batch normalization works by modulating the set of intermediate features (i.e. modulating a set of activations through a spatially-adaptive transformation) produced from each batch of inputs so that they follow a normal distribution with mean and variance that are specific to the corresponding class (i.e. modulate, by the at least one spatially-adaptive normalization layer, a set of activations through a spatially-adaptive transformation in order to propagate the semantic information throughout the other layers of the one or more neural networks), as indicated above), for example).
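For illustration only, the spatial class translation described in the quoted Suzuki passages (substituting the class-specific CBN parameters of the features Vℓ that correspond to a region R with those of a class c′) can be sketched in Python with NumPy. This is a minimal sketch under stated assumptions, not Suzuki's implementation: the function name `scbn`, the per-class parameter arrays `gamma` and `beta` of shape (C, K), and the use of current batch statistics rather than running statistics are all hypothetical choices. It shows that changing the per-pixel class map inside R alters the layer output only inside R.

```python
import numpy as np


def scbn(features, class_map, gamma, beta, eps=1e-5):
    """Spatial conditional batch normalization (illustrative sketch).

    features:  (N, K, H, W) activations of one layer
    class_map: (H, W) integer array giving the class index used at each pixel
    gamma:     (C, K) hypothetical per-class scale parameters
    beta:      (C, K) hypothetical per-class shift parameters
    """
    # Normalize each channel with the current batch statistics, as in ordinary BN
    # (an assumption; an inference-time layer would typically use running statistics).
    mean = features.mean(axis=(0, 2, 3), keepdims=True)
    var = features.var(axis=(0, 2, 3), keepdims=True)
    normalized = (features - mean) / np.sqrt(var + eps)

    # Look up the class-specific parameters pixel by pixel:
    # (H, W) indices -> (H, W, K) -> (1, K, H, W) so they broadcast over the batch.
    g = gamma[class_map].transpose(2, 0, 1)[np.newaxis]
    b = beta[class_map].transpose(2, 0, 1)[np.newaxis]
    return g * normalized + b


# Usage: translate only the top-left 2x2 region R from class 0 to class 1.
rng = np.random.default_rng(0)
x = rng.normal(size=(2, 3, 4, 4))
gamma = np.stack([np.ones(3), 2.0 * np.ones(3)])
beta = np.stack([np.zeros(3), 3.0 * np.ones(3)])
class_map = np.zeros((4, 4), dtype=int)
class_map[:2, :2] = 1
out = scbn(x, class_map, gamma, beta)
```

Setting the class map to a single class everywhere reduces the sketch to ordinary conditional batch normalization for that class, which mirrors the quoted description of sCBN as a spatial variant of CBN.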
The same motivation to combine the above-mentioned teachings applies, as previously indicated with respect to claim 5.
Although Suzuki teaches modulate, by the at least one spatially-adaptive normalization layer, a set of activations through a spatially-adaptive transformation in order to propagate the semantic information throughout the other layers of the one or more neural networks, as indicated above, it does not expressly disclose a set of “activations”.
However, Lin teaches a set of activations (Par. [0004]: machine learning techniques learn patterns of neurons by progressing through layers of a neural network. The patterns of the neurons are used to identify existence of a semantic class within an image, such as an object, feeling, and so on as described previously. As part of this, relevancies of the neurons to the semantic class are also communicated back through layers of a neural network. Through use of these relevancies, activation relevancy maps are created that describe relevancy of portions of the image associated with neurons of the neural network to the semantic class. In this way, the semantic class is localized to portions of the image. To do so, positive and not negative relevancies are communicated through the neural network. For example, communication of positive relevancies describes portions of the image that are relevant to the semantic class, whereas negative relevancies do not; Par. [0024-28]: a neural network that includes a plurality of layers. Each of the plurality of layers includes a plurality of neurons that are used as part of classification. Neurons are implemented by a computing device as a mathematical approximation of a biological neuron that accept a vector of inputs, performs transformation on the inputs, and outputs a scalar value. In order to classify an image as corresponding to a particular semantic class, for instance, the techniques involve learning patterns of neurons through successive layers of the neural network. These patterns, once learned, are then usable to determine whether subsequent images include or do not include semantic classes that corresponds to the patterns. 
In other words, the patterns are used to define the "what" is included in the image through classification into a corresponding semantic class… classification of the particular semantic class within the image, relevancy of the neurons to the semantic class (e.g., object) is also communicated backwards through the sequence that is used to aggregate the patterns as described above. This process is also referred to as a "back propagation" technique. For example, activation relevancy maps may be used to define relevancy of neurons at respective layers of the neural network to the semantic class. By communicating these relevancies backwards through the sequence of layers, locations within the image (as corresponding to respective nodes in the neural network) may be further refined for increasingly smaller portions of the image as being relevant to the semantic class… A neuron in one layer of the neural network, for instance, may be connected to a plurality of neurons in another layer. In other words, the plurality of neurons is considered children of the neuron when progressing backward through the sequence, i.e., one to many. This relationship is also used to aggregate patterns of neurons when progressing forward through the sequence (i.e., many to one). When progressing backwards through the sequence, probabilities of relevancy of the child neurons to the semantic class are determined, and positive relevancies are propagated whereas negative relevancies are not… this communication may use a linear function which allows efficient computation of any linear combination of relevancies (e.g., activation relevancy maps as described in the following), thereby promoting computational efficiency. Communication of relevancies may also be configured to preserve a sum of relevance values across layers of the neural network which normalizes the activation relevance maps for comparison; Par. [0042-55]: neural network 202 is also configured to support communication of relevancy back propagation 320 to progress backwards through the sequence used to aggregate activations. This is performed through the use of activation relevancy maps 322, 324, 326 that describe relevancy of the neurons to the semantic class. By progressing backwards through the sequence, this relevancy may be further refined to increasingly smaller portions of the input image 302 and thus serve to localize relevant portions within the input image 302 to identification of the semantic class. In this way, neurons that are considered relevant to the semantic class may be used to localize the semantic class within the input image 302 through definition of spatial information of the neurons and relevancy of those neurons to the semantic class. In other words, the particular outcome (e.g., the semantic class) is communicated (i.e., back propagated) through the neural network to localize how that outcome was obtained. Accordingly, this technique is applicable to any outcome that may be determined using a neural network 202 to determine which neurons and information relating to those neurons (e.g., portions of a picture) are used to achieve that outcome, such as to identify objects, emotions, and so forth… A2. An activation neuron is tuned to detect certain visual features. Its response is positively correlated to its confidence of the detection… A2 has been empirically verified by a variety of recent works. It is observed that neurons at lower layers detect simple features like edge and color while neurons at higher layers can detect complex features… Between activation neurons, a connection is "excitatory" if its weight is non-negative, and "inhibitory" otherwise… The excitation backpropagation technique passes top-down signals through excitatory connections between activation neurons; a set of activations (e.g. patterns of neurons are used to identify the existence of a semantic class (i.e. semantic information) within an image, such as an object, feeling, and so on, and, as part of this, relevancies of the neurons to the semantic class are communicated (i.e. propagated, fed, passed, etc.) back through layers of a neural network, for example, and through use of these relevancies, activation relevancy maps (i.e. a set of activations) are created that describe relevancy of portions (i.e. regions) of the image associated with neurons of the neural network to the semantic class, at respective layers of the neural network, in which neurons that are considered relevant to the semantic class are used to localize the semantic class within the input image through definition of spatial information of the neurons and relevancy of those neurons to the semantic class, and each activation neuron is tuned to detect certain visual features (i.e. activations, outputs, results, etc.), including communication (i.e. propagation) of positive relevancies describing portions of images that are relevant to the semantic class, which are propagated, whereas negative relevancies are not, and the communication of relevancies preserves a sum of relevance values across layers of the neural network which normalizes the activation relevance maps for comparison, as indicated above), for example).
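For illustration only, the relevance communication described in the quoted Lin passages (signals passed top-down through excitatory connections only, positive relevancies propagated, and the sum of relevance values preserved across layers) can be sketched for a single fully connected layer in Python with NumPy. This is a simplified sketch under stated assumptions, not Lin's implementation: the function name `excitation_backprop`, its arguments, the dense weight matrix, and the non-negative child activations are all hypothetical.

```python
import numpy as np


def excitation_backprop(child_act, weights, parent_rel, eps=1e-12):
    """Pass relevance from a parent layer down to its child layer (illustrative sketch).

    child_act:  (n_in,) non-negative activations of the lower (child) layer
    weights:    (n_in, n_out) weight from child i to parent j
    parent_rel: (n_out,) relevance already assigned to the parent neurons
    """
    # Keep only excitatory (non-negative) connections; inhibitory ones pass nothing.
    w_pos = np.clip(weights, 0.0, None)
    # Excitatory contribution of each child to each parent.
    contrib = child_act[:, None] * w_pos
    # Split each parent's relevance among its children in proportion to contribution.
    # Parents with no excitatory input (total ~ 0) pass no relevance down.
    totals = contrib.sum(axis=0)
    share = contrib / np.maximum(totals, eps)
    return share @ parent_rel


# Usage on a toy layer with three children and two parents.
a = np.array([1.0, 0.5, 2.0])
w = np.array([[0.5, -1.0], [1.0, 0.2], [-0.3, 0.4]])
p = np.array([0.7, 0.3])
r = excitation_backprop(a, w, p)
```

Because each parent's relevance is divided among its children in proportion to their excitatory contributions, the child relevances are non-negative and sum to the parent total whenever every parent has at least one excitatory input with a nonzero activation, which corresponds to the quoted sum-preserving property.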
Fu, Dai, Suzuki, and Lin are considered to be analogous art because they pertain to image processing applications based on neural networks. Therefore, the combined teachings of Fu, Dai, Suzuki, and Lin, as a whole, would have rendered obvious the invention recited in claim 6 with a reasonable expectation of success in order to modify the method for image generation through use of adversarial networks (as disclosed by Fu) with a set of activations (as taught by Lin, Abstract, Par. [0004, 24-28, 42-55]) by using activation relevancy maps having increased amounts of resolution generated by progressing backward through the layers of neural networks (Lin, Abstract, Par. [0001-5, 27, 40, 46]).

Regarding claim 12, claim 11 is incorporated and is a corresponding apparatus claim rejected as applied to the computer readable medium claim 6 above.

Regarding claim 18, claim 17 is incorporated and the steps of the program further recited in claim 18 recite a similar concept that corresponds to claim 6 when executed and are rejected as applied to the computer readable medium claim 6 above.

Regarding claim 24, claim 23 is incorporated and is a corresponding apparatus claim rejected as applied to the computer readable medium claim 18 above.

Regarding claim 30, claim 29 is incorporated and the steps of the program further recited in claim 30 recite a similar concept that corresponds to claim 6 when executed and are rejected as applied to the computer readable medium claim 6 above.

Regarding claim 36, claim 35 is incorporated and is a corresponding apparatus claim rejected as applied to the computer readable medium claim 30 above.

Regarding claim 42, claim 41 is incorporated and the steps of the program further recited in claim 42 recite a similar concept that corresponds to claim 6 when executed and are rejected as applied to the computer readable medium claim 6 above.

Regarding claim 48, claim 47 is incorporated and is a corresponding apparatus claim rejected as applied to the computer readable medium claim 42 above.

Regarding claim 54, claim 53 is incorporated and the steps further recited in claim 54 recite a similar concept that corresponds to claim 6 when executed and are rejected as applied to the computer readable medium claim 6 above.

Regarding claim 60, claim 59 is incorporated and the steps further recited in claim 60 recite a similar concept that corresponds to claim 6 when executed and are rejected as applied to the computer readable medium claim 6 above.

Regarding claim 66, claim 65 is incorporated and the steps of the program further recited in claim 66 recite a similar concept that corresponds to claim 6 when executed and are rejected as applied to the computer readable medium claim 6 above.

Regarding claim 72, claim 71 is incorporated and is a corresponding apparatus claim rejected as applied to the computer readable medium claim 66 above.

Regarding claim 78, claim 77 is incorporated and the steps further recited in claim 78 recite a similar concept that corresponds to claim 6 when executed and are rejected as applied to the computer readable medium claim 6 above.

Regarding claim 84, claim 83 is incorporated and the steps further recited in claim 84 recite a similar concept that corresponds to claim 6 when executed and are rejected as applied to the computer readable medium claim 6 above.

Regarding claim 90, claim 89 is incorporated and the steps further recited in claim 90 recite a similar concept that corresponds to claim 6 when executed and are rejected as applied to the computer readable medium claim 6 above.

Regarding claim 102, claim 101 is incorporated and the steps of the program further recited in claim 102 recite a similar concept that corresponds to claim 6 when executed and are rejected as applied to the computer readable medium claim 6 above.

Conclusion
Applicant’s amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.

Contact
Any inquiry concerning this communication or earlier communications from the examiner should be directed to GUILLERMO RIVERA-MARTINEZ whose telephone number is 571-272-4979. The examiner can normally be reached on Monday-Friday (8am - 5pm Eastern Time). If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vu Le can be reached on 571-272-7332. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/GUILLERMO M RIVERA-MARTINEZ/           Primary Examiner, Art Unit 2668