Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Notice to Applicants
This communication is in response to the Application filed on 12/9/2020.
Claims 1-20 are pending.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
7.	Claims 1-3, 8-10, 12, 14-16 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Fu et al. (U.S. Publication No. 2019/0295302) (hereafter, "Fu") in view of Ji et al. (NPL 1, “Saliency detection via conditional adversarial image-to-image network”) (hereafter, "Ji") and further in view of FLETCHER (U.S. Publication No. 2016/0350893).
Regarding claim 1, Fu teaches a method for translating an image, comprising: obtaining an image translation request ([0040] FIG. 1 illustrates the results of an embodiment where, given the input image 100 and target segmentation 101, the proposed SGGAN translates the input image 100 to various combinations of various attributes), wherein the image translation request carries an original image ([0046] In the training procedure 270, the generator 220 is configured to receive the target segmentation 271, desired attributes vector 272, and real image 273 and from the inputs 271-273); down sampling the original image to generate a down-sampled image corresponding to the original image ([0041] the proposed SGGAN framework comprises three networks, depicted in FIGS. 2A ... the generator network 220 includes a convolutional block 221, a down-sampling convolutional block 222); deforming the original image based on the deformation parameters to generate a deformed image ([0044] the segmentor 260 provides spatial guidance to the generator 220 to ensure the generated images, e.g., 274, comply with input segmentations, e.g., 271. The discriminator 240 aims to ensure the translated images, e.g., 274, are as realistic as the real images, e.g., 273); and fusing the deformed image and the mask image to generate a target translation image ([0051] During the training process, these three inputs (target segmentation 305, target attributes 306, and input image 304) are fed into the generator 301 to obtain the generated image 307. After generating the image 307, there are three paths. The first path is to input the generated image 307 to the segmentor 303. The segmentor 303 estimates a semantic segmentation 308 from the generated image 307, and the estimated segmentation 308 is then compared with the target segmentation 305 to calculate a fake segmentation loss 309 which is provided to the generator optimizer 310; [0136] A goal of the generator 1401 is to synthesize diverse generated images, i.e., fake images, that comply with the target segmentation 1406 and target attributes 1405).
Fu does not expressly teach generating a pre-translated image corresponding to the original image, a mask image and deformation parameters corresponding to each pixel of the original image based on the down-sampled image, wherein a size of the pre-translated image and a size of the mask image are the same as a size of the original image, the pre-translated image.
However, Ji teaches generating a pre-translated image corresponding to the original image (Fig. 1, “Generated fake sample”; Section 3. Method, 3.1. Model overview, “As illustrated in Fig. 1, our proposed model consists of two components, i.e., generator and discriminator. Concretely, a deep image-to-image translation network is regarded as generator, which is used for achieving saliency segmentation. Inspired by Isola et al. [3,9], we utilize a popular convolutional symmetric encoder-decoder architecture as a generator”), a mask image (Fig. 1, “Ground-Truth”; Section 3. Method, 3.2. Objective function, “Specifically, in image-to-saliency translation task, y should be the ground-truth saliency mask”), wherein a size of the pre-translated image and a size of the mask image are the same as a size of the original image (Fig. 1 shows that the real samples and the generated fake sample are in the same scale as the ground-truth mask; Section 3. Method, 3.1. Model overview, “For decoder, operation of skip-connection [3] is utilized to compensate the information from encoder layers in the same scale”; FIG. 1 shows the size of Generated fake sample and the size of Ground-Truth are the same as the size of Real Samples); the pre-translated image (Fig. 1, “Generated fake sample”).
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Fu to incorporate the step/system of generating the fake sample corresponding to the real sample and the ground-truth mask, with the encoder layers, the decoder layers and the mask in the same scale, as taught by Ji.
The suggestion/motivation for doing so would have been to improve the performance of generated saliency mask and the semantic segmentation accuracy (3. Method, 3.3. Densely Conditional Random Field (DCRF) inference, "Recent works [4,28] have shown that complementing CNNs with fully connected conditional random fields (CRFs) can significantly enhance the semantic segmentation accuracy. Inspired by the success of densely connected CRF in enhancing the final result of pixel-level annotation task, we applied densely connected CRF inference in to the image-to-saliency mask translation task in order to further improve the performance of generated saliency mask"). Further, one skilled in the art could have combined the elements as described above by known method with no change in their respective functions, and the combination would have yielded nothing more than predicted results.
The combination of Fu and Ji does not expressly teach deformation parameters corresponding to each pixel of the original image based on the down-sampled image.
However, FLETCHER teaches deformation parameters corresponding to each pixel of the original image based on the down-sampled image ([0066] the image distortion map itself may also be represented in a down-sampled form in which the mapping is not read directly from the image distortion map, but is interpolated from values in a low-resolution image distortion map; [0067] At step 203 the moving image m is warped based on the image distortion map by mapping positions from every pixel coordinate in the fixed image f to coordinates in the moving image m. Subsequently, pixel values are interpolated in the moving image m using the mapped positions, and a distorted image is constructed of the same size as the fixed image f that contains interpolated values from the moving image m; [0070] the image distortion map is calculated based on an image pyramid including a list of successively down-sampled images. The lowest resolution images are registered first, which produces a low-resolution image distortion estimate. A rigid transformation is a function modelled by a small number of parameters, such as a pure translation, a rotation/scaling/translation transform; [0071] One method for performing rigid registration includes estimating rotation, scaling and translation by calculating Fourier magnitude images of the selected down-sampled moving and fixed image).
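For illustration only, the interpolation of a full-resolution mapping from a low-resolution image distortion map, as described in the cited passage [0066] of FLETCHER, can be sketched as follows. FLETCHER does not specify the interpolation scheme in the quoted text; bilinear interpolation with aligned corners is assumed here, the map holds one scalar displacement per cell for brevity, and the function name `interpolate_map` is hypothetical.

```python
def interpolate_map(low_res, out_h, out_w):
    """Bilinearly interpolate a low-resolution 2-D displacement map
    to out_h x out_w (assumed scheme, corners aligned)."""
    in_h, in_w = len(low_res), len(low_res[0])
    out = []
    for y in range(out_h):
        # Fractional row position in the low-resolution grid.
        fy = y * (in_h - 1) / (out_h - 1) if out_h > 1 else 0.0
        y0 = int(fy)
        y1 = min(y0 + 1, in_h - 1)
        ty = fy - y0
        row = []
        for x in range(out_w):
            # Fractional column position in the low-resolution grid.
            fx = x * (in_w - 1) / (out_w - 1) if out_w > 1 else 0.0
            x0 = int(fx)
            x1 = min(x0 + 1, in_w - 1)
            tx = fx - x0
            top = (1 - tx) * low_res[y0][x0] + tx * low_res[y0][x1]
            bottom = (1 - tx) * low_res[y1][x0] + tx * low_res[y1][x1]
            row.append((1 - ty) * top + ty * bottom)
        out.append(row)
    return out


# A 2x2 low-resolution map expanded to one displacement per pixel of a 3x3 image.
low_res_map = [[0.0, 2.0], [4.0, 6.0]]
per_pixel_map = interpolate_map(low_res_map, 3, 3)
print(per_pixel_map)
```

The result holds one value per pixel of the larger grid, which is the sense in which a down-sampled distortion map yields a per-pixel mapping.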
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Fu and Ji to incorporate the step/system of calculating the image distortion map which is interpolated from values based on a list of successively down-sampled images taught by FLETCHER. 
The suggestion/motivation for doing so would have been to improve the accuracy of the image distortion maps ([0014] The method further includes combining the current image distortion map and the residual image distortion map to form an improved distortion map, wherein the improved distortion map is used to transform the target image for displaying on a display device; [0081] a loop is initialized to calculate progressively more accurate image distortion maps w using the sequence of down-sampled images). Further, one skilled in the art could have combined the elements as described above by known method with no change in their respective functions, and the combination would have yielded nothing more than predicted results. Therefore, it would have been obvious to combine Fu and Ji with FLETCHER to obtain the invention as specified in claim 1.
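As an aid to reading the claim 1 mapping, the sequence of steps recited in the claim (down sampling, deforming per pixel, fusing with a mask) can be sketched as below. This is a minimal sketch under stated assumptions and is not taken from Fu, Ji or FLETCHER: the average-pooling down-sampler, the nearest-neighbour warp with integer offsets, and the mask-weighted blend used for fusing are all assumed, and the pre-translated image, mask and offsets are supplied by hand rather than produced by a generator.

```python
def down_sample(img, factor=2):
    """Average-pool a 2-D grayscale image (list of rows) by an integer factor."""
    h, w = len(img), len(img[0])
    return [
        [
            sum(img[y * factor + dy][x * factor + dx]
                for dy in range(factor) for dx in range(factor)) / factor ** 2
            for x in range(w // factor)
        ]
        for y in range(h // factor)
    ]


def deform(img, offsets):
    """Warp img using one integer (dy, dx) offset per pixel; sources are clamped to the border."""
    h, w = len(img), len(img[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            dy, dx = offsets[y][x]
            sy = min(max(y + dy, 0), h - 1)
            sx = min(max(x + dx, 0), w - 1)
            row.append(img[sy][sx])
        out.append(row)
    return out


def fuse(deformed, pre_translated, mask):
    """Assumed fusion rule: mask weights the deformed image, (1 - mask) the pre-translated image."""
    return [
        [m * d + (1 - m) * p for d, p, m in zip(rd, rp, rm)]
        for rd, rp, rm in zip(deformed, pre_translated, mask)
    ]


original = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
small = down_sample(original)                      # input a generator would receive
offsets = [[(0, 1)] * 4 for _ in range(4)]         # hand-made per-pixel deformation parameters
mask = [[1, 1, 0, 0] for _ in range(4)]            # hand-made mask, same size as original
pre_translated = [[100] * 4 for _ in range(4)]     # hand-made pre-translated image
fused = fuse(deform(original, offsets), pre_translated, mask)
print(small, fused[0])
```

Note that the mask, the pre-translated image and the offsets all have the same height and width as the original image, while only the down-sampled image is smaller.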
Regarding claim 2, Fu, Ji and FLETCHER teach all the limitations of claim 1 above. Fu teaches processing the down-sampled image to determine a first feature vector, wherein the first feature vector is used for translating the down-sampled image to a first domain to which the target translation image belongs ([0041] The generator 240 takes as inputs, a target segmentation 227, a given image 226, and a vector 228 indicating desired attributes of the image to be generated. The generator 220 implemented with the blocks 221-225 is configured to receive the inputs 226, 227, 228 and generate a target image 229 that is based on, i.e., a translated version of, the input image 226 and consistent with the input segmentation 227 and attributes 228) and the deformation parameters based on the second feature vector ([0068] a down-sampling convolutional block configured to extract features of the target segmentation, a first concatenation block configured to concatenate the extracted features with a latent vector, an up-sampling block configured to construct a layout of the fake image using the concatenated extracted features and latent vector, a second concatenation block configured to concatenate the layout with an attribute label to generate a multidimensional matrix representing features of the fake image, and an up-sampling convolutional block configured to generate the fake image using the multidimensional matrix).
Fu does not expressly teach wherein generating the pre-translated image, the mask image and the deformation parameters comprises: up sampling the first feature vector to generate a second feature vector.
However, Ji teaches wherein generating the pre-translated image (Fig. 1, “Generated fake sample”), the mask image (Fig. 1, “Ground-Truth”; Section 3. Method, 3.2. Objective function, “Specifically, in image-to-saliency translation task, y should be the ground-truth saliency mask”), up sampling the first feature vector to generate a second feature vector (Fig. 1 and Section 4. Experimental results, 4.2. Experimental setup, “we adopted a seven-scale of down-sampling convolutional layers as our encoder, and a symmetric network as decoder. Max pooling operation for down-sample feature maps is replaced by 4x4 convolutional operations with stride 2 and padding 1. This setting turns the whole architecture into a fully convolutional network. For each scale of down sampling, convolutional layer with ReLU activation function and batch normalization layer consist of a down sampling block. For decoder, a symmetric block of ReLU with convolutional transpose layer and Dropout layer is utilized to perform up-sampling”); and generating the pre-translated image (Fig. 1, “Generated fake sample”), the mask image (Fig. 1, “Ground-Truth”; Section 3. Method, 3.2. Objective function, “Specifically, in image-to-saliency translation task, y should be the ground-truth saliency mask”).
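The effect of the encoder setting quoted from Ji (4x4 convolutions with stride 2 and padding 1, over seven scales) on feature-map size can be checked with the standard convolution output-size formulas. The sketch below is illustrative only: the 256-pixel input side is an assumption not stated in the quoted passage, and the decoder is assumed to use transposed convolutions with the same kernel, stride and padding because Ji describes it only as symmetric.

```python
def conv_out(n, kernel=4, stride=2, padding=1):
    """Spatial size after a strided convolution."""
    return (n + 2 * padding - kernel) // stride + 1


def conv_transpose_out(n, kernel=4, stride=2, padding=1):
    """Spatial size after a transposed convolution (no output padding)."""
    return (n - 1) * stride - 2 * padding + kernel


def encoder_sizes(n, scales=7):
    """Feature-map side length after each of the down-sampling scales."""
    sizes = [n]
    for _ in range(scales):
        sizes.append(conv_out(sizes[-1]))
    return sizes


def decoder_sizes(n, scales=7):
    """Feature-map side length after each of the up-sampling scales."""
    sizes = [n]
    for _ in range(scales):
        sizes.append(conv_transpose_out(sizes[-1]))
    return sizes


down = encoder_sizes(256)        # assumed 256x256 input
up = decoder_sizes(down[-1])
print(down)
print(up)
```

Each down-sampling scale halves the side length and each up-sampling scale doubles it, so under these assumptions the decoder output returns to the input size.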
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Fu to incorporate the step/system of generating the fake sample corresponding to the real sample and the ground-truth mask and performing up-sampling by utilizing convolutional transpose layer and Dropout layer with ReLU taught by Ji.
The suggestion/motivation for doing so would have been to improve the performance of generated saliency mask and the semantic segmentation accuracy (3. Method, 3.3. Densely Conditional Random Field (DCRF) inference, "Recent works [4,28] have shown that complementing CNNs with fully connected conditional random fields (CRFs) can significantly enhance the semantic segmentation accuracy. Inspired by the success of densely connected CRF in enhancing the final result of pixel-level annotation task, we applied densely connected CRF inference in to the image-to-saliency mask translation task in order to further improve the performance of generated saliency mask"). Further, one skilled in the art could have combined the elements as described above by known method with no change in their respective functions, and the combination would have yielded nothing more than predicted results.
The combination of Fu and Ji does not expressly teach the deformation parameters.
However, FLETCHER teaches the deformation parameters ([0066] the image distortion map … interpolated from values in a low-resolution image distortion map; [0070] small number of parameters, such as a pure translation).
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Fu and Ji to incorporate the step/system of having the deformation parameters taught by FLETCHER. 
The suggestion/motivation for doing so would have been to improve the accuracy of the image distortion maps ([0014] The method further includes combining the current image distortion map and the residual image distortion map to form an improved distortion map, wherein the improved distortion map is used to transform the target image for displaying on a display device; [0081] a loop is initialized to calculate progressively more accurate image distortion maps w using the sequence of down-sampled images). Further, one skilled in the art could have combined the elements as described above by known method with no change in their respective functions, and the combination would have yielded nothing more than predicted results. Therefore, it would have been obvious to combine Fu and Ji with FLETCHER to obtain the invention as specified in claim 2.
Regarding claim 3, Fu, Ji and FLETCHER teach all the limitations of claim 1 above. Fu does not expressly teach further comprising: obtaining a first target generator based on a first domain carried in the image translation request, wherein the target translation image belongs to the first domain; and processing the down-sampled image with the first target generator, to generate the pre-translated image, the mask image, and the deformation parameters.
However, Ji teaches further comprising: obtaining a first target generator (Section 3. Method, 3.1. Model overview, “As illustrated in Fig. 1, our proposed model consists of two components, i.e., generator and discriminator. Concretely, a deep image-to-image translation network is regarded as generator”) based on a first domain carried in the image translation request (Fig. 1, “Domain: A” & “Real Samples”), wherein the target translation image belongs to the first domain (Section 3. Method, 3.2. Objective function, “The objective function of a conditional GAN (cGAN) can be expressed as: L_cGAN(G, D) = E_{x,y∼p_data(x,y)}[log D(x, y)] + E_{x,y∼p_data(x,y), z∼p_z(z)}[log(1 − D(x, G(x, z)))], (3) where y denotes the target-domain image. Specifically, in image-to-saliency translation task, y should be the ground-truth saliency mask”); and processing the down-sampled image with the first target generator, to generate the pre-translated image (Fig. 1, “Generated fake sample” and Section 3. Method, 3.1. Model overview, “As illustrated in Fig. 1, our proposed model consists of two components, i.e., generator and discriminator. Concretely, a deep image-to-image translation network is regarded as generator, which is used for achieving saliency segmentation. Inspired by Isola et al. [3,9], we utilize a popular convolutional symmetric encoder-decoder architecture as a generator. Specifically, a seven-scale of convolutional layers is designed as encoder. Instead of using max pooling to down-sample feature maps, 4x4 convolutional operations with stride 2 are adopted to make the whole architecture into a fully convolutional network”), the mask image (Fig. 1, “Ground-Truth”; Section 3. Method, 3.2. Objective function, “Specifically, in image-to-saliency translation task, y should be the ground-truth saliency mask”).
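For reference, Eq. (3) quoted from Ji can be estimated from discriminator outputs as sketched below. This is a minimal sketch, assuming the two expectations are approximated by sample means over a batch; the function name `cgan_objective` and the example probabilities are hypothetical and do not come from Ji.

```python
import math


def cgan_objective(d_real, d_fake):
    """Sample estimate of Eq. (3): mean log D(x, y) + mean log(1 - D(x, G(x, z))).

    d_real: discriminator outputs in (0, 1) for real (image, ground-truth) pairs.
    d_fake: discriminator outputs in (0, 1) for (image, generated) pairs.
    """
    real_term = sum(math.log(p) for p in d_real) / len(d_real)
    fake_term = sum(math.log(1.0 - p) for p in d_fake) / len(d_fake)
    return real_term + fake_term


# A discriminator that separates the pairs well scores higher than one that
# outputs 0.5 for every pair.
confident = cgan_objective([0.9, 0.8], [0.1, 0.2])
undecided = cgan_objective([0.5, 0.5], [0.5, 0.5])
print(confident, undecided)
```

The discriminator seeks to increase this quantity while the generator seeks to decrease it, which is the adversarial relationship the quoted passage relies on.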
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Fu to incorporate the step/system of obtaining a generator based on Domain A of the real samples, wherein the target-domain image belongs to Domain A, and processing the down-sampled image with the generator to generate the fake sample, as taught by Ji.
The suggestion/motivation for doing so would have been to improve the performance of generated saliency mask and the semantic segmentation accuracy (3. Method, 3.3. Densely Conditional Random Field (DCRF) inference, "Recent works [4,28] have shown that complementing CNNs with fully connected conditional random fields (CRFs) can significantly enhance the semantic segmentation accuracy. Inspired by the success of densely connected CRF in enhancing the final result of pixel-level annotation task, we applied densely connected CRF inference in to the image-to-saliency mask translation task in order to further improve the performance of generated saliency mask"). Further, one skilled in the art could have combined the elements as described above by known method with no change in their respective functions, and the combination would have yielded nothing more than predicted results.
The combination of Fu and Ji does not expressly teach the deformation parameters.
However, FLETCHER teaches the deformation parameters ([0066] the image distortion map … interpolated from values in a low-resolution image distortion map; [0070] small number of parameters, such as a pure translation).
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Fu and Ji to incorporate the step/system of using the deformation parameters taught by FLETCHER. 
The suggestion/motivation for doing so would have been to improve the accuracy of the image distortion maps ([0014] The method further includes combining the current image distortion map and the residual image distortion map to form an improved distortion map, wherein the improved distortion map is used to transform the target image for displaying on a display device; [0081] a loop is initialized to calculate progressively more accurate image distortion maps w using the sequence of down-sampled images). Further, one skilled in the art could have combined the elements as described above by known method with no change in their respective functions, and the combination would have yielded nothing more than predicted results. Therefore, it would have been obvious to combine Fu and Ji with FLETCHER to obtain the invention as specified in claim 3.
Regarding claim 8, Fu, Ji and FLETCHER teach all the limitations of claim 1 above. Fu teaches further comprising: obtaining attribute parameters of the electronic device ([0041] The generator 240 takes as inputs, a target segmentation 227, a given image 226, and a vector 228 indicating desired attributes of the image to be generated); and determining down-sampling coefficients based on the attribute parameters of the electronic device ([0041] The generator 240 takes as inputs, a target segmentation 227, a given image 226, and a vector 228 indicating desired attributes of the image to be generated).
Fu does not expressly teach down sampling the original image based on the down-sampling coefficients, to generate the down-sampled image corresponding to the original image.
However, FLETCHER teaches down sampling the original image based on the down-sampling coefficients, to generate the down-sampled image corresponding to the original image ([0120] assume we have an image containing 64M pixels (X=Y=8,192). 1M DCT coefficients provides an appropriately smooth fit to the distortion estimate (P=Q=1,024, up to one eighth the Nyquist frequency). While this number of coefficients provides a sufficiently smooth distortion estimate for digital pathology images, more or less coefficients might be chosen where more or less distortion is expected. In a preferred embodiment, only 1/64th of the possible coefficients Bi,jx are used. This number of coefficients, representing a down-sampling by a factor of 8 in each direction).
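The figures quoted from FLETCHER [0120] are mutually consistent, as the following arithmetic check shows. The variable names are illustrative only; the values 8,192 and 1,024 are taken from the quoted passage.

```python
# Values quoted from FLETCHER [0120]; names are illustrative only.
X = Y = 8192   # image width and height in pixels
P = Q = 1024   # DCT coefficients retained along each axis

pixels = X * Y                          # 64 * 2**20, i.e. "64M pixels"
coefficients = P * Q                    # 2**20, i.e. "1M DCT coefficients"
fraction_kept = coefficients / pixels   # share of the possible coefficients used
factor_per_axis = X // P                # down-sampling factor in each direction

print(pixels, coefficients, fraction_kept, factor_per_axis)
```

Keeping 1,024 of 8,192 coefficients along each axis retains 1/64 of the total, which corresponds to the stated down-sampling by a factor of 8 in each direction.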
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Fu to incorporate the step/system of down sampling an image based on the down-sampling coefficients to generate the down-sampled image corresponding to the original image taught by FLETCHER. 
The suggestion/motivation for doing so would have been to improve the accuracy of the image distortion maps ([0014] The method further includes combining the current image distortion map and the residual image distortion map to form an improved distortion map, wherein the improved distortion map is used to transform the target image for displaying on a display device; [0081] a loop is initialized to calculate progressively more accurate image distortion maps w using the sequence of down-sampled images). Further, one skilled in the art could have combined the elements as described above by known method with no change in their respective functions, and the combination would have yielded nothing more than predicted results. Therefore, it would have been obvious to combine Fu with FLETCHER to obtain the invention as specified in claim 8.
Regarding claim 9, Fu teaches down sampling the set of first images respectively to generate a set of first down-sampled images ([0041] the proposed SGGAN framework comprises three networks, depicted in FIGS. 2A ... the generator network 220 includes a convolutional block 221, a down-sampling convolutional block 222), a set of first mask images ([0053] The third path of generator training is a reconstruction loss path which takes the generated image 307 as an input to the generator 301, as well as two other inputs, a source segmentation 315 (which may be a ground-truth landmark based segmentation); [0056] To train the segmentor 303, the input source image 304 is input to the segmentor 303 to obtain an estimated semantic segmentation 324. Then, this estimated segmentation 324 is compared with a ground-truth source segmentation 315, which may be a landmark based segmentation, to calculate a real segmentation loss 325); deforming the set of first images respectively based on the set of first deformation parameters to obtain a set of first deformed images ([0044] the segmentor 260 provides spatial guidance to the generator 220 to ensure the generated images, e.g., 274, comply with input segmentations, e.g., 271. The discriminator 240 aims to ensure the translated images, e.g., 274, are as realistic as the real images, e.g., 273); fusing each first deformed image in the set of first deformed images ([0136] A goal of the generator 1401 is to synthesize diverse generated images, i.e., fake images, that comply with the target segmentation 1406 and target attributes 1405), to obtain a set of first probabilities, to obtain a set of second probabilities ([0134] D 1360 is defined as D: x→{Dd (x), Dc (x)}, where Dd (x) gives the probability 1365 of x (an input image such as the real image 1356 or generated image 1355) belonging to the real data distribution), based on the set of first probabilities and the set of second probabilities ([0134] D 1360 is defined as D: x→{Dd (x), Dc (x)}, where Dd (x) gives the probability 1365 of x (an input image such as the real image 1356 or generated image 1355) belonging to the real data distribution), to generate a first target generator belonging to the first domain ([0113] An example embodiment of the invention may be applied to the multi-domain image-to-image translation task using a novel deep learning based adversarial network. In particular, an example embodiment invention may transfer facial attributes (e.g., hair color, gender, age); [0134] an auxiliary classifier is embedded in D 1360 to determine a multi-label classification 1366 which provides attribute-level and domain-specific information back to the generator G 1340 ... Dc(x) outputs the probabilities 1366 of x belonging to nc attribute-level domains).
Fu does not expressly teach a method for training an image translation model, comprising: obtaining a set of training samples, wherein the set of training samples comprises a set of first images belonging to a first domain and a set of second images belonging to a second domain; processing the set of first down-sampled images respectively with a first initial generator to generate a set of first pre-translated images, and a set of first deformation parameters, wherein each first deformation parameter in the set of first deformation parameters corresponds to a respective pixel of the first image in the set of first images respectively, each first pre-translated image in the set of first pre-translated images, and each first mask image in the set of first mask images to obtain a set of third images, inputting the set of third images to a first initial discriminator that each third image is a real image; inputting the set of second images to the first initial discriminator, that each second image is a real image, and correcting the first initial generator and the first initial discriminator, wherein the first target generator belonging to the first domain is configured to translate an image in the first domain into an image in the second domain.
However, Ji teaches a method for training an image translation model (Fig. 1 and Section 3. Method, 3.1. Model overview, “As illustrated in Fig. 1, our proposed model consists of two components, i.e., generator and discriminator. Concretely, a deep image-to-image translation network is regarded as generator, which is used for achieving saliency segmentation”; 2. Related work, "Inspired by the success of cGAN image-to-image translation task [9], in this paper we explore the ability of cGAN in saliency detection task. Specifically, instead of using ground-truth saliency map as supervised information, image-to-ground-truth saliency pairs are constructed to guide the training process of generator and discriminator"), comprising: obtaining a set of training samples (Section 3. Method, 3.2. Objective function, "Thus, positive samples should be the real image-to-ground-truth saliency mask pairs, and negative samples should be the image-to-generated saliency map pairs. The first term in Eq. (3) on the right-hand side denotes the loss when training the discriminator with positive samples, and the second term means the loss when training with negative samples"), wherein the set of training samples comprises a set of first images belonging to a first domain (Fig. 1, “Domain: A” & “Real Samples”) and a set of second images belonging to a second domain (Fig. 1, “Domain: B” & “Ground Truth”); processing the set of first down-sampled images respectively with a first initial generator to generate a set of first pre-translated images (Fig. 1, “Generated fake sample” and Section 3. Method, 3.1. Model overview, “As illustrated in Fig. 1, our proposed model consists of two components, i.e., generator and discriminator. Concretely, a deep image-to-image translation network is regarded as generator, which is used for achieving saliency segmentation. Inspired by Isola et al. [3,9], we utilize a popular convolutional symmetric encoder-decoder architecture as a generator”), each first pre-translated image in the set of first pre-translated images (Fig. 1, “Generated fake sample”), and each first mask image in the set of first mask images to obtain a set of third images (Section 3. Method, 3.2. Objective function, "Thus, positive samples should be the real image-to-ground-truth saliency mask pairs, and negative samples should be the image-to-generated saliency map pairs. The first term in Eq. (3) on the right-hand side denotes the loss when training the discriminator with positive samples”); inputting the set of third images to a first initial discriminator that each third image is a real image; inputting the set of second images to the first initial discriminator, that each second image is a real image (Section 3. Method, 3.2. Objective function, "When performing image-to-image translation under such a cGAN framework, instead of determining real or fake sample with a single image, the discriminator in conditional GAN aims at distinguishing real pairs from fake pairs. Thus, positive samples should be the real image-to-ground-truth saliency mask pairs, and negative samples should be the image-to-generated saliency map pairs. The first term in Eq. (3) on the right-hand side denotes the loss when training the discriminator with positive samples, and the second term means the loss when training with negative samples"); and correcting the first initial generator and the first initial discriminator (Section 3. Method, 3.2. Objective function, “the initial training process of cGAN model is more important to obtain a reasonable translation model, and the key to ‘fool’ the discriminator in the training stage is producing more hard negative samples, which will lead the discriminator to make wrong decision, i.e., fake pairs pass the testing. Empirically, easy negative sample is kind of meaningless to train such cGAN model. Intuitively, when the positive and easy negative samples are easy to be distinguished, the discriminator will become more and more stronger along with the training process, and the generator may drop into serious mode collapsing ‘dilemma’ by producing more easy negative samples with several simple patterns. We thus introduce L1-norm loss between the real pair and fake pair in the objective function to penalize the generation of easy negative samples in the training stage. Since a source-domain image is unchanged in the real and fake pair, we only compute the L1-norm between the generated image and the target-domain ground-truth image”), wherein the first target generator belonging to the first domain is configured to translate an image in the first domain into an image in the second domain (FIG. 1 shows the Generator belonging to Domain A translating a real sample into the generated fake sample in Domain B).
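The L1-norm term described in the quoted passage of Ji, computed only between the generated image and the target-domain ground-truth image, can be sketched as follows. The quoted text does not state how the term is normalized or weighted; the per-pixel mean, the weighted sum with the adversarial term, and the weight `lam` are assumptions, and both function names are hypothetical.

```python
def l1_distance(generated, target):
    """Mean absolute difference between two equally sized 2-D images (assumed normalization)."""
    total = 0.0
    count = 0
    for row_g, row_t in zip(generated, target):
        for g, t in zip(row_g, row_t):
            total += abs(g - t)
            count += 1
    return total / count


def generator_objective(adversarial_term, generated, target, lam=1.0):
    """Assumed combination: adversarial term plus lam times the L1 term (lam is hypothetical)."""
    return adversarial_term + lam * l1_distance(generated, target)


generated_mask = [[0.0, 1.0], [1.0, 0.0]]
ground_truth = [[0.0, 1.0], [1.0, 1.0]]
penalty = l1_distance(generated_mask, ground_truth)
print(penalty, generator_objective(-0.5, generated_mask, ground_truth))
```

Because the source-domain image is identical in the real and fake pairs, it contributes nothing to the difference, which is why only the generated and ground-truth images appear in the computation.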
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Fu to incorporate the step/system of training an image translation model comprising: obtaining a set of training samples, such as a set of real samples belonging to Domain A and a set of ground truth samples belonging to Domain B; processing the down-sampled images with the deep image-to-image translation network to generate a set of fake samples; inputting the set of images to the discriminator for distinguishing real pairs from fake pairs; and correcting the generator and the discriminator, as taught by Ji.
The suggestion/motivation for doing so would have been to improve the performance of generated saliency mask and the semantic segmentation accuracy (3. Method, 3.3. Densely Conditional Random Field (DCRF) inference, "Recent works [4,28] have shown that complementing CNNs with fully connected conditional random fields (CRFs) can significantly enhance the semantic segmentation accuracy. Inspired by the success of densely connected CRF in enhancing the final result of pixel-level annotation task, we applied densely connected CRF inference in to the image-to-saliency mask translation task in order to further improve the performance of generated saliency mask"). Further, one skilled in the art could have combined the elements as described above by known method with no change in their respective functions, and the combination would have yielded nothing more than predicted results.
The combination of Fu and Ji does not expressly teach and a set of first deformation parameters, wherein each first deformation parameter in the set of first deformation parameters corresponds to a respective pixel of the first image in the set of first images respectively.
However, FLETCHER teaches and a set of first deformation parameters, wherein each first deformation parameter in the set of first deformation parameters corresponds to a respective pixel of the first image in the set of first images respectively ([0066] the image distortion map itself may also be represented in a down-sampled form in which the mapping is not read directly from the image distortion map, but is interpolated from values in a low-resolution image distortion map; [0067] At step 203 the moving image m is warped based on the image distortion map by mapping positions from every pixel coordinate in the fixed image f to coordinates in the moving image m. Subsequently, pixel values are interpolated in the moving image m using the mapped positions, and a distorted image is constructed of the same size as the fixed image f that contains interpolated values from the moving image m; [0070] the image distortion map is calculated based on an image pyramid including a list of successively down-sampled images. The lowest resolution images are registered first, which produces a low-resolution image distortion estimate. A rigid transformation is a function modelled by a small number of parameters, such as a pure translation, a rotation/scaling/translation transform; [0071] One method for performing rigid registration includes estimating rotation, scaling and translation by calculating Fourier magnitude images of the selected down-sampled moving and fixed image).
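For illustration only, the per-pixel warping described in paragraph [0067] of FLETCHER can be sketched as below. The sketch assumes the image distortion map stores one (dx, dy) offset per pixel of the fixed image and that out-of-range positions are clamped to the image border; the function names and the clamping choice are hypothetical and are not taken from FLETCHER.

```python
def bilinear_sample(img, x, y):
    """Bilinearly interpolate img (a list of rows) at real-valued (x, y)."""
    h, w = len(img), len(img[0])
    # Clamp the sampling position to the image border (an assumption).
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    x0, y0 = int(x), int(y)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bottom = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bottom * fy


def warp(moving, distortion_map):
    """Build a distorted image the size of the fixed image.

    distortion_map[r][c] is the (dx, dy) offset that maps pixel (c, r) of
    the fixed image to position (c + dx, r + dy) in the moving image.
    """
    return [
        [bilinear_sample(moving, c + dx, r + dy) for c, (dx, dy) in enumerate(row)]
        for r, row in enumerate(distortion_map)
    ]
```

Each entry of the map acts as a deformation parameter tied to one pixel coordinate, and the output has the dimensions of the map rather than of the moving image.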
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Fu and Ji to incorporate the step/system of calculating the image distortion map which is interpolated from values in a low-resolution image distortion map taught by FLETCHER. 
The suggestion/motivation for doing so would have been to improve the accuracy of the image distortion maps ([0014] The method further includes combining the current image distortion map and the residual image distortion map to form an improved distortion map, wherein the improved distortion map is used to transform the target image for displaying on a display device; [0081] a loop is initialized to calculate progressively more accurate image distortion maps w using the sequence of down-sampled images). Further, one skilled in the art could have combined the elements as described above by known method with no change in their respective functions, and the combination would have yielded nothing more than predicted results. Therefore, it would have been obvious to combine Fu and Ji with FLETCHER to obtain the invention as specified in claim 9.
Regarding claim 12, Fu, Ji and FLETCHER teach all the limitations of claim 9 above. Fu does not expressly teach wherein each first image in the set of first images matches a corresponding second image in the set of second images.
However, Ji teaches wherein each first image in the set of first images matches a corresponding second image in the set of second images (Section 3. Method, 3.2. Objective function, "the discriminator in conditional GAN aims at distinguishing real pairs from fake pairs. Thus, positive samples should be the real image-to-ground-truth saliency mask pairs, and negative samples should be the image-to-generated saliency map pairs").
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Fu to incorporate the step/system of pairing up real samples with fake samples, as taught by Ji.
The suggestion/motivation for doing so would have been to improve the performance of generated saliency mask and the semantic segmentation accuracy (3. Method, 3.3. Densely Conditional Random Field (DCRF) inference, "Recent works [4,28] have shown that complementing CNNs with fully connected conditional random fields (CRFs) can significantly enhance the semantic segmentation accuracy. Inspired by the success of densely connected CRF in enhancing the final result of pixel-level annotation task, we applied densely connected CRF inference in to the image-to-saliency mask translation task in order to further improve the performance of generated saliency mask"). Further, one skilled in the art could have combined the elements as described above by known method with no change in their respective functions, and the combination would have yielded nothing more than predicted results. Therefore, it would have been obvious to combine Fu and Ji with FLETCHER to obtain the invention as specified in claim 12.
With respect to claim 10, arguments analogous to those presented for claim 2 are applicable.
With respect to claim 14, arguments analogous to those presented for claim 1 are applicable.
With respect to claim 15, arguments analogous to those presented for claim 2 are applicable.
With respect to claim 16, arguments analogous to those presented for claim 3 are applicable.
With respect to claim 20, arguments analogous to those presented for claim 1 are applicable.

8.	Claims 4-6 and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Fu et al. (U.S. Publication No. 2019/0295302) (hereafter, "Fu") in view of Ji et al. (NPL 1, “Saliency detection via conditional adversarial image-to-image network”) (hereafter, "Ji") and further in view of FLETCHER (U.S. Publication No. 2016/0350893) and Yi et al. (NPL 2, “DualGAN: Unsupervised Dual Learning for Image-to-Image Translation”) (hereafter, "Yi").
Regarding claim 4, Fu, Ji and FLETCHER teach all the limitations of claim 3 above. Ji teaches recognizing the original image to determine a second domain to which the original image belongs (FIG. 1 shows recognizing the Real Sample to determine the Domain B to which the Real Sample belongs).
Ji does not expressly teach further comprising: in cases that the first domain corresponds to a plurality of first generators, and obtaining the first target generator from the plurality of first generators based on the second domain and the first domain.
However, Yi teaches in cases that the first domain corresponds to a plurality of first generators, and obtaining the first target generator from the plurality of first generators based on the second domain and the first domain (Figure 1; 3. Method, "Given two sets of unlabeled and unpaired images sampled from domains U and V, respectively, the primal task of DualGAN is to learn a generator GA : U --> V that maps an image u ϵ U to an image v ϵ V , while the dual task is to train an inverse generator GB : V --> U. To realize this, we employ two GANs, the primal GAN and the dual GAN. The primal GAN learns the generator GA and a discriminator DA that discriminates between GA’s fake outputs and real members of domain V").
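For illustration only, one way to picture selecting a generator according to a pair of domains is a lookup keyed by (source domain, target domain). The domain labels U and V follow the quoted passage of Yi; the registry, the function `select_generator`, and the placeholder generators `g_a` and `g_b` are hypothetical stand-ins and are not taken from Yi or from the claims.

```python
from typing import Callable, Dict, List, Tuple

Image = List[List[float]]
Generator = Callable[[Image], Image]


def g_a(img: Image) -> Image:
    """Placeholder for a primal generator U -> V (here: intensity inversion)."""
    return [[1.0 - p for p in row] for row in img]


def g_b(img: Image) -> Image:
    """Placeholder for a dual generator V -> U (here: the inverse of g_a)."""
    return [[1.0 - p for p in row] for row in img]


# Hypothetical registry: one generator per (source, target) domain pair.
REGISTRY: Dict[Tuple[str, str], Generator] = {
    ("U", "V"): g_a,
    ("V", "U"): g_b,
}


def select_generator(registry: Dict[Tuple[str, str], Generator],
                     source_domain: str, target_domain: str) -> Generator:
    """Return the generator registered for the given pair of domains."""
    try:
        return registry[(source_domain, target_domain)]
    except KeyError:
        raise KeyError(
            f"no generator for {source_domain} -> {target_domain}"
        ) from None
```

With real networks in place of the placeholders, the same lookup would extend to more than two domains by adding further keyed entries.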
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Ji to incorporate the step/system of using multiple generators based on multiple domains, as taught by Yi.
The suggestion/motivation for doing so would have been to improve the preservation of content structures in the inputs and capture features (3.1. Objective, "It is proven that the former performs better in terms of generator convergence and sample quality, as well as in improving the stability of the optimization"; 5. Qualitative evaluation, "Compared to GAN, in almost all cases, DualGAN produces results that are less blurry, contain fewer artifacts, and better preserve content structures in the inputs and capture features (e.g., texture, color, and/or style) of the target domain. We attribute the improvements to the reconstruction loss, which forces the inputs to be reconstructable from outputs through the dual generator and strengthens feedback signals that encodes the targeted distribution"; 6. Conclusion, "Experimental results suggest that the DualGAN mechanism can significantly improve the outputs of GAN for various image-to-image translation tasks"). Further, one skilled in the art could have combined the elements as described above by known method with no change in their respective functions, and the combination would have yielded nothing more than predicted results. Therefore, it would have been obvious to combine Ji with Yi to obtain the invention as specified in claim 4.
Regarding claim 5, Fu, Ji and FLETCHER teach all the limitations of claim 1 above. Ji teaches further comprising: recognizing the original image to determine a second domain to which the original image belongs (FIG. 1 shows recognizing the Real Sample to determine the Domain B to which the Real Sample belongs); to generate the pre-translated image (Fig. 1, “Generated fake sample”), the mask image (Fig. 1, “Ground-Truth”; Section 3. Method, 3.2. Objective function, “Specifically, in image-to-saliency translation task, y should be the ground-truth saliency mask”).
Ji does not expressly teach obtaining a second target generator based on the second domain; and processing the down-sampled image with the second target generator, and the deformation parameters.
However, Yi teaches obtaining a second target generator based on the second domain (Figure 1; 3. Method, "Given two sets of unlabeled and unpaired images sampled from domains U and V , respectively, the primal task of DualGAN is to learn a generator GA : U --> V that maps an image u ϵ U to an image v ϵ V , while the dual task is to train an inverse generator GB : V --> U."); and processing the down-sampled image with the second target generator (3.2. Network configuration, “DualGAN is constructed with identical network architecture for GA and GB. The generator is configured with equal number of downsampling (pooling) and upsampling layers”).
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Ji to incorporate the step/system of obtaining a second generator (GB) based on the second domain and processing the down-sampled image with the second generator taught by Yi.
The suggestion/motivation for doing so would have been to improve the preservation of content structures in the inputs and capture features (3.1. Objective, "It is proven that the former performs better in terms of generator convergence and sample quality, as well as in improving the stability of the optimization"; 5. Qualitative evaluation, "Compared to GAN, in almost all cases, DualGAN produces results that are less blurry, contain fewer artifacts, and better preserve content structures in the inputs and capture features (e.g., texture, color, and/or style) of the target domain. We attribute the improvements to the reconstruction loss, which forces the inputs to be reconstructable from outputs through the dual generator and strengthens feedback signals that encodes the targeted distribution"; 6. Conclusion, "Experimental results suggest that the DualGAN mechanism can significantly improve the outputs of GAN for various image-to-image translation tasks"). Further, one skilled in the art could have combined the elements as described above by known method with no change in their respective functions, and the combination would have yielded nothing more than predicted results.
The combination of Ji and Yi does not expressly teach and the deformation parameters.
However, FLETCHER teaches and the deformation parameters ([0066] the image distortion map … interpolated from values in a low-resolution image distortion map; [0070] small number of parameters, such as a pure translation).
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Fu and Ji to incorporate the step/system of having (using) the deformation parameters taught by FLETCHER. 
The suggestion/motivation for doing so would have been to improve the accuracy of the image distortion maps ([0014] The method further includes combining the current image distortion map and the residual image distortion map to form an improved distortion map, wherein the improved distortion map is used to transform the target image for displaying on a display device; [0081] a loop is initialized to calculate progressively more accurate image distortion maps w using the sequence of down-sampled images). Further, one skilled in the art could have combined the elements as described above by known method with no change in their respective functions, and the combination would have yielded nothing more than predicted results. Therefore, it would have been obvious to combine Ji and Yi with FLETCHER to obtain the invention as specified in claim 5.
Regarding claim 6, Fu, Ji and FLETCHER teach all the limitations of claim 5 above. Ji teaches obtaining a first domain to which the target translation image belongs (Section 3. Method, 3.2. Objective function, “The objective function of a conditional GAN (cGAN) can be expressed as: $L_{cGAN}(G, D) = \mathbb{E}_{x,y \sim p_{data}(x,y)}[\log D(x, y)] + \mathbb{E}_{x,y \sim p_{data}(x,y),\, z \sim p_z(z)}[\log(1 - D(x, G(x, z)))]$ (3), where y denotes the target-domain image. Specifically, in image-to-saliency translation task, y should be the ground-truth saliency mask”).
Ji does not expressly teach further comprising: in cases that the second domain corresponds to a plurality of second generators; and obtaining the second target generator from the plurality of second generators based on the first domain and the second domain.
However, Yi teaches further comprising: in cases that the second domain corresponds to a plurality of second generators (Figure 1; 3. Method, "Given two sets of unlabeled and unpaired images sampled from domains U and V , respectively, the primal task of DualGAN is to learn a generator GA : U --> V that maps an image u ϵ U to an image v ϵ V , while the dual task is to train an inverse generator GB : V --> U"), and obtaining the second target generator from the plurality of second generators based on the first domain and the second domain (Figure 1; 3. Method, "Given two sets of unlabeled and unpaired images sampled from domains U and V , respectively, the primal task of DualGAN is to learn a generator GA : U --> V that maps an image u ϵ U to an image v ϵ V , while the dual task is to train an inverse generator GB : V --> U. To realize this, we employ two GANs, the primal GAN and the dual GAN. The primal GAN learns the generator GA and a discriminator DA that discriminates between GA’s fake outputs and real members of domain V").
It would have been obvious before the effective filing date of the claimed invention to one having ordinary skill in the art to modify the device and method of Ji to incorporate the step/system of having the second domain correspond to the second generator and obtaining the second generator based on the first domain and the second domain, as taught by Yi.
The suggestion/motivation for doing so would have been to improve the preservation of content structures in the inputs and capture features (3.1. Objective, "It is proven that the former performs better in terms of generator convergence and sample quality, as well as in improving the stability of the optimization"; 5. Qualitative evaluation, "Compared to GAN, in almost all cases, DualGAN produces results that are less blurry, contain fewer artifacts, and better preserve content structures in the inputs and capture features (e.g., texture, color, and/or style) of the target domain. We attribute the improvements to the reconstruction loss, which forces the inputs to be reconstructable from outputs through the dual generator and strengthens feedback signals that encodes the targeted distribution"; 6. Conclusion, "Experimental results suggest that the DualGAN mechanism can significantly improve the outputs of GAN for various image-to-image translation tasks"). Further, one skilled in the art could have combined the elements as described above by known method with no change in their respective functions, and the combination would have yielded nothing more than predicted results. Therefore, it would have been obvious to combine Ji with Yi to obtain the invention as specified in claim 6.
With respect to claim 17, arguments analogous to those presented for claim 4 are applicable.
With respect to claim 18, arguments analogous to those presented for claim 5 are applicable.
With respect to claim 19, arguments analogous to those presented for claim 6 are applicable.

Allowable Subject Matter
9.	Claims 7, 11 and 13 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.
Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to DANIEL C. CHANG whose telephone number is (571)270-1277. The examiner can normally be reached Monday-Thursday and Alternate Fridays 8:00-5:00.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chan S. Park can be reached on (571) 272-7409. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/DANIEL CHANG/
Patent Examiner, Art Unit 2669
/CHAN S PARK/
Supervisory Patent Examiner, Art Unit 2669