DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 13 and 22 are rejected under 35 U.S.C. 103 as being unpatentable over Zheng (2018 ECCV) in view of Konstantinos et al. (arXiv 1608.06019).
-Regarding claim 13, Zheng discloses a method of refining image acquisition data through domain adaptation (Abstract; FIG. 1; Page 4, section 3, 3rd paragraph, “domain adaptation”), the method comprising: converting, by a cross-domain encoder (FIG. 2, $G_{S\to R}$), real image acquisition data into a compact real feature representation (FIG. 2, lower portion, features obtained by encoder portion of $G_{S\to R}$; Page 7, 1st paragraph, “$f_{x_r}$ …”); converting, by a real encoder (FIG. 2, lower portion, encoder of $f_T$), conditional real data into a conditional real depth feature (FIG. 2, lower portion; Page 5, 3rd paragraph, “the inner feature representations of $f_T$ … through a feature-based GAN via $D_{feat}$”); and generating, by a synthetic decoder (FIG. 2, upper portion, decoder of $f_T$), a refined version of the real image acquisition data based on the conditional real data (FIG. 2, Real Depth Prediction, Real2Real Image), (FIG. 2, $f_T$, $D_R$, $D_{feat}$; features obtained by encoder portion of $G_{S\to R}$; Page 7, 1st paragraph, “$f_{x_r}$ …”) to the refined version of real image acquisition data (FIG. 2, Real Depth Prediction, Real2Real Image) conditioned on the conditional real depth feature (FIG. 2, GAN loss, Reconstruction loss; Page 5, 3rd paragraph, “both pipelines share identical weights for the $G_{S\to R}$ network, and likewise for the $f_T$ network … the inner feature representations of $f_T$ … through a feature-based GAN via $D_{feat}$”).
Zheng does not disclose the compact real feature representation from the cross-domain encoder being passed through to the synthetic decoder to generate the refined version of the real image acquisition data.
In the same field of endeavor, Konstantinos teaches a domain separation network that jointly models both private and shared components of the domain representations (Konstantinos: Abstract; Figure 1). Konstantinos further teaches a compact real feature representation from a cross-domain encoder being passed through to a synthetic decoder to generate a refined version of real image acquisition data (Konstantinos: Figure 1, Shared Encoder, Private Source Encoder, Shared Decoder).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zheng with the teaching of Konstantinos by passing the compact real feature representation from the cross-domain encoder to a synthetic decoder to generate the refined version of the real image acquisition data, in order to improve the quality of the refined version of the image acquisition data by learning and extracting image representations from both the cross domain and the synthetic domain.
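For illustration only, the pass-through relied on in the combination above (a compact representation from a shared, cross-domain encoder routed to a decoder together with a domain-specific private representation) can be sketched in Python as follows. The sketch is not taken from Konstantinos or Zheng: the toy linear encoders, the decoder, the function names, and the element-wise sum used to merge the shared and private codes are all assumptions.

```python
from typing import Callable, List

Vector = List[float]
Encoder = Callable[[Vector], Vector]


def make_linear_encoder(weights: List[List[float]]) -> Encoder:
    """Hypothetical toy encoder computing h = W x (stands in for a CNN encoder)."""
    def encode(x: Vector) -> Vector:
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return encode


def toy_shared_decoder(code: Vector) -> Vector:
    """Hypothetical toy decoder: adds a fixed bias of 0.5 to each component."""
    return [c + 0.5 for c in code]


def refine(x: Vector, shared_encoder: Encoder, private_encoder: Encoder,
           decoder: Callable[[Vector], Vector]) -> Vector:
    """Pass the shared (cross-domain) code through to the decoder.

    The shared and private codes are merged by an element-wise sum before
    decoding; this merge rule is an assumption of the sketch.
    """
    shared_code = shared_encoder(x)    # compact cross-domain representation
    private_code = private_encoder(x)  # domain-specific (private) representation
    combined = [s + p for s, p in zip(shared_code, private_code)]
    return decoder(combined)


shared = make_linear_encoder([[1.0, 0.0], [0.0, 1.0]])
private = make_linear_encoder([[0.0, 1.0], [1.0, 0.0]])
print(refine([1.0, 2.0], shared, private, toy_shared_decoder))  # [3.5, 3.5]
```

The decoder never sees the raw input, only the merged codes, which is the sense in which the compact shared representation is "passed through" to the decoder.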
-Regarding claim 22, Zheng discloses an arrangement without skip links connecting the cross-domain encoder with the synthetic decoder (FIG. 2; Page 7, Section 3.4, 2nd paragraph). Zheng does not disclose the compact real feature representation from the cross-domain encoder being passed through to the synthetic decoder to generate the refined version of the real image acquisition data.
In the same field of endeavor, Konstantinos teaches a domain separation network that jointly models both private and shared components of the domain representations (Konstantinos: Abstract; Figure 1). Konstantinos further teaches a compact real feature representation from a cross-domain encoder being passed through to a synthetic decoder to generate a refined version of real image acquisition data (Konstantinos: Figure 1, Shared Encoder, Private Source Encoder, Shared Decoder).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zheng with the teaching of Konstantinos by passing the compact real feature representation from the cross-domain encoder to a synthetic decoder to generate the refined version of the real image acquisition data, in order to improve the quality of the refined version of the image acquisition data by learning and extracting image representations from both the cross domain and the synthetic domain.
Claims 1, 3-6, 8-12, 15-16 and 21 are rejected under 35 U.S.C. 103 as being unpatentable over Zheng (2018 ECCV) in view of Dixit et al. (U.S. Patent No. 1030772 B2), and further in view of Konstantinos et al. (arXiv 1608.06019).
-Regarding claim 1, Zheng discloses a cross domain supervised learning-based system for … (Abstract; Fig. 1-2; Page 4, Section 3 Method), the system comprising: a cross-domain encoder (Fig. 2, $f_T$, $G_{S\to R}$; Page 5, 3rd paragraph; Page 6, subsection 3.2, 4th paragraph, “encoder-decoder network of $f_T$”) configured to convert synthetic and real image acquisition data into compact synthetic and real feature representations, respectively (Fig. 2; Page 7, 1st paragraph, “where $f_{\hat{x}_s}$ and $f_{x_r}$ are features obtained by the encoder portion of $f_T$ for translated-synthetic images and real images respectively”; features obtained by encoder portion of $G_{S\to R}$); a synthetic conditional depth prediction branch network (Fig. 2, upper portion; Page 5, 3rd paragraph) including: a synthetic encoder (Fig. 2, upper portion, encoder portion of $f_T$, $G_{S\to R}$) configured to convert conditional synthetic image data associated with the synthetic image acquisition data (Fig. 2) into a conditional synthetic depth feature (Fig. 2; Page 5, 3rd paragraph, “the inner feature representations of $f_T$ … through a feature-based GAN via $D_{feat}$”); a synthetic decoder (Fig. 2, upper portion, decoder portion of $f_T$, $G_{S\to R}$) configured to convert the compact synthetic feature representation (Fig. 2; Page 7, 1st paragraph, “where $f_{\hat{x}_s}$ …”; features obtained by encoder portion of $G_{S\to R}$) to a refined version of the synthetic image acquisition data (Fig. 2, Synthetic Depth Prediction, Syn2Real Image) conditioned on the conditional synthetic depth feature (Fig. 2, GAN loss, Reconstruction loss; Page 5, 3rd paragraph, “the inner feature representations of $f_T$ … through a feature-based GAN via $D_{feat}$”); a real conditional depth prediction branch network (Fig. 2, lower portion; Page 5, 3rd paragraph) including: a real encoder (Fig. 2, lower portion, encoder portion of $f_T$, $G_{S\to R}$) configured to convert conditional real image data associated with the real image acquisition data (Fig. 2) into a conditional real depth feature (Fig. 2; Page 5, 3rd paragraph, “the inner feature representations of $f_T$ … through a feature-based GAN via $D_{feat}$”); a real decoder (Fig. 2, lower portion, decoder portion of $f_T$, $G_{S\to R}$) configured to convert the compact real feature representation (Fig. 2; Page 7, 1st paragraph, “$f_{x_r}$ …”; features obtained by encoder portion of $G_{S\to R}$) to a refined version of the real image acquisition data (Fig. 2, Real Depth Prediction, Real2Real Image) conditioned on the conditional real depth feature (Fig. 2, GAN loss, Reconstruction loss; Page 5, 3rd paragraph, “the inner feature representations of $f_T$ … through a feature-based GAN via $D_{feat}$”); and a training supervision element configured to iteratively train the domain adaptive refinement agent based on the refined versions of the synthetic and real image acquisition data (Abstract, “training input is real … synthetic image-depth pairs”; FIG. 2, Task loss, GAN loss, Reconstruction loss; Page 5, 3rd paragraph, “twin pipeline training framework”; Page 4, section 3, 3rd paragraph, “domain adaptation”; Page 5, subsection 3.1, “Adversarial Loss”; Page 6, subsection 3.2, “Task Loss”; Page 7, subsection 3.3, “Full Objective”).
Zheng is silent as to iterative training. However, one of ordinary skill in the art would understand that training or domain adaptation under a given loss or cost function is necessarily an iterative process.
Nonetheless, in the same field of endeavor, Dixit teaches a training supervision element configured to perform iterative training for cross-domain adaptation (Dixit: Col. 6, lines 20-28, “The domain transfer … trained by iterating …”; FIGS. 2-4, 8, 14; Col. 5, lines 24-28, “domain adaptation component”; Col. 7, lines 29-42, “refinement … supervised”; equations (14)-(16)).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zheng with the teaching of Dixit by using a training supervision element configured to iteratively train the domain adaptive refinement agent based on the refined versions of the synthetic and real image acquisition data, in order to achieve better refinement performance on image acquisition data.
Zheng in view of Dixit does not teach the compact synthetic feature representation from the cross-domain encoder passed through to the synthetic decoder.
However, Konstantinos is analogous art pertinent to the problem to be solved in this application, and further teaches a domain separation network that jointly models both private and shared components of the domain representations (Konstantinos: Abstract; Figure 1). Konstantinos further teaches a compact synthetic feature representation from the cross-domain encoder being passed through to the synthetic decoder (Konstantinos: Figure 1, Shared Encoder, Private Target Encoder, Shared Decoder).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zheng in view of Dixit with the teaching of Konstantinos by passing the compact synthetic feature representation from the cross-domain encoder to the synthetic decoder, in order to improve the quality of the refined version of the image acquisition data by learning and extracting image representations from both the cross domain and the synthetic domain.
-Regarding claim 3, Zheng in view of Dixit, and further in view of Konstantinos discloses the system of claim 1.
The modification further discloses comparing the refined real image acquisition data (Zheng: FIG. 2, Real2Real Image) to the real image acquisition data (FIG. 2, Real Image); calculating a real domain loss based on the comparison (FIG. 2, Reconstruction loss); feeding the real domain loss (FIG. 2, GAN loss, Reconstruction loss) to the real conditional depth prediction branch network (FIG. 2, lower portion) to update parameters of the real encoder and the real decoder (FIG. 2, lower portion, encoder and decoder of $G_{S\to R}$, equations (1)-(2)); and feeding the real domain loss to the cross-domain encoder to update parameters of the cross-domain encoder (FIG. 2, lower portion, Reconstruction loss, encoder of $G_{S\to R}$).
-Regarding claim 4, Zheng in view of Dixit, and further in view of Konstantinos discloses the system of claim 1.
Zheng is silent as to a synthetic concatenation element configured to concatenate the compact synthetic feature representation with the conditional synthetic depth feature.
In the same field of endeavor, Dixit teaches a feature concatenation element configured to concatenate the compact feature representation with the conditional depth feature (Dixit: Col. 8, lines 5-35, “concatenated appearance and shape parameters and decoding this latent representation”; equations (17)-(18); FIGS. 2-4, 8, 13A-13C).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zheng with the teaching of Dixit by using a feature concatenation element, in order to disentangle and combine feature representations and thereby solve a learning problem using unpaired data.
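For illustration only, a concatenation element of the kind relied on for claims 4 and 5 can be sketched as follows. The sketch is not drawn from Dixit or Zheng; the [channel][position] layout, the function names, and the averaging decoder are hypothetical, chosen only to show that concatenation stacks the channels of the compact feature representation and of the conditional depth feature before decoding.

```python
from typing import List

FeatureMap = List[List[float]]  # indexed as [channel][spatial position]


def concat_channels(compact: FeatureMap, depth: FeatureMap) -> FeatureMap:
    """Concatenate two feature maps along the channel axis.

    Both maps must have the same number of spatial positions.
    """
    if compact and depth and len(compact[0]) != len(depth[0]):
        raise ValueError("spatial sizes must match for channel concatenation")
    return [list(ch) for ch in compact] + [list(ch) for ch in depth]


def toy_decoder(features: FeatureMap) -> List[float]:
    """Hypothetical decoder: averages the channels at each spatial position."""
    n_channels = len(features)
    n_positions = len(features[0])
    return [sum(ch[i] for ch in features) / n_channels for i in range(n_positions)]


compact_feature = [[1.0, 2.0], [3.0, 4.0]]  # 2 channels x 2 positions
depth_feature = [[5.0, 6.0]]                # 1 channel x 2 positions
fused = concat_channels(compact_feature, depth_feature)
print(len(fused), toy_decoder(fused))  # 3 [3.0, 4.0]
```

Concatenation leaves both inputs intact side by side, so the decoder can condition on the depth feature without it being mixed into the compact representation beforehand.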
-Regarding claim 5, Zheng in view of Dixit, and further in view of Konstantinos discloses the system of claim 1.
Zheng is silent as to a real concatenation element configured to concatenate the compact feature representation with the real conditional depth feature.
In the same field of endeavor, Dixit teaches a feature concatenation element configured to concatenate the compact feature representation with the conditional depth feature (Dixit: Col. 8, lines 5-35, “concatenated appearance and shape parameters and decoding this latent representation”; equations (17)-(18); FIGS. 2-4, 8, 13A-13C).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zheng with the teaching of Dixit by using a feature concatenation element, in order to disentangle and combine feature representations and thereby solve a learning problem using unpaired data.
-Regarding claim 6, Zheng in view of Dixit, and further in view of Konstantinos discloses the system of claim 1.
The modification further discloses wherein the conditional synthetic data is of a different modality (Zheng: FIG. 2, Synthetic Image) and of a higher quality than the corresponding synthetic image acquisition data (Zheng: FIG. 2, Syn2Real Image), and wherein the conditional real data is of a different modality (Zheng: FIG. 2, Real Image) and of a higher quality than the corresponding real image acquisition data (Zheng: FIG. 2, Real2Real Image), wherein the different modality comprises images or video frames (Zheng: FIG. 2, Synthetic Image, Real Image; Page 10, subsection 4.3, 1st paragraph, “paired frames”).
-Regarding claim 8, Zheng in view of Dixit, and further in view of Konstantinos discloses the system of claim 1.
Zheng is silent as to the synthetic and real image acquisition data comprising one or more of depth maps, optical flows, normal maps, or segmentation maps.
In the same field of endeavor, Dixit teaches wherein the synthetic and real image acquisition data comprise one or more of depth maps, optical flows, normal maps, or segmentation maps (Dixit: FIG. 2, $S_0$; FIG. 3, Depth domain 312).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zheng with the teaching of Dixit by using synthetic and real image acquisition data that comprise one or more of depth maps, optical flows, normal maps, or segmentation maps, in order to perform cross-domain adaptation and refine low-quality image acquisition data across a variety of domains.
-Regarding claim 9, Zheng in view of Dixit, and further in view of Konstantinos discloses the system of claim 1.
The modification further discloses wherein the real encoder and the synthetic encoder comprise a single encoder (Zheng: FIG. 2, both upper and lower portions, encoder of $f_T$).
-Regarding claim 10, Zheng in view of Dixit discloses the system of claim 1.
The modification further discloses wherein the synthetic conditional depth prediction branch network comprises a convolutional neural network with skip links that connect outputs of convolutional layers in the synthetic encoder to inputs of the convolutional layers of the synthetic decoder (Zheng: Abstract, “GAN”; FIG. 2; Page 7, section 3.4, 2nd paragraph, “skip connections”; Page 5, 3rd paragraph).
-Regarding claim 11, Zheng in view of Dixit, and further in view of Konstantinos discloses the system of claim 1.
The modification further discloses wherein the real conditional depth prediction branch network comprises a convolutional neural network with skip links that connect outputs of convolutional layers in the real encoder to inputs of the convolutional layers of the real decoder (Zheng: Abstract, “GAN”; FIG. 2; Page 7, section 3.4, 2nd paragraph, “skip connections”; Page 5, 3rd paragraph, “both pipelines share identical weights for the $G_{S \to R}$ network, and likewise for the $f_T$ network … feature-based GAN”).
-Regarding claim 12, Zheng in view of Dixit, and further in view of Konstantinos discloses the system of claim 1.
The modification further discloses wherein the synthetic conditional depth prediction branch network is configured to limit the size of the compact synthetic feature representation (Zheng: Page 7, section 3.4, 1st paragraph, “limited … down-sampling layer … 6 blocks”; 2nd paragraph, “multiple dilation convolution”), and wherein the real conditional depth prediction branch network is configured to limit the size of the compact real feature representation (Zheng: Page 5, 3rd paragraph, “both pipelines share identical weights for the $G_{S \to R}$ network, and likewise for the $f_T$ network … feature-based GAN”).
-Regarding claim 15, Zheng discloses the method of claim 13.
Zheng is silent to teach wherein transferring the real image acquisition data to the synthetic domain includes: concatenating the compact real feature representation and the conditional real depth feature resulting in a concatenated feature vector; and feeding the concatenated feature vector to a synthetic decoder.
In the same field of endeavor, Dixit teaches wherein transferring the real image acquisition data to the synthetic domain includes: concatenating the compact feature representation and the conditional depth feature resulting in a concatenated feature vector; and feeding the concatenated feature vector to a decoder (Dixit: Col. 8, lines 5-35, “The decoder combines these parameters into a reconstruction on its output by taking a vector of concatenated appearance and shape parameters and decoding this latent representation into an image”; equations (17)-(18); FIGS. 2-4, 8, 13A-13C).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zheng with the teaching of Dixit by concatenating the compact real feature representation and the conditional real depth feature resulting in a concatenated feature vector, and feeding the concatenated feature vector to a synthetic decoder, in order to perform disentangling and combining to solve the learning problem using unpaired data.
-Regarding claim 16, Zheng discloses the method of claim 13.
Zheng discloses wherein the conditional real data is of a different modality and of a higher quality than the corresponding real image acquisition data, and wherein the different modality comprises images or video frames (FIG. 2).
Zheng is silent to teach wherein the real image acquisition data comprises a depth map, an optical flow, a normal map, or a segmentation map.
In the same field of endeavor, Dixit teaches wherein the synthetic and real image acquisition data comprise one or more of depth maps, optical flows, normal maps, or segmentation maps (Dixit: FIG. 2, $S_0$; FIG. 3, Depth domain 312).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zheng with the teaching of Dixit by using synthetic and real image acquisition data that comprise one or more of depth maps, optical flows, normal maps, or segmentation maps in order to perform cross-domain adaptation and refine low quality image acquisition data with a variety of domains.
-Regarding claim 21, Zheng in view of Dixit, and further in view of Konstantinos discloses the system of claim 1.
Zheng in view of Dixit discloses that no skip links connect the cross-domain encoder with the synthetic decoder (Zheng: FIG. 2; Page 7, Section 3.4, 2nd paragraph). Zheng does not disclose the compact real feature representation from the cross-domain encoder being passed through to the synthetic decoder to generate the refined version of the real image acquisition data.
Zheng in view of Dixit does not teach the compact synthetic feature representation from the cross-domain encoder passed through to the synthetic decoder.
However, Konstantinos is analogous art pertinent to the problem to be solved in this application and further teaches a domain separation network that jointly models both private and shared components of the domain representations (Konstantinos: Abstract; Figure 1). Konstantinos further teaches a compact synthetic feature representation from the cross-domain encoder being passed through to the synthetic decoder (Konstantinos: Figure 1, Shared Encoder, Private Target Encoder, Shared Decoder).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zheng in view of Dixit with the teaching of Konstantinos by passing the compact synthetic feature representation from the cross-domain encoder to the synthetic decoder in order to improve the quality of the refined version of the image acquisition data by learning and extracting image representations from both the cross-domain and the synthetic domain.
Claims 2 and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zheng (2018 ECCV) in view of Dixit et al (U.S. PATENT NO. 1030772 B2), and further in view of Konstantinos et al (arXiv 1608.06019) and Dundar et al (U.S. PG-PUB NO. 20190244060 A1).
-Regarding claim 2, Zheng in view of Dixit, and further in view of Konstantinos discloses the system of claim 1.
Zheng discloses comparing the refined synthetic image depth feature (FIG. 2, Synthetic Depth Prediction) to a ground truth synthetic image depth feature (FIG. 2, Ground Truth Synthetic Depth); calculating a synthetic domain loss based on the comparison (FIG. 2, Task loss; Page 5, 3rd paragraph; Page 6, section 3.2); feeding the synthetic domain loss to the synthetic conditional depth prediction branch network to update parameters of the synthetic encoder and the synthetic decoder (Page 5, 3rd paragraph, “trained end-to-end, with the weights of $G_{S \to R}$, $f_T$ simultaneously optimized”, section 3.1; Page 6, section 3.2); and feeding the synthetic domain loss to the cross-domain encoder to update parameters of the cross-domain encoder (Page 7, section 3.3; equations (1)-(7)).
Zheng is silent to teach comparing refined synthetic image acquisition data with ground truth synthetic image acquisition data and feeding the synthetic domain loss to update parameters of the synthetic encoder and the synthetic decoder.
In the same field of endeavor, Dixit teaches a training supervision element configured for iterative training for cross-domain adaptation (Dixit: Col. 6, lines 20-28, “The domain transfer … trained by iterating …”; FIGS. 2-4, 8-9, 14; Col. 5, lines 24-28, “domain adaptation component”; Col. 7, lines 29-42, “refinement … supervised”; equations (14)-(16)). Dixit further teaches converting a refined depth map to a refined version of image data (Dixit: FIG. 2, identity recovery network 212, $x_p$).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zheng with the teaching of Dixit by using a training supervision element configured to iteratively train the domain adaptive refinement agent based on the refined versions of the synthetic and real image acquisition data in order to achieve better performance of refinement of image acquisition data.
Zheng in view of Dixit, and further in view of Konstantinos, is silent to teach comparing the refined version of the synthetic image acquisition data to ground truth synthetic image acquisition data to calculate a synthetic domain loss.
However, Dundar is analogous art pertinent to the problem to be solved in this application and further discloses comparing the refined version of the synthetic image acquisition data to ground truth synthetic image acquisition data to calculate a synthetic domain loss (Dundar: FIG. 6D, ground truth recognition data, predicted synthetic recognition data, training loss 645; [0136]; [0147]-[0150]; FIG. 6G).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Zheng in view of Dixit, and further in view of Konstantinos, with the teaching of Dundar by using an identity recovery network and comparing the refined version of the synthetic image acquisition data to ground truth synthetic image acquisition data to calculate a synthetic domain loss in order to provide an alternative way to train the domain adaptive refinement agent to achieve better performance of refinement.
Claims 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Zheng (2018 ECCV) in view of Dixit et al (U.S. PATENT NO. 1030772 B2), and further in view of Dundar et al (U.S. PG-PUB NO. 20190244060 A1).
-Regarding claim 17, Zheng discloses a supervised learning-based method of iteratively training a domain adaptive refinement agent (Abstract; FIGS. 1-2; Page 4, Section 3 Method), the method comprising: feeding synthetic image acquisition data (FIG. 2) and real image acquisition data to a cross-domain depth encoder (FIG. 2, $f_T$, $G_{S \to R}$; Page 5, 3rd paragraph; Page 6, subsection 3.2, 4th paragraph, “encoder-decoder network of $f_T$”) to convert the synthetic image acquisition data and the real image acquisition data (FIG. 2, Syn2Real Image, Real2Real Image) to a compact synthetic feature representation and a compact real feature representation, respectively (FIG. 2; Page 7, 1st paragraph, “where $f_{\hat{x}_s}$ and $f_{x_r}$ are features obtained by the encoder portion of $f_T$ for translated-synthetic images and real images respectively”; features obtained by encoder portion of $G_{S \to R}$); feeding the compact synthetic feature representation (FIG. 2; Page 7, 1st paragraph, “$f_{\hat{x}_s}$”; (features obtained by encoder portion of $G_{S \to R}$)) and conditional synthetic image data (FIG. 2, Synthetic Image) to a synthetic conditional depth prediction branch network (FIG. 2, upper portion) to generate a refined version of the synthetic image acquisition data (FIG. 2, Synthetic Depth Prediction, Syn2Real Image) conditioned on the conditional synthetic image data (FIG. 2, GAN loss, Reconstruction loss; Page 5, 3rd paragraph, “the inner feature representations of $f_T$ … through a feature-based GAN via $D_{feat}$”), the compact synthetic feature representation being decoded, by a synthetic decoder of the synthetic conditional depth decoder branch (FIG. 2, upper portion, decoder portion of $G_{S \to R}$, $f_T$; features obtained by encoder portion of $f_T$), to generate the refined version of the synthetic image acquisition data conditioned on the conditional synthetic depth feature (FIG. 2, Synthetic Image Prediction, Syn2Real Image, $G_{S \to R}$, $D_R$, GAN loss), wherein no skip links connect the cross-domain depth encoder with the synthetic decoder (FIG. 2); feeding the compact real feature representation (FIG. 2; Page 7, 1st paragraph, “$f_{x_r}$ …”; (features obtained by encoder portion of $G_{S \to R}$)) and conditional real image data (FIG. 2, Real Image) to a real conditional depth prediction branch network (FIG. 2, lower portion) to generate a refined version of the real image acquisition data (FIG. 2, Real Depth Prediction, Real2Real Image) conditioned on the conditional real image data (FIG. 2, GAN loss, Reconstruction loss; Page 5, 3rd paragraph, “the inner feature representations of $f_T$ … through a feature-based GAN via $D_{feat}$”); comparing the refined version of the synthetic image acquisition data (FIG. 2, Synthetic Depth Prediction) to ground truth synthetic image acquisition data (FIG. 2, Ground Truth Synthetic Depth) to calculate a synthetic domain loss (FIG. 2, Task loss; Page 5, 3rd paragraph; Page 6, section 3.2) and the refined version of the real image acquisition data (FIG. 2, Real2Real Image) to the real image acquisition data (FIG. 2, Real Image) to calculate a real domain loss (FIG. 2, Reconstruction loss); and updating network parameters of the cross-domain depth encoder (FIG. 2, encoder portion of $f_T$) and the synthetic conditional depth prediction branch network (FIG. 2, upper portion) based on the synthetic domain loss (FIG. 2, Task loss) and network parameters of the cross-domain depth encoder (FIG. 2, encoder portion of $G_{S \to R}$) and the real conditional depth prediction branch network (FIG. 2, lower portion) based on the real domain loss (FIG. 2, GAN loss, Reconstruction loss) to iteratively train the domain adaptive refinement agent (Abstract, “training input is real … synthetic image-depth pairs”; FIG. 2, Task loss, GAN loss, Reconstruction loss; Page 5, 3rd paragraph, “twin pipeline training framework”; Page 4, section 3, 2nd paragraph, “domain adaptation”; Page 5, subsection 3.1, “Adversarial Loss”; Page 6, subsection 3.2, “Task Loss”; Page 7, subsection 3.3, “Full Objective”; equations (1)-(7)).
Zheng is silent to teach iterative training. However, one of ordinary skill in the art would understand that any training or domain adaptation under a given loss or cost function is necessarily an iterative process. Zheng is silent to teach comparing the refined version of the synthetic image acquisition data to ground truth synthetic image acquisition data to calculate a synthetic domain loss.
In the same field of endeavor, Dixit teaches a training supervision element configured for iterative training for cross-domain adaptation (Dixit: Col. 6, lines 20-28, “The domain transfer … trained by iterating …”; FIGS. 2-4, 8, 14; Col. 5, lines 24-28, “domain adaptation component”; Col. 7, lines 29-42, “refinement … supervised”; equations (14)-(16)). Dixit further teaches converting a refined depth map to a refined version of image data (Dixit: FIG. 2, identity recovery network 212, $x_p$).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zheng with the teaching of Dixit by using a training supervision element configured to iteratively train the domain adaptive refinement agent based on the refined versions of the synthetic and real image acquisition data in order to achieve better performance of refinement of image acquisition data.
Zheng in view of Dixit is silent to teach comparing the refined version of the synthetic image acquisition data to ground truth synthetic image acquisition data to calculate a synthetic domain loss.
However, Dundar is analogous art pertinent to the problem to be solved in this application and further discloses comparing the refined version of the synthetic image acquisition data to ground truth synthetic image acquisition data to calculate a synthetic domain loss (Dundar: FIG. 6D, ground truth recognition data, predicted synthetic recognition data, training loss 645; [0136]; [0147]-[0150]; FIG. 6G).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the teaching of Zheng in view of Dixit with the teaching of Dundar by using an identity recovery network and comparing the refined version of the synthetic image acquisition data to ground truth synthetic image acquisition data to calculate a synthetic domain loss in order to provide an alternative way to train the domain adaptive refinement agent to achieve better performance of refinement.
-Regarding claim 18, Zheng in view of Dixit, and further in view of Dundar discloses the method of claim 17.
The modification further discloses receiving the synthetic image acquisition data and the corresponding conditional synthetic image data; and receiving the real image acquisition data and the corresponding conditional real image data (Zheng: FIG. 2).
-Regarding claim 19, Zheng in view of Dixit, and further in view of Dundar discloses the method of claim 18.
Zheng discloses encoding, by a synthetic encoder of the synthetic conditional depth decoder branch network (FIG. 2, upper portion, encoder portion of $G_{S \to R}$, $f_T$), the conditional synthetic image data (FIG. 2, Synthetic Image) into a conditional synthetic depth feature (features obtained by encoder portion of $G_{S \to R}$).
Zheng is silent to teach concatenating, by a synthetic concatenation element of the synthetic conditional depth decoder branch network, the compact synthetic feature representation with the conditional synthetic depth feature.
In the same field of endeavor, Dixit teaches a feature concatenation element configured to concatenate the compact feature representation with the conditional depth feature (Dixit: Col. 8, lines 5-35, “concatenated appearance and shape parameters and decoding this latent representation”; equations (17)-(18); FIGS. 2-4, 8, 13A-13C).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zheng with the teaching of Dixit by using a feature concatenation element in order to perform disentangling and combining to solve the learning problem using unpaired data.
-Regarding claim 20, Zheng in view of Dixit, and further in view of Dundar discloses the method of claim 18.
Zheng discloses encoding, by a real encoder of the real conditional depth prediction branch network (FIG. 2, lower portion, encoder portion of $G_{S \to R}$, $f_T$), the conditional real image data (FIG. 2, Real Image) into a conditional real depth feature (features obtained by encoder portion of $G_{S \to R}$); and decoding, by a real decoder of the real conditional depth prediction branch network (FIG. 2, lower portion, decoder portion of $G_{S \to R}$, $f_T$), the compact real feature representation (FIG. 2, features obtained by encoder portion of $f_T$) into the refined version of the real image acquisition data (FIG. 2, Real Image Prediction, Real2Real Image) conditioned on the conditional real depth feature (FIG. 2, $G_{S \to R}$, $D_R$, GAN loss, Reconstruction loss).
Zheng is silent as to concatenating, by a real concatenation element of the real conditional depth prediction branch network, the compact real feature representation with the conditional real depth feature.
In the same field of endeavor, Dixit teaches a feature concatenation element configured to concatenate the compact feature representation with the conditional depth feature (Dixit: Col. 8, lines 5-35, “concatenated appearance and shape parameters and decoding this latent representation”; equations (17)-(18); FIGS. 2-4, 8, 13A-13C).
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Zheng with the teaching of Dixit by using a feature concatenation element in order to perform disentangling and combining to solve a learning problem using unpaired data.
Response to Arguments
Applicant's arguments filed 08/03/2022 have been fully considered. 
Applicant’s arguments with respect to claims 1 and 13 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument.
Applicant’s arguments with respect to claim 17 have been considered but they are not persuasive. Applicant argues that Zheng, Dixit, and Dundar do not teach at least the amended limitation “the compact synthetic feature representation being decoded, by a synthetic decoder of the synthetic conditional depth decoder branch, to generate the refined version of the synthetic image acquisition data conditioned on the conditional synthetic depth feature, wherein no skip links connect the cross-domain depth encoder with the synthetic decoder”.
The examiner respectfully disagrees.
The claim limitations “the compact synthetic feature representation being decoded, by a synthetic decoder of the synthetic conditional depth decoder branch, to generate the refined version of the synthetic image acquisition data conditioned on the conditional synthetic depth feature” in claim 17 are similar to the limitations of cancelled claim 19. See Non-Final Office Action dated 05/09/2022, Page 17. Regarding the claim limitation “wherein no skip links connect the cross-domain depth encoder with the synthetic decoder”, it is obvious that no skip links connect the cross-domain depth encoder with the synthetic decoder (Zheng: FIG. 2). Zheng also discloses that the disclosed $G_{S \to R}$ and $f_T$ are residual networks similar to SimGAN (He et al., IEEE Conf. on CVPR 2016) and to the loss module shown in Godard et al., Page 3, Figure 2. Likewise, no skip links connecting the cross-domain depth encoder with the synthetic decoder can be found in those references.
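To illustrate what the “no skip links” limitation distinguishes, the following hypothetical Python sketch contrasts a decoder that receives only the encoder's final (compact) output plus a conditional feature with a U-Net-style decoder that also receives intermediate encoder activations through skip links. It is a sketch under stated assumptions: encoder stages are modeled as plain functions on lists, and the names `encode` and `decoder_inputs` are illustrative only; it does not reproduce the networks of Zheng, He et al., or Godard et al.

```python
from typing import Callable, Dict, List

# Assumption: each encoder stage is a function from one activation vector to the next.
Stage = Callable[[List[float]], List[float]]


def encode(x: List[float], stages: List[Stage]) -> List[List[float]]:
    """Run the encoder stages in order and return every intermediate activation."""
    activations = []
    for stage in stages:
        x = stage(x)
        activations.append(x)
    return activations


def decoder_inputs(
    activations: List[List[float]], conditional: List[float], skip_links: bool
) -> Dict[str, List[float]]:
    """Collect the tensors a decoder is allowed to see."""
    # Without skip links, only the compact (bottleneck) output and the
    # conditional feature reach the decoder.
    inputs = {"bottleneck": activations[-1], "conditional": conditional}
    if skip_links:
        # U-Net-style skip links route earlier encoder activations directly to the decoder.
        for k, act in enumerate(activations[:-1]):
            inputs[f"skip_{k}"] = act
    return inputs


# Usage with two toy stages: doubling, then summing to a 1-element "compact" code.
stages: List[Stage] = [lambda v: [2 * a for a in v], lambda v: [sum(v)]]
acts = encode([1.0, 2.0], stages)                           # [[2.0, 4.0], [6.0]]
no_skip = decoder_inputs(acts, [0.5], skip_links=False)     # bottleneck + conditional only
with_skip = decoder_inputs(acts, [0.5], skip_links=True)    # adds "skip_0"
```

In the no-skip configuration the decoder's only path to the encoder is the compact representation, which is the structural property the limitation recites.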
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 

/XIAO LIU/Examiner, Art Unit 2664

/NANCY BITAR/Primary Examiner, Art Unit 2664