DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This communication is responsive to the correspondence filed on 1/15/21.
Claims 1-15 are presented for examination.

IDS Considerations

The information disclosure statements (IDS) submitted on 7/27/20, 1/13/20 and 11/12/19 are being considered by the examiner, as the submissions are in compliance with the provisions of 37 CFR 1.97.

Response to Arguments

Applicant's arguments filed 1/15/21 with respect to claims 1-15 have been considered but are not persuasive.

Applicant argued on pages 7-8 that the prior art does not teach wherein the upsampling target information is determined based on information input from a user by a server and is transmitted to the electronic device.

The examiner disagrees. See Theis col. 13 line 64-66: when initially configuring a machine learning system, particularly when using a supervised machine

Claim Rejections - 35 USC § 103

The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


Claims 1-4, 8 and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Theis (U.S. Pat. No. 10,623,775 B1), in view of Wang (U.S. Pub. No. 2017/0345130 A1).

Regarding claims 1, 8 and 11:

1. Theis teaches a method comprising: (Theis Fig. 3-4 col. 19 line 37-38 transmission or display devices)
receiving a bitstream (Theis Fig. 1, 10: compressed video is a bitstream; col. 1 line 30-35 it is possible to use an empirically-derived formula to show how the bitrate of a video encoded with, for example the H.264 compression technique, relates to the resolution of that video) generated by encoding a first image; (Theis col. 4 line 20-30 FIG. 1 illustrates a block diagram of a compressive auto-encoder (CAE) system according to at least one example embodiment. As shown in FIG. 1, the CAE system 100 includes an encoder 110, a decoder 120 and a model 130.)
decoding the bitstream to obtain a second image; and (Theis Fig. 7B shows decoder 120, col. 17 line 20-25 using header information decoded from the compressed video data 10 video decoder system 750 can use prediction block 755 to create the same prediction block as was created in the video encoder system 700. The prediction block 755 can be added to the derivative residual to create reconstructed video data by the reconstruction block 760. Decoded video data 15 from Fig. 7B is second image)
obtaining a third image upsampled from the second image by using a first deep neural network (DNN) (Theis col. 3 line 38-40 FIG. 5A illustrates layers in a convolutional neural network with no sparsity constraints. Multiple layers make the neural network a deep neural network. The decoder-side neural network of Fig. 3B is the first neural network) for upsampling, based on upsampling target information, (Theis FIG. 3B col. 9 line 1-8 the video data is then convolved, upsampled again [third image] using sub-pixel convolution layers in order to upsample the image to the resolution of the original [target information] input video (e.g., video data 5))
wherein the upsampling target information is determined based on information input from a user by a server (Theis col 13 line 64-66 when initially configuring a machine learning system, particularly when using a supervised machine learning approach. Theis col 14 line 8-19 the user must however take care to ensure that the training data contains enough information to accurately predict desired output values without providing too many features (which can result in too many dimensions being considered by the machine learning process during training, and could also mean that 

Theis does not explicitly teach the method being performed by an electronic device for displaying an image, providing, on a display of the electronic device, the third image, or wherein the first image is generated by downsampling an original image by using a second DNN for downsampling.

However, Wang teaches the method performed by an electronic device for displaying an image, (Wang [0164] In some embodiments, by transmitting a section of lower-quality visual data over a network together with an example based model to aid reconstruction of high-quality visual data, less data can be transferred over the network to enable a higher-quality visual data to be displayed when compared to transmitting higher-quality visual data alone) and providing, on a display of the electronic device, the third image, (Wang [0164] In some embodiments, by transmitting a section of lower-quality visual data over a network together with an example based model to aid reconstruction of high-quality visual data, less data can be transferred over the network to enable a higher-quality visual data to be displayed when compared to transmitting higher-quality visual data alone)
wherein the first image is generated by downsampling an original image (Wang [0090] optionally, the one or more sections of lower-quality visual data are generated from the one or more sections of higher-quality visual data. Furthermore, optionally the one or more sections of lower-quality visual data may be generated from the high-quality visual data using a process comprising down-sampling) by using a second DNN downsampling. (Wang [0173] optionally, the example based model comprises any of: a generative model; a non-linear hierarchical algorithm; or a convolutional neural network; or a recurrent neural network; or a deep belief network; or a dictionary learning algorithm; or a parameter; or a mapping function. [0331] in some embodiments, optionally for use for a section of visual data, the example based model may be a neural network and can use spatio-temporal convolution. In some embodiments, separating visual data into a series of sections allows for the individual sections to be down-sampled thus reducing the visual data size, thereby allowing for lower quality sections to be transmitted as re-encoded visual data in the original or optionally a more optimal codec but at a lower quality)
and is transmitted to the electronic device. (Wang teaches user input through a server as well, because [0445] when bandwidth is particularly limited, the processing power of an end user terminal is low, or a user is willing to endure a lower quality section of video, the patch size may be increased to allow for video transmission under one or more of those circumstances. The quality of the reconstruction will be compromised but the processing power required at the end-user terminal will be significantly reduced. The limit of the number of sections of video processed in parallel is dependent on the computational complexity of the sections of video and the processing power of the 

It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Theis by incorporating the teachings of Wang, both references being in the field of video/camera technology. One would have been motivated to do so in order to generate the first image by downsampling an original image by using a second DNN for downsampling, which would improve coding efficiency.

Regarding claim 2:

2. Theis teaches the method of claim 1. Theis does not explicitly teach wherein the upsampling target information indicates a conversion degree of resolution of the first image.

However, Wang teaches wherein the upsampling target information indicates a conversion degree of resolution of the first image. (Wang [0423] the non-linearity function Ø applies element-wise and is fixed. The last layer f.sup.l has to convert the low resolution feature maps to a high resolution image I.sup.SR [upsampling target information]. Wang [0212] resampling into the higher dimension space of visual data from a low-quality to high-quality domain happens before being processed through a 

Regarding claim 3:

3. Theis teaches the method of claim 1. Theis does not explicitly teach wherein the upsampling target information is determined based on performance information about a display, compression history information, or a type of the original image.

However, Wang teaches wherein the upsampling target information (Wang [0399] step 120 (or step 210), the portion of the low-resolution video and the reconstruction model for that portion of video are output for transmission. In some embodiments, at step 120 (or step 210), the low-resolution video frames can be re-encoded using either the original video codec applied to the original video data 70 [target information]) is determined based on performance information about a display, compression history information, or a type of the original image. (Wang [0383] original video data 70 is a high-resolution video, for example having a resolution of 1920 pixels by 1080 pixels (also known as “1080p” video) or 3840 pixels by 2160 pixels (also known as “4K” video). This video data can be encoded in a variety of known video codecs, such as H.264 or VP8, but can be any video data that can 

Regarding claim 4:

4. Theis teaches the method of claim 1, wherein the first DNN (Theis col. 3 line 38-40 FIG. 5A illustrates layers in a convolutional neural network with no sparsity constraints. Multiple layers make the neural network a deep neural network. The decoder-side neural network of Fig. 3B is the first neural network) is trained based on lossy information (Theis col. 2 line 48-52 generate first compressed video data using a lossy compression algorithm, the lossy compression algorithm being implemented using at least one convolution having a model trained using a neural network)
obtained by upsampling (Theis FIG. 3B col. 9 line 1-8 the video data is then convolved, upsampled again [third image] using sub-pixel convolution layers in order to upsample the image to the resolution of the original [target information] input video (e.g., video data 5)) a downsampled image (Theis FIG. 3A col. 7 line 65-67, col. 8 line 1-3 The mirror pad 304 can be configured to pad or add pixels to the input image at the boundary of the image using pixels adjacent to the boundary of the image. For example, if the output of the encoder 110 is to have a same spatial extent as an n times downsampled image)

Theis does not explicitly teach that the downsampled image is downsampled by the second DNN from an original image for training.

However, Wang teaches that the downsampled image is downsampled (Wang [0090] optionally, the one or more sections of lower-quality visual data are generated from the one or more sections of higher-quality visual data. Furthermore, optionally the one or more sections of lower-quality visual data may be generated from the high-quality visual data using a process comprising down-sampling) by the second DNN from an original image for training. (Wang [0173] the example based model comprises any of: a generative model; a non-linear hierarchical algorithm; or a convolutional neural network; or a recurrent neural network; or a deep belief network; or a dictionary learning algorithm [training]; or a parameter; or a mapping function. [0331] in some embodiments, optionally for use for a section of visual data, the example based model may be a neural network and can use spatio-temporal convolution. In some embodiments, separating visual data into a series of sections allows for the individual sections to be downsampled thus reducing the visual data size, thereby allowing for lower quality sections to be transmitted as re-encoded visual data in the original or optionally a more optimal codec but at a lower quality)
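
For illustration only, the rearrangement performed by a sub-pixel convolution layer of the kind cited above from Theis (FIG. 3B, col. 9 line 1-8) can be sketched as follows. This is a minimal sketch and is not taken from Theis: the function name `pixel_shuffle`, the scale factor `r`, and the channel ordering are assumptions chosen for the example, and the learned convolution that precedes the rearrangement is omitted.

```python
def pixel_shuffle(feature_maps, r):
    """Rearrange C*r*r feature maps of size H x W into C maps of size (H*r) x (W*r).

    Assumed channel ordering (hypothetical, not from Theis): input channel
    c*r*r + i*r + j supplies the output pixel at row h*r + i, column w*r + j.
    """
    channels_in = len(feature_maps)
    if channels_in % (r * r) != 0:
        raise ValueError("channel count must be divisible by r*r")
    height = len(feature_maps[0])
    width = len(feature_maps[0][0])
    channels_out = channels_in // (r * r)
    # Allocate the upsampled output, one (H*r) x (W*r) grid per output channel.
    out = [[[0] * (width * r) for _ in range(height * r)]
           for _ in range(channels_out)]
    for c in range(channels_out):
        for i in range(r):
            for j in range(r):
                src = feature_maps[c * r * r + i * r + j]
                for h in range(height):
                    for w in range(width):
                        out[c][h * r + i][w * r + j] = src[h][w]
    return out


# Four 1x2 feature maps become one 2x4 image when r = 2.
low_res = [[[1, 2]], [[3, 4]], [[5, 6]], [[7, 8]]]
high_res = pixel_shuffle(low_res, 2)
```

Under these assumptions the spatial resolution grows by the factor `r` in each dimension while the channel count shrinks by `r*r`, which is how such a layer can return an image to the resolution of the original input.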

Claims 5-7 and 9-10 are rejected under 35 U.S.C. 103 as being unpatentable over Theis (U.S. Pat. No. 10,623,775 B1), in view of Wang (U.S. Pub. No. 2017/0345130 A1), further in view of Rippel (U.S. Pub. No. 2018/0176576 A1).

Regarding claim 5:

5. Theis teaches the method of claim 4, wherein the lossy information (Theis col. 2 line 48-52 generate first compressed video data using a lossy compression algorithm, the lossy compression algorithm being implemented using at least one convolution having a model trained using a neural network) obtained by upsampling (Theis FIG. 3B col. 9 line 1-8 the video data is then convolved, upsampled again [third image] using sub-pixel convolution layers in order to upsample the image to the resolution of the original [target information] input video (e.g., video data 5)) the downsampled image (Theis FIG. 3A col. 7 line 65-67, col. 8 line 1-3 The mirror pad 304 can be configured to pad or add pixels to the input image at the boundary of the image using pixels adjacent to the boundary of the image. For example, if the output of the encoder 110 is to have a same spatial extent as an n times downsampled image)

Theis does not explicitly teach that the lossy information comprises lossy information obtained based on a result of comparing a reconstructed image output from the first DNN with the original image for training on which the downsampling is not performed, or that the lossy information obtained by upsampling the downsampled image is used in training the second DNN.

However, Wang teaches that the lossy information obtained by upsampling the downsampled image is used in training the second DNN. (Wang [0326] Some aspects can provide an improved technique for generating reconstruction parameters that can be used, when converting an original high-quality video into a down-sampled low-quality video, to allow recreation of a higher-quality version of the video from down-

The motivation for combining Theis and Wang as set forth in claim 1 is equally applicable to claim 5.

Further, Rippel teaches that the lossy information comprises lossy information obtained based on a result of comparing a reconstructed image output from the first DNN with the original image for training on which the downsampling is not performed, (Rippel [0062] the SSIM is a measure of quality that compares the means and variances of the reconstruction and compares them to the original. The multi-scale variant of SSIM (MS-

It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Theis by incorporating the teachings of Wang and Rippel, all references being in the field of video/camera technology. One would have been motivated to do so in order to obtain lossy information based on a result of comparing a reconstructed image output from the first DNN with the original image for training on which the downsampling is not performed, which would improve image quality.

Regarding claim 6:

6. Theis teaches the image reconstructing method of claim 5. Theis does not explicitly teach wherein the second DNN is trained based on a difference between the downsampled image and a spatially decreased image from the original image for training.

However, Rippel teaches wherein the second DNN is trained based on a difference between the downsampled image (Rippel FIG. 3A [0040] this example depicts a decreasing dimensionality of the height and width by a factor of 2 after the application of a downsampling operator, the dimensionality may be reduced in other fashions (e.g., non-linearly) according to the downsampling operator. In various embodiments, the downsampler operator D.sub.m(⋅) is non-linear and is applied by a trained machine learning model that is trained during the training phase to identify the optimal downsampling operator for identifying structures in the input image 205) and a spatially decreased image from the original image for training. (Rippel FIG. 3A [0040] the input image 205 may have initial C×H×W dimensions of 3×1080×1920. Therefore, the feature extraction module 210 applies a downsampling operator D.sub.1(⋅) to downsample the input image 205 to generate a first downsampled image 310A with dimensions of 64×540×960. This can be further downsampled using downsampling operator D.sub.2(⋅) to a second downsampled image 310B with dimensions of 64×270×480)
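
The dimension arithmetic in the cited passage of Rippel [0040] can be checked with a short sketch. It models only the C×H×W shapes quoted above, not the learned downsampling operator itself; the helper name `downsampled_shape` and its parameters are hypothetical and do not appear in Rippel.

```python
def downsampled_shape(shape, out_channels, factor=2):
    """Return the C x H x W shape after one downsampling step.

    Only the dimensions are modeled; the learned operator is not.
    The helper and its parameter names are assumptions for illustration.
    """
    _, height, width = shape
    if height % factor or width % factor:
        raise ValueError("height and width must be divisible by the factor")
    return (out_channels, height // factor, width // factor)


input_shape = (3, 1080, 1920)                             # input image 205
first = downsampled_shape(input_shape, out_channels=64)   # after D.sub.1
second = downsampled_shape(first, out_channels=64)        # after D.sub.2
```

Each step halves the height and width, which matches the factor-of-2 reduction described in the quoted paragraph.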

Regarding claim 7:

7. Theis teaches the image reconstructing method of claim 6. Theis does not explicitly teach wherein the spatially decreased image has a structural characteristic of the original image for training, wherein the structural characteristic comprises at least one of luminance of the original image for training, contrast of the original image for training, a histogram of the original image for training, an encoding quality, compression history information, or a type of the original image for training.

However, Rippel teaches wherein the spatially decreased image has a structural characteristic of the original image for training, (Rippel FIG. 3A [0040] the input image 205 may have initial C×H×W dimensions of 3×1080×1920. Therefore, the feature extraction [structural characteristic] module 210 applies a downsampling operator D.sub.1(⋅) to downsample the input image 205 to generate a first downsampled image 310A with dimensions of 64×540×960. This can be further downsampled using downsampling operator D.sub.2(⋅) to a second downsampled image 310B with dimensions of 64×270×480. This example depicts a decreasing dimensionality of the height and width by a factor of 2 after the application of a downsampling operator, the dimensionality may be reduced in other fashions (e.g., non-linearly) according to the downsampling operator. In various embodiments, the downsampler operator D.sub.m(⋅) is non-linear and is applied by a trained machine learning model that is trained during the training phase to identify the optimal downsampling operator for identifying structures in the input image 205)
wherein the structural characteristic comprises at least one of luminance of the original image for training, contrast of the original image for training, a histogram of the original image for training, an encoding quality, compression history information, or a type of the original image for training. (Rippel [0032] the discriminator module 180 uses generative adversarial network (GAN) approaches to improve the compression and reconstruction quality of input images. For example, the discriminator module 180 can train a model in parallel with the encoder module 140 
such that the encoder module 140 can more efficiently encode the input image with higher quality. [0037] The feature extraction module 210 trains and applies a ∈Z.sub.≥0, total 1×k∈Z.sub.≥0) from the binary code 405 with each of the K context features. In various embodiments, the feature probability 420 (e.g., 1×K∈(0,1]) is calculated as the fraction of times in the training data the bit associated with that feature had the value 1, possibly smoothed with a Laplace smoothing process)

Regarding claim 9:

9. Theis teaches the image compressing method of claim 8, wherein the second DNN is trained (Theis FIG. 3A) based on: first lossy information (Theis col. 2 line 48-52 generate first compressed video data using a lossy compression algorithm, the lossy compression algorithm being implemented using at least one convolution having a model trained using a neural network) obtained based on a downsampled image that is downsampled from the original image for training by the second DNN, (Theis FIG. 3A col. 7 line 65-67, col. 8 line 1-3 The mirror pad 304 can be configured to pad or add pixels to the input image at the boundary of the image using pixels adjacent to the boundary of the image. For example, if the output of the encoder 110 is to have a same spatial extent as an n times downsampled image)
second lossy information corresponding to structural complexity of the downsampled image, and (Theis col. 1 line 42-53 Equation (1) illustrates the direct 
third lossy information obtained based on the original image for training (Theis col. 2 line 48-52 generate first compressed video data using a lossy compression algorithm, the lossy compression algorithm being implemented using at least one convolution having a model trained using a neural network) and a reconstructed image that is obtained by upsampling (Theis FIG. 3B col. 9 line 1-8 the video data is then convolved, upsampled again [third image] using sub-pixel convolution layers in order to upsample the image to the resolution of the original [target information] input video (e.g., video data 5)) the downsampled image. (Theis FIG. 3A col. 7 line 65-67, col. 8 line 1-3 The mirror pad 304 can be configured to pad or add pixels to the input image at the boundary of the image using pixels adjacent to the boundary of the image. For example, if the output of the encoder 110 is to have a same spatial extent as an n times downsampled image)

Theis does not explicitly teach a spatially decreased image from the original image for training.

However, Rippel teaches a spatially decreased image from the original image for training. (Rippel FIG. 3A [0040] the input image 205 may have initial C×H×W dimensions of 3×1080×1920. Therefore, the feature extraction [structural characteristic] module 210 applies a downsampling operator D.sub.1(⋅) to downsample the input image 205 to generate a first downsampled image 310A with dimensions of 64×540×960. This can be further downsampled using downsampling operator D.sub.2(⋅) to a second downsampled image 310B with dimensions of 64×270×480. This example depicts a decreasing dimensionality of the height and width by a factor of 2 after the application of a downsampling operator, the dimensionality may be reduced in other fashions (e.g., non-linearly) according to the downsampling operator. In various embodiments, the downsampler operator D.sub.m(⋅) is non-linear and is applied by a trained machine learning model that is trained during the training phase to identify the optimal downsampling operator for identifying structures in the input image 205)

Regarding claim 10:

10. Theis teaches the image compressing method of claim 9. Theis does not explicitly teach wherein the third lossy information is used in training the first DNN.

However, Rippel teaches wherein the third lossy information is used in training the first DNN. (Rippel [0060] each of them is upsampled to the next scale using transformations D.sub.m′ and added together to obtain the reconstructed image 275. In ⋅), g.sub.m′(⋅), and g′(⋅) are set to be the inverse of the corresponding transformations in the feature extraction module 210, and in other embodiments they are trained independently. However, given that the process to generate the quantized tensor is a lossy operation, there is a loss in quality in the reconstructed input image 275. [0062] the SSIM is a measure of quality that compares the means and variances of the reconstruction and compares them to the original. The multi-scale variant of SSIM (MS-SSIM) performs that operation over multiple scales. In various embodiments, the trained model is a neural network [DNN] and the feedback is achieved via backpropagation using gradient descent. In the case of SSIM and MS-SSIM loss, the derivative of the loss is computed during the backpropagation step)
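
As an illustration of how a loss of the kind cited from Rippel [0062] compares the means and variances of a reconstruction with those of the original, the following minimal sketch computes the standard SSIM expression over a single global window. It is a simplification and is not Rippel's implementation: the function name `global_ssim`, the flat pixel lists, and the dynamic range of 255 are assumptions, and practical SSIM or MS-SSIM is computed over local windows and, for MS-SSIM, over multiple scales.

```python
def global_ssim(original, reconstruction, dynamic_range=255.0):
    """Single-window SSIM between two equal-length lists of pixel values.

    Simplified for illustration: statistics are taken over the whole signal
    rather than over sliding local windows.
    """
    n = len(original)
    if n == 0 or n != len(reconstruction):
        raise ValueError("inputs must be non-empty and of equal length")
    mean_x = sum(original) / n
    mean_y = sum(reconstruction) / n
    var_x = sum((x - mean_x) ** 2 for x in original) / n
    var_y = sum((y - mean_y) ** 2 for y in reconstruction) / n
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(original, reconstruction)) / n
    # Conventional stabilizing constants: (0.01 * L)^2 and (0.03 * L)^2.
    c1 = (0.01 * dynamic_range) ** 2
    c2 = (0.03 * dynamic_range) ** 2
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


original = [10, 20, 30, 40]
lossy_reconstruction = [12, 18, 33, 39]
score_same = global_ssim(original, original)
score_lossy = global_ssim(original, lossy_reconstruction)
```

A training loss could then be formed as one minus this score, so that a reconstruction closer to the original yields a smaller loss; that choice is likewise an assumption made for the example.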

12. (Cancelled).

13. (Cancelled).

14. (Cancelled).

15. (Cancelled).

Conclusion

A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action. In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to NASIM N NIRJHAR whose telephone number is (571)272-3792.  The examiner can normally be reached on Monday - Friday, 8 am to 5 pm ET.

If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Christopher Kelley, can be reached on (571)272-7331. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.



/NASIM N NIRJHAR/
Primary Examiner, Art Unit 2482