DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 7 and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Lee (US 2021/0160522) in view of Xu (US 2021/0327033).
Regarding claim 1, Lee discloses a method of video coding at a video coding device (paragraph [61]; Fig. 1 shows an encoding apparatus for implementing a method of video coding), comprising:
performing convolutional deep neural network (paragraph [59], Lee discloses implementing a machine learning algorithm for video image compression utilizing a deep neural network, and paragraph [117], Lee discloses that the DNN learning model 520 can be configured in the form of a convolutional neural network (CNN)) to generate one or more first feature maps (paragraph [117], Lee discloses that the DNN learning model 520 can be configured in the form of a CNN, and paragraph [118], Lee discloses that the CNN has a structure suitable for image analysis that includes feature extraction layers, wherein paragraph [119], Lee discloses that a feature extraction layer has a structure in which a convolution layer generates feature maps by applying plural filters to each region of an image to extract features of the image) based on a set of one or more previously reconstructed reference frames (paragraph [62], Lee discloses a reconstructed picture buffer for storing the reconstructed reference frames, wherein paragraph [75], Lee discloses that reconstructed data of the spatial domain is generated as a reconstructed image through the in-loop filter 150, and that the in-loop filter 150 performs deblocking and sample adaptive offset filtering after deblocking to generate the reconstructed image and sends the generated reconstructed image to the reconstructed picture buffer 125);
generating a predicted frame based on the one or more first feature maps (paragraph [121], Lee discloses that new feature maps are generated by utilizing local information of a feature map obtained by a previous convolution layer, wherein paragraph [73], Lee discloses performing inter prediction with element 115 by utilizing data from input image 105 and a reference picture obtained from the reconstructed picture buffer for generating a prediction or a predicted frame, and paragraph [75], Lee discloses generating residual data by calculating a difference between data of the input image and data output by the inter-predictor, thus generating the reconstruction of a current frame based on the predicted frame, and wherein paragraph [219], Lee discloses inputting reconstructed pictures 1220 and 1230 to a DNN learning model 1240 for predicting a current picture 1210, and wherein paragraph [117], Lee discloses that the DNN learning model 520 can be configured in the form of a CNN, and paragraph [118], Lee discloses that the CNN has a structure suitable for image analysis that includes feature extraction layers, wherein paragraph [119], Lee discloses that a feature extraction layer has a structure in which a convolution layer generates feature maps by applying plural filters to each region of an image to extract features of the image); and
reconstructing a current frame based on the predicted frame (paragraph [121], Lee discloses that new feature maps are generated by utilizing local information of a feature map obtained by a previous convolution layer, wherein paragraph [73], Lee discloses performing inter prediction with element 115 by utilizing data from input image 105 and a reference picture obtained from the reconstructed picture buffer for generating a prediction or a predicted frame, and paragraph [75], Lee discloses generating residual data by calculating a difference between data of the input image and data output by the inter-predictor, and that the reconstructed frames stored in reconstructed picture buffer 125 can be utilized as reference pictures for inter prediction when producing another image, in a recursive manner, thus generating the reconstruction of a current frame based on the predicted frame, and wherein paragraph [219], Lee discloses inputting reconstructed pictures 1220 and 1230 to a DNN learning model 1240 for predicting a current picture 1210).
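For illustration only, and not as a characterization of Lee's actual implementation or of the claims, the prediction, residual, and reconstruction relationship cited above (paragraphs [73] and [75]) can be sketched as follows. The function names, the sample values, and the uniform quantization step are assumptions introduced for this sketch; the cited paragraphs are relied upon only for the residual being a difference between the input image and the prediction, and for reconstructed pictures being stored for later reference.

```python
def quantize_residual(current, predicted, step=4):
    """Residual = current - predicted, coarsely quantized.

    The quantization step is an assumed parameter, not taken from Lee.
    """
    return [[round((c - p) / step) for c, p in zip(c_row, p_row)]
            for c_row, p_row in zip(current, predicted)]


def reconstruct(predicted, q_residual, step=4):
    """Reconstructed frame = predicted frame + dequantized residual."""
    return [[p + q * step for p, q in zip(p_row, q_row)]
            for p_row, q_row in zip(predicted, q_residual)]


# Hypothetical 2x2 luma samples for the current frame and its prediction.
current = [[100, 104], [98, 120]]
predicted = [[101, 100], [98, 110]]

q = quantize_residual(current, predicted)
recon = reconstruct(predicted, q)

# The reconstructed frame is kept as a reference for predicting later frames.
reference_buffer = [recon]
```

In this sketch the reconstruction error per sample never exceeds half of the assumed quantization step, which is why the reconstructed frame, rather than the original, is the one stored for later prediction.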
Lee does not disclose performing a deformable convolution through a deformable convolutional deep neural network (DNN) to generate one or more first feature maps based on a set of one or more previously reconstructed reference frames. However, Xu teaches performing a deformable convolution through a deformable convolutional deep neural network (paragraph [33], Xu discloses that deep neural network training is performed on a video sequence of images for obtaining a deformable convolutional kernel, thus obtaining a deformable convolutional deep neural network). Because Lee discloses “performing convolutional deep neural network (DNN) to generate one or more first feature maps based on a set of one or more previously reconstructed reference frames”, and Xu discloses “performing a deformable convolution through a deformable convolutional deep neural network”, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Lee and Xu to arrive at the limitation “performing a deformable convolution through a deformable convolutional deep neural network (DNN) to generate one or more first feature maps based on a set of one or more previously reconstructed reference frames” so as to reduce image blur, loss of image detail, and ghosting, thereby improving video image quality at the display terminal (Xu, paragraph [118]).
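For context only, the following is a minimal sketch of deformable convolution as the term is generally understood in the art: each kernel tap samples the input at its regular grid position plus a per-position offset, with bilinear interpolation at fractional positions. It is not taken from Xu or from the application. The function names, the single-channel layout, the zero padding, and the hand-supplied offsets (which a network would normally predict) are all assumptions of this sketch.

```python
import math


def bilinear(img, y, x):
    """Sample img at fractional (y, x); positions outside the image read as 0."""
    h, w = len(img), len(img[0])
    y0, x0 = math.floor(y), math.floor(x)
    val = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1 - abs(y - yy)) * (1 - abs(x - xx))
                val += weight * img[yy][xx]
    return val


def deformable_conv2d(img, kernel, offsets):
    """Single-channel, stride-1, zero-padded deformable convolution.

    offsets[i][j][t] = (dy, dx) is the displacement applied to kernel tap t
    at output position (i, j). All-zero offsets give an ordinary convolution.
    """
    h, w = len(img), len(img[0])
    k = len(kernel)
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc, tap = 0.0, 0
            for a in range(k):
                for b in range(k):
                    dy, dx = offsets[i][j][tap]
                    acc += kernel[a][b] * bilinear(img, i + a - r + dy, j + b - r + dx)
                    tap += 1
            out[i][j] = acc
    return out


img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
identity = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
zero_off = [[[(0.0, 0.0)] * 9 for _ in range(3)] for _ in range(3)]
half_off = [[[(0.0, 0.5)] * 9 for _ in range(3)] for _ in range(3)]

plain = deformable_conv2d(img, identity, zero_off)    # reproduces img
shifted = deformable_conv2d(img, identity, half_off)  # samples half a pixel to the right
```

With all-zero offsets the result matches an ordinary convolution, which is the sense in which a deformable convolution generalizes the convolution layer discussed with respect to Lee.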
Regarding claim 7, Lee discloses wherein the generating the predicted frame based on the one or more first feature maps (paragraph [121], Lee discloses that new feature maps are generated by utilizing local information of a feature map obtained by a previous convolution layer, wherein paragraph [73], Lee discloses performing inter prediction with element 115 by utilizing data from input image 105 and a reference picture obtained from the reconstructed picture buffer for generating a prediction or a predicted frame, and paragraph [75], Lee discloses generating residual data by calculating a difference between data of the input image and data output by the inter-predictor, thus generating the reconstruction of a current frame based on the predicted frame, and wherein paragraph [219], Lee discloses inputting reconstructed pictures 1220 and 1230 to a DNN learning model 1240 for predicting a current picture 1210, and wherein paragraph [117], Lee discloses that the DNN learning model 520 can be configured in the form of a CNN, and paragraph [118], Lee discloses that the CNN has a structure suitable for image analysis that includes feature extraction layers, wherein paragraph [119], Lee discloses that a feature extraction layer has a structure in which a convolution layer generates feature maps by applying plural filters to each region of an image to extract features of the image) includes:
reconstructing one or more aligned frames using a frame reconstruction DNN based on the one or more first feature maps and the one or more previously reconstructed reference frames (paragraph [121], Lee discloses that new feature maps are generated by utilizing local information of a feature map obtained by a previous convolution layer, wherein paragraph [73], Lee discloses performing inter prediction with element 115 by utilizing data from input image 105 and a reference picture obtained from the reconstructed picture buffer for generating a prediction or a predicted frame, and paragraph [75], Lee discloses generating residual data by calculating a difference between data of the input image and data output by the inter-predictor, thus generating the reconstruction of a current frame based on the predicted frame, and wherein paragraph [219], Lee discloses inputting reconstructed pictures 1220 and 1230 to a DNN learning model 1240 for predicting a current picture 1210, and wherein paragraph [117], Lee discloses that the DNN learning model 520 can be configured in the form of a CNN, and paragraph [118], Lee discloses that the CNN has a structure suitable for image analysis that includes feature extraction layers, wherein paragraph [119], Lee discloses that a feature extraction layer has a structure in which a convolution layer generates feature maps by applying plural filters to each region of an image to extract features of the image, and paragraph [105], Lee discloses that the frames are aligned in a series of still images or a GOP (group of pictures), wherein P (predictive) and B (bi-directional) pictures are pictures encoded by performing motion estimation and motion compensation, wherein a P picture can be obtained by utilizing a reference picture, and B pictures can be obtained by utilizing reference pictures from a past picture and a future picture); and
generating the predicted frame using a frame synthesis DNN based on the one or more aligned frames (paragraph [121], Lee discloses that new feature maps are generated by utilizing local information of a feature map obtained by a previous convolution layer, wherein paragraph [73], Lee discloses performing inter prediction with element 115 by utilizing data from input image 105 and a reference picture obtained from the reconstructed picture buffer for generating a prediction or a predicted frame, and paragraph [75], Lee discloses generating residual data by calculating a difference between data of the input image and data output by the inter-predictor, thus generating the reconstruction of a current frame based on the predicted frame, and wherein paragraph [219], Lee discloses inputting reconstructed pictures 1220 and 1230 to a DNN learning model 1240 for predicting a current picture 1210, and wherein paragraph [117], Lee discloses that the DNN learning model 520 can be configured in the form of a CNN, and paragraph [118], Lee discloses that the CNN has a structure suitable for image analysis that includes feature extraction layers, wherein paragraph [119], Lee discloses that a feature extraction layer has a structure in which a convolution layer generates feature maps by applying plural filters to each region of an image to extract features of the image, and paragraph [105], Lee discloses that the frames are aligned in a series of still images or a GOP (group of pictures), wherein P (predictive) and B (bi-directional) pictures are pictures encoded by performing motion estimation and motion compensation, wherein a P picture can be obtained by utilizing a reference picture, and B pictures can be obtained by utilizing reference pictures from a past picture and a future picture, thus synthesizing P and B pictures).
Regarding claim 17, Lee discloses a non-transitory computer-readable medium storing instructions that (paragraph [372], Lee discloses implementing a computer readable recording medium for storing instructions to be executed by a computer or processor), when executed by a processor (paragraph [128], Lee discloses a processor for executing instructions stored on a computer readable medium; paragraph [372], Lee discloses implementing a computer readable recording medium for storing instructions to be executed by a computer or processor), cause the processor to perform a method of video coding (paragraph [61]; Fig. 1 shows an encoding apparatus for implementing a method of video coding), the method comprising:
performing convolutional deep neural network (paragraph [59], Lee discloses implementing a machine learning algorithm for video image compression utilizing a deep neural network, and paragraph [117], Lee discloses that the DNN learning model 520 can be configured in the form of a convolutional neural network (CNN)) to generate one or more first feature maps (paragraph [117], Lee discloses that the DNN learning model 520 can be configured in the form of a CNN, and paragraph [118], Lee discloses that the CNN has a structure suitable for image analysis that includes feature extraction layers, wherein paragraph [119], Lee discloses that a feature extraction layer has a structure in which a convolution layer generates feature maps by applying plural filters to each region of an image to extract features of the image) based on a set of one or more previously reconstructed reference frames (paragraph [62], Lee discloses a reconstructed picture buffer for storing the reconstructed reference frames, wherein paragraph [75], Lee discloses that reconstructed data of the spatial domain is generated as a reconstructed image through the in-loop filter 150, and that the in-loop filter 150 performs deblocking and sample adaptive offset filtering after deblocking to generate the reconstructed image and sends the generated reconstructed image to the reconstructed picture buffer 125);
generating a predicted frame based on the one or more first feature maps (paragraph [121], Lee discloses that new feature maps are generated by utilizing local information of a feature map obtained by a previous convolution layer, wherein paragraph [73], Lee discloses performing inter prediction with element 115 by utilizing data from input image 105 and a reference picture obtained from the reconstructed picture buffer for generating a prediction or a predicted frame, and paragraph [75], Lee discloses generating residual data by calculating a difference between data of the input image and data output by the inter-predictor, thus generating the reconstruction of a current frame based on the predicted frame, and wherein paragraph [219], Lee discloses inputting reconstructed pictures 1220 and 1230 to a DNN learning model 1240 for predicting a current picture 1210, and wherein paragraph [117], Lee discloses that the DNN learning model 520 can be configured in the form of a CNN, and paragraph [118], Lee discloses that the CNN has a structure suitable for image analysis that includes feature extraction layers, wherein paragraph [119], Lee discloses that a feature extraction layer has a structure in which a convolution layer generates feature maps by applying plural filters to each region of an image to extract features of the image); and
reconstructing a current frame based on the predicted frame (paragraph [121], Lee discloses that new feature maps are generated by utilizing local information of a feature map obtained by a previous convolution layer, wherein paragraph [73], Lee discloses performing inter prediction with element 115 by utilizing data from input image 105 and a reference picture obtained from the reconstructed picture buffer for generating a prediction or a predicted frame, and paragraph [75], Lee discloses generating residual data by calculating a difference between data of the input image and data output by the inter-predictor, and that the reconstructed frames stored in reconstructed picture buffer 125 can be utilized as reference pictures for inter prediction when producing another image, in a recursive manner, thus generating the reconstruction of a current frame based on the predicted frame, and wherein paragraph [219], Lee discloses inputting reconstructed pictures 1220 and 1230 to a DNN learning model 1240 for predicting a current picture 1210).
Lee does not disclose performing a deformable convolution through a deformable convolutional deep neural network (DNN) to generate one or more first feature maps based on a set of one or more previously reconstructed reference frames. However, Xu teaches performing a deformable convolution through a deformable convolutional deep neural network (paragraph [33], Xu discloses that deep neural network training is performed on a video sequence of images for obtaining a deformable convolutional kernel, thus obtaining a deformable convolutional deep neural network). Because Lee discloses “performing convolutional deep neural network (DNN) to generate one or more first feature maps based on a set of one or more previously reconstructed reference frames”, and Xu discloses “performing a deformable convolution through a deformable convolutional deep neural network”, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Lee and Xu to arrive at the limitation “performing a deformable convolution through a deformable convolutional deep neural network (DNN) to generate one or more first feature maps based on a set of one or more previously reconstructed reference frames” so as to reduce image blur, loss of image detail, and ghosting, thereby improving video image quality at the display terminal (Xu, paragraph [118]).
Allowable Subject Matter
Claims 12-16 are allowed.
The following is a statement of reasons for the indication of allowable subject matter: The present invention pertains to video compression based on deep neural networks.
With regard to claim 12, Lee (US 2021/0160522) discloses a method of neural network training (paragraph 117), comprising:
inputting a set of reference frames to a predicted frame generation module to generate a predicted frame (paragraph 75), the predicted frame generation module including neural networks having parameters that are to be optimized (paragraph 219), the neural networks including a convolutional deep neural network (paragraph 117).
Xu (US 2021/0327033) discloses performing a deformable convolution through a deformable convolutional deep neural network (paragraph 33).
The prior art, either singly or in combination, does not disclose “a method of neural network training, comprising: inputting a set of reference frames to a predicted frame generation module to generate a predicted frame, the predicted frame generation module including neural networks having parameters that are to be optimized, the neural networks including a deformable convolutional deep neural network (DNN); determining a loss of a loss function, the loss function including: a compressive loss indicating a bitrate estimated based on a difference between the predicted frame and a ground-truth frame, and a reconstruction quality loss indicating a quality of the predicted frame with respect to the ground-truth frame; and performing a back-propagation based on the loss of the loss function to update the parameters of the neural networks in the predicted frame generation module” of claim 12.
Thus, the prior art does not disclose the subject matter of claim 12.
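For illustration only, the general shape of a training objective that combines a compressive term with a reconstruction quality term, followed by a gradient update, can be sketched as follows. This is not the applicant's implementation. The mean absolute residual used as a stand-in for an estimated bitrate, the mean squared error used as the quality term, the weight lam, and the one-parameter toy predictor are all assumptions of this sketch; the gradient is derived by hand rather than by a back-propagation library.

```python
def quality_loss(pred, truth):
    """Reconstruction quality term: mean squared error (assumed metric)."""
    return sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(pred)


def compressive_loss(pred, truth):
    """Stand-in for an estimated bitrate: mean absolute residual (assumed proxy)."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(pred)


def total_loss(pred, truth, lam=0.1):
    """Weighted sum of the two terms; lam is a hypothetical trade-off weight."""
    return quality_loss(pred, truth) + lam * compressive_loss(pred, truth)


def train_step(w, ref, truth, lr=0.01, lam=0.1):
    """One gradient-descent update for the toy predictor pred = w * ref."""
    n = len(ref)
    grad = 0.0
    for r, t in zip(ref, truth):
        diff = w * r - t
        sign = (diff > 0) - (diff < 0)  # derivative of |diff| with respect to diff
        grad += (2 * diff * r + lam * sign * r) / n
    return w - lr * grad


# Hypothetical reference samples and ground-truth samples.
ref = [1.0, 2.0, 3.0]
truth = [2.0, 4.0, 6.0]

w = 0.5
loss_before = total_loss([w * r for r in ref], truth)
for _ in range(50):
    w = train_step(w, ref, truth)
loss_after = total_loss([w * r for r in ref], truth)
```

In this toy setting the parameter moves toward the value that makes the prediction match the ground truth, and the combined loss decreases accordingly.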

Claims 2-6, 8-11 and 18-20 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to ALLEN C WONG whose telephone number is (571)272-7341.  The examiner can normally be reached on Flex Monday-Thursday 9:30am-7:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Sath V Perungavoor, can be reached at 571-272-7455. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

	
	
/ALLEN C WONG/
Primary Examiner, Art Unit 2488