DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Applicant's amendments filed on 8 July 2022 have been entered.  Claims 1, 5, 6, 8, 14-20, 24, and 32-37 have been amended.  No claims have been canceled.  Claims 38 and 39 have been added.  Claims 1-39 are still pending in this application, with claims 1, 8, 14, 20, 26, and 32 being independent.

Response to Arguments
Applicant's arguments filed 8 July 2022 have been fully considered but they are not persuasive. Applicant argues, regarding claim 1, “that the frame-estimation model in Vemulapalli takes a current low resolution frame and a previous high resolution frame as inputs to output a current estimated high resolution frame based on a mapping. However, in contrast to ‘generat[ing], using one or more neural networks, a higher resolution video using upsampled frames of a lower resolution video blended with previously-inferred frames of the higher resolution video,’ recited in claim 1, the cited portions of Vemulapalli do not teach using a current upsampled (i.e., high resolution) frame as model input, nor any detail indicating that blending is performed”.
First, Examiner notes that claim 1 does not recite using a current upsampled frame as model input. Further, Examiner notes that Applicant relies solely on portions of Vemulapalli that were not cited for this particular limitation (aside from their relation to cited Fig. 2). Regardless, Examiner maintains that the cited portions of Vemulapalli read on the claimed limitations. For example, Paragraphs [0031]-[0035] recite: “the computing system can upsample the current low-resolution image frame to a high-resolution space of the warped previous estimated high-resolution image frame to map the warped previous estimated high-resolution image frame to the current low-resolution image-frame…some implementations, the computing system can input the warped previous estimated high-resolution image frame and the current low-resolution image frame into a machine-learned frame estimation model…the current estimated high-resolution image frame can be passed back for use as an input in the next iteration” (emphasis added). Examiner notes that not only do the cited portions read on the currently claimed limitation, but the above portion also reads on Applicant's discussion of using a current upsampled frame as input. Therefore, Examiner maintains the rejections for at least the above reasons.
Regarding the remaining claims, Applicant argues for their allowance for the same reasons as above and based on their dependence from one of the independent claims. Accordingly, the remaining claims remain rejected for the same reasons.

Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claims 1-4, 8-11, 14-17, 20-23, 26, 28, 29, 31-35, and 37-39 are rejected under 35 U.S.C. 102(a)(1)/(a)(2) as being anticipated by Vemulapalli et al. (US Pub. 2019/0206026), hereinafter Vemulapalli.
Regarding claim 1, Vemulapalli discloses a processor (Paragraph [0006]: computing system can include at least one processor) comprising: one or more circuits to be configured to generate, using one or more neural networks, a higher resolution video using upsampled frames of a lower resolution video blended with previously-inferred frames of the higher resolution video (Fig. 2; Fig. 6; Paragraphs [0031]-[0035]: the computing system can upsample the current low-resolution image frame to a high-resolution space of the warped previous estimated high-resolution image frame to map the warped previous estimated high-resolution image frame to the current low-resolution image-frame…some implementations, the computing system can input the warped previous estimated high-resolution image frame and the current low-resolution image frame into a machine-learned frame estimation model…the current estimated high-resolution image frame can be passed back for use as an input in the next iteration. That is, the current estimated high-resolution image frame can be used as the previous estimated high-resolution image frame at the next iteration, in which a next subsequent low resolution image frame is super-resolved…the machine-learned recurrent super-resolution model can include one or more neural networks (e.g., deep neural networks), support vector machines, decision trees, ensemble models, k-nearest neighbors models, Bayesian networks, or other types of models including linear models and/or non-linear models. Example neural networks can include feed-forward neural networks, convolutional neural networks, recurrent neural networks (e.g., long short-term memory (LSTM) recurrent neural networks, gated recurrent unit (GRU) neural networks), or other forms of neural networks; Paragraph [0044]: use of a recurrent super-resolution model in the systems and methods of the present disclosure provide a number of technical advantages. For example, by using a previously inferred high-resolution (HR) estimate to super-resolve the subsequent low-resolution (LR) frame, this framework can naturally encourage temporally consistent results. That is, by iteratively providing past inferences as inputs for the next iteration, data can be retained across iterations, thereby providing more temporally consistent outputs. Thus, by propagating information from past image frames to later image frames via an estimated high-resolution image frame that is recurrently passed through time, the systems and methods can recreate fine details and produce temporally consistent results without increasing computations; Paragraph [0067]: model trainer 160 includes computer logic utilized to provide desired functionality. The model trainer 160 can be implemented in hardware, firmware, and/or software controlling a general purpose processor. For example, in some implementations, the model trainer 160 includes program files stored on a storage device, loaded into a memory, and executed by one or more processors. In other implementations, the model trainer 160 includes one or more sets of computer-executable instructions that are stored in a tangible computer-readable storage medium such as RAM hard disk or optical or magnetic media).
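For illustration only, the frame-recurrent data flow quoted above from Paragraphs [0031]-[0035] can be sketched as follows. This is a minimal sketch under stated assumptions, not Vemulapalli's implementation: nearest-neighbour upsampling, a zero-motion warp, a black initial estimate for the first frame, and a fixed per-pixel blend standing in for the learned frame estimation model are all assumptions, and every helper name is hypothetical. What the sketch shows is the recurrence itself: the current low-resolution frame is upsampled, combined with the warped previous high-resolution estimate, and the resulting estimate is passed back as the "previous" estimate for the next frame.

```python
from typing import Callable, List, Optional

Frame = List[List[float]]  # one grayscale frame, stored row-major


def upsample_nearest(frame: Frame, scale: int) -> Frame:
    """Map a low-resolution frame into high-resolution space (nearest neighbour)."""
    return [
        [frame[r // scale][c // scale] for c in range(len(frame[0]) * scale)]
        for r in range(len(frame) * scale)
    ]


def identity_warp(prev_hr: Frame) -> Frame:
    """Hypothetical placeholder for flow-based warping; assumes zero motion."""
    return [row[:] for row in prev_hr]


def blend_model(upsampled: Frame, warped_prev_hr: Frame, alpha: float = 0.5) -> Frame:
    """Hypothetical stand-in for the learned frame estimation model.

    A fixed per-pixel convex combination is used only to make the data flow
    concrete; it is not the model described in the reference.
    """
    return [
        [alpha * u + (1.0 - alpha) * p for u, p in zip(row_u, row_p)]
        for row_u, row_p in zip(upsampled, warped_prev_hr)
    ]


def super_resolve_video(
    lr_frames: List[Frame],
    scale: int,
    model: Callable[[Frame, Frame], Frame] = blend_model,
    warp: Callable[[Frame], Frame] = identity_warp,
) -> List[Frame]:
    """Frame-recurrent loop: each HR estimate is fed back for the next frame."""
    outputs: List[Frame] = []
    prev_hr: Optional[Frame] = None
    for lr in lr_frames:
        upsampled = upsample_nearest(lr, scale)
        if prev_hr is None:
            # No prior estimate exists for the first frame; start from black (assumption).
            prev_hr = [[0.0] * len(upsampled[0]) for _ in upsampled]
        estimate = model(upsampled, warp(prev_hr))
        outputs.append(estimate)
        prev_hr = estimate  # the current estimate becomes the "previous" one next time
    return outputs
```

For two identical 1×1 input frames of value 1.0 at scale 2, the first estimate is 0.5 everywhere and the second is 0.75 everywhere, showing how information from the earlier estimate persists into the later one.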
Regarding claim 2, Vemulapalli discloses the processor of claim 1, wherein the one or more neural networks are trained using pairs of lower and higher resolution video frames (Fig. 2; Paragraph [0023]: a machine-learned recurrent super-resolution model to super-resolve imagery such as image frames of a video. In particular, the recurrent super-resolution model can be structured according to an end-to-end trainable frame-recurrent video super-resolution framework that uses a previously inferred high-resolution (HR) estimate to super-resolve the subsequent low-resolution (LR) frame; Paragraph [0077]: FIG. 2 depicts a block diagram of an example recurrent super-resolution model 200 according to example embodiments of the present disclosure. In some implementations, the recurrent super-resolution model 200 is trained to receive input data that includes a current low-resolution image frame 204 and a previous estimated high-resolution image frame 206; and, as a result of receipt of the current low-resolution image frame 204 and previous estimated high-resolution image frame 206, provide a current estimated high-resolution image frame 208).
Regarding claim 3, Vemulapalli discloses the processor of claim 1, wherein the one or more neural networks are trained using a complex loss function comprising at least a style loss term and a temporal loss term (Fig. 4A; Fig. 4B; Paragraph [0039]: train the recurrent super-resolution model, the training computing system can input a first portion of a set of ground-truth data (e.g., the first portion of the training dataset corresponding to a set of low-resolution image frames) into the recurrent super-resolution model. In response to receipt of such first portion, the recurrent super-resolution model can output an estimated high-resolution image frame for each input low-resolution image frame. This output of the machine-learned model predicts the remainder of the set of ground-truth data (e.g., the second portion of the training dataset corresponding to a set of original high-resolution image frames). After such prediction, the training computing system can apply or otherwise determine a loss function that compares the estimated high-resolution image frames output by the recurrent super-resolution model to the remainder of the ground-truth data which the model attempted to predict. The training computing system then can backpropagate the loss function through the recurrent super-resolution model to train the model (e.g., by modifying one or more weights associated with the model). This process of inputting ground-truth data, determining a loss function and backpropagating the loss function through the recurrent super-resolution model can be repeated numerous times as part of training the recurrent super-resolution model. For example, the process can be repeated for each of numerous sets of ground-truth data provided within the training dataset; Paragraph [0044]: by using a previously inferred high-resolution (HR) estimate to super-resolve the subsequent low-resolution (LR) frame, this framework can naturally encourage temporally consistent results. That is, by iteratively providing past inferences as inputs for the next iteration, data can be retained across iterations, thereby providing more temporally consistent outputs. Thus, by propagating information from past image frames to later image frames via an estimated high-resolution image frame that is recurrently passed through time, the systems and methods can recreate fine details and produce temporally consistent results without increasing computations; Paragraphs [0091]-[0092]: FIG. 4A depicts a block diagram 402 of an example loss function used for training a machine-learned flow estimation model according to example embodiments of the present disclosure. A computing system (e.g., training computing system 150) can train the flow estimation model by applying a loss 410 on a warped previous low-resolution image frame, with respect to a current low-resolution image frame 412. The warped previous low-resolution image frame can be obtained by warping a previous low-resolution image frame 414 based on an estimated flow-map… FIG. 4B depicts a block diagram 404 of an example loss function used for training a machine-learned frame estimation model according to example embodiments of the present disclosure. A computing system (e.g., training computing system 150) can train the frame estimation model by applying a loss 420 on a current estimated high-resolution image frame 424 output by the frame estimation model, with respect to a current ground-truth high-resolution image frame 422. The loss 420 can be backpropagated through the frame estimation model to train the model).
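For illustration only, the two training losses quoted above from Paragraphs [0091]-[0092] can be sketched as follows: a FIG. 4A-style loss comparing a warped previous low-resolution frame with the current low-resolution frame, and a FIG. 4B-style loss comparing an estimated high-resolution frame with the ground-truth high-resolution frame. The quoted text does not specify the form of either loss, so mean squared error is an assumption, as is the hypothetical `shift_warp` helper, which reduces the estimated flow-map to a single horizontal integer shift. The sketch computes the loss values only; it does not perform backpropagation.

```python
from typing import List

Frame = List[List[float]]  # one grayscale frame, stored row-major


def mse(a: Frame, b: Frame) -> float:
    """Mean squared error between two equally sized frames (assumed loss form)."""
    total = 0.0
    count = 0
    for row_a, row_b in zip(a, b):
        for x, y in zip(row_a, row_b):
            total += (x - y) ** 2
            count += 1
    return total / count


def shift_warp(frame: Frame, dx: int) -> Frame:
    """Hypothetical warp: shift each row right by dx pixels, clamping at the border."""
    width = len(frame[0])
    return [[row[min(max(c - dx, 0), width - 1)] for c in range(width)] for row in frame]


def flow_loss(prev_lr: Frame, curr_lr: Frame, dx: int) -> float:
    """FIG. 4A-style loss: warped previous LR frame versus the current LR frame."""
    return mse(shift_warp(prev_lr, dx), curr_lr)


def frame_loss(est_hr: Frame, gt_hr: Frame) -> float:
    """FIG. 4B-style loss: estimated HR frame versus the ground-truth HR frame."""
    return mse(est_hr, gt_hr)
```

With a previous row of [1, 2, 3] and a current row of [1, 1, 2], the flow loss is zero for a shift of one pixel and 2/3 for no shift, so minimizing this loss favours the flow estimate that matches the apparent motion.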
Regarding claim 4, Vemulapalli discloses the processor of claim 1, wherein the one or more neural networks are trained to determine a blending factor and at least one kernel factor for blending pixel values of the upsampled frames and previously-inferred frames (Fig. 7; Fig. 8; Paragraph [0084]: FIG. 7 provides one example network architecture that can be used to implement the machine-learned flow estimation model 310. Other model architectures can be used instead of the example shown in FIG. 7. The example network architecture shown in FIG. 7 can be fully convolutional and can work in LR space. In some implementations, all convolutions can use 3×3 kernels with stride 1. The notation 2× indicates that the corresponding block is duplicated and the leaky ReLU units can use a leakage factor of 0.2; Paragraph [0090]: FIG. 8 provides one example network architecture that can be used to implement the machine-learned frame estimation model 320. Other model architectures can be used instead of the example shown in FIG. 8. The example network architecture shown in FIG. 8 can be fully convolutional and can work in LR space. In some implementations, all convolutions can use 3×3 kernels with stride 1 except the transposed convolutions which can use stride 2).
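For illustration only, the layer-level details quoted above from Paragraphs [0084] and [0090] (convolutions using 3×3 kernels with stride 1; leaky ReLU units with a leakage factor of 0.2) can be sketched for a single channel as follows. Zero padding is assumed so that a stride-1 convolution preserves the frame size, the "convolution" is computed as cross-correlation (the usual convention in deep learning frameworks), and the helper names are hypothetical. The sketch shows only these layer operations, not the blending factor or kernel factor recited in claim 4.

```python
from typing import List

Frame = List[List[float]]  # one single-channel feature map, stored row-major


def conv3x3(frame: Frame, kernel: Frame) -> Frame:
    """3x3 cross-correlation, stride 1, zero padding (padding is assumed)."""
    h, w = len(frame), len(frame[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            acc = 0.0
            for i in range(3):
                for j in range(3):
                    rr, cc = r + i - 1, c + j - 1
                    # Positions outside the frame contribute zero (zero padding).
                    if 0 <= rr < h and 0 <= cc < w:
                        acc += kernel[i][j] * frame[rr][cc]
            out[r][c] = acc
    return out


def leaky_relu(frame: Frame, leak: float = 0.2) -> Frame:
    """Leaky ReLU: negative values are scaled by the leakage factor."""
    return [[x if x >= 0 else leak * x for x in row] for row in frame]
```

Because the stride is 1 and the border is zero-padded, the output has the same height and width as the input, which is consistent with the quoted statement that the example architectures can work in LR space.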
Regarding claim 8, the limitations of this claim substantially correspond to the limitations of claim 1; thus they are rejected on similar grounds.
Regarding claim 9, the limitations of this claim substantially correspond to the limitations of claim 2; thus they are rejected on similar grounds.
Regarding claim 10, the limitations of this claim substantially correspond to the limitations of claim 3; thus they are rejected on similar grounds.
Regarding claim 11, the limitations of this claim substantially correspond to the limitations of claim 4; thus they are rejected on similar grounds. 
Regarding claim 14, the limitations of this claim substantially correspond to the limitations of claim 1 (except for the machine-readable medium, which is disclosed by Vemulapalli, Fig. 1A; Paragraph [0057]: server computing system 130 includes one or more processors 132 and a memory 134. The one or more processors 132 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, a FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected. The memory 134 can include one or more non-transitory computer-readable storage mediums, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof); thus they are rejected on similar grounds.
Regarding claim 15, the limitations of this claim substantially correspond to the limitations of claim 2; thus they are rejected on similar grounds.
Regarding claim 16, the limitations of this claim substantially correspond to the limitations of claim 3; thus they are rejected on similar grounds.
Regarding claim 17, the limitations of this claim substantially correspond to the limitations of claim 4; thus they are rejected on similar grounds.
Regarding claim 20, Vemulapalli discloses a processor (Paragraph [0006]: computing system can include at least one processor) comprising: one or more circuits to train one or more neural networks, at least in part, to generate a higher resolution video using upsampled frames of a lower resolution video blended with prior higher resolution frames of the higher resolution video (Fig. 2; Fig. 6; Paragraphs [0031]-[0035]: the computing system can upsample the current low-resolution image frame to a high-resolution space of the warped previous estimated high-resolution image frame to map the warped previous estimated high-resolution image frame to the current low-resolution image-frame…some implementations, the computing system can input the warped previous estimated high-resolution image frame and the current low-resolution image frame into a machine-learned frame estimation model…the current estimated high-resolution image frame can be passed back for use as an input in the next iteration. That is, the current estimated high-resolution image frame can be used as the previous estimated high-resolution image frame at the next iteration, in which a next subsequent low resolution image frame is super-resolved…the machine-learned recurrent super-resolution model can include one or more neural networks (e.g., deep neural networks), support vector machines, decision trees, ensemble models, k-nearest neighbors models, Bayesian networks, or other types of models including linear models and/or non-linear models. Example neural networks can include feed-forward neural networks, convolutional neural networks, recurrent neural networks (e.g., long short-term memory (LSTM) recurrent neural networks, gated recurrent unit (GRU) neural networks), or other forms of neural networks; Paragraph [0044]: use of a recurrent super-resolution model in the systems and methods of the present disclosure provide a number of technical advantages. For example, by using a previously inferred high-resolution (HR) estimate to super-resolve the subsequent low-resolution (LR) frame, this framework can naturally encourage temporally consistent results. That is, by iteratively providing past inferences as inputs for the next iteration, data can be retained across iterations, thereby providing more temporally consistent outputs. Thus, by propagating information from past image frames to later image frames via an estimated high-resolution image frame that is recurrently passed through time, the systems and methods can recreate fine details and produce temporally consistent results without increasing computations; Paragraph [0067]: model trainer 160 includes computer logic utilized to provide desired functionality. The model trainer 160 can be implemented in hardware, firmware, and/or software controlling a general purpose processor. For example, in some implementations, the model trainer 160 includes program files stored on a storage device, loaded into a memory, and executed by one or more processors. In other implementations, the model trainer 160 includes one or more sets of computer-executable instructions that are stored in a tangible computer-readable storage medium such as RAM hard disk or optical or magnetic media).
Regarding claim 21, Vemulapalli discloses the processor of claim 20, wherein the one or more neural networks are trained using pairs of lower and higher resolution video frames (Fig. 2; Paragraph [0023]: a machine-learned recurrent super-resolution model to super-resolve imagery such as image frames of a video. In particular, the recurrent super-resolution model can be structured according to an end-to-end trainable frame-recurrent video super-resolution framework that uses a previously inferred high-resolution (HR) estimate to super-resolve the subsequent low-resolution (LR) frame; Paragraph [0077]: FIG. 2 depicts a block diagram of an example recurrent super-resolution model 200 according to example embodiments of the present disclosure. In some implementations, the recurrent super-resolution model 200 is trained to receive input data that includes a current low-resolution image frame 204 and a previous estimated high-resolution image frame 206; and, as a result of receipt of the current low-resolution image frame 204 and previous estimated high-resolution image frame 206, provide a current estimated high-resolution image frame 208).
Regarding claim 22, Vemulapalli discloses the processor of claim 21, wherein the one or more neural networks are trained using a complex loss function comprising at least a style loss term and a temporal loss term (Fig. 4A; Fig. 4B; Paragraph [0039]: train the recurrent super-resolution model, the training computing system can input a first portion of a set of ground-truth data (e.g., the first portion of the training dataset corresponding to a set of low-resolution image frames) into the recurrent super-resolution model. In response to receipt of such first portion, the recurrent super-resolution model can output an estimated high-resolution image frame for each input low-resolution image frame. This output of the machine-learned model predicts the remainder of the set of ground-truth data (e.g., the second portion of the training dataset corresponding to a set of original high-resolution image frames). After such prediction, the training computing system can apply or otherwise determine a loss function that compares the estimated high-resolution image frames output by the recurrent super-resolution model to the remainder of the ground-truth data which the model attempted to predict. The training computing system then can backpropagate the loss function through the recurrent super-resolution model to train the model (e.g., by modifying one or more weights associated with the model). This process of inputting ground-truth data, determining a loss function and backpropagating the loss function through the recurrent super-resolution model can be repeated numerous times as part of training the recurrent super-resolution model. For example, the process can be repeated for each of numerous sets of ground-truth data provided within the training dataset; Paragraph [0044]: by using a previously inferred high-resolution (HR) estimate to super-resolve the subsequent low-resolution (LR) frame, this framework can naturally encourage temporally consistent results. That is, by iteratively providing past inferences as inputs for the next iteration, data can be retained across iterations, thereby providing more temporally consistent outputs. Thus, by propagating information from past image frames to later image frames via an estimated high-resolution image frame that is recurrently passed through time, the systems and methods can recreate fine details and produce temporally consistent results without increasing computations; Paragraphs [0091]-[0092]: FIG. 4A depicts a block diagram 402 of an example loss function used for training a machine-learned flow estimation model according to example embodiments of the present disclosure. A computing system (e.g., training computing system 150) can train the flow estimation model by applying a loss 410 on a warped previous low-resolution image frame, with respect to a current low-resolution image frame 412. The warped previous low-resolution image frame can be obtained by warping a previous low-resolution image frame 414 based on an estimated flow-map… FIG. 4B depicts a block diagram 404 of an example loss function used for training a machine-learned frame estimation model according to example embodiments of the present disclosure. A computing system (e.g., training computing system 150) can train the frame estimation model by applying a loss 420 on a current estimated high-resolution image frame 424 output by the frame estimation model, with respect to a current ground-truth high-resolution image frame 422. The loss 420 can be backpropagated through the frame estimation model to train the model).
Regarding claim 23, Vemulapalli discloses the processor of claim 21, wherein the one or more neural networks are trained to determine a blending factor and at least one kernel factor for blending pixel values of the upsampled frames and previously-inferred frames (Fig. 7; Fig. 8; Paragraph [0084]: FIG. 7 provides one example network architecture that can be used to implement the machine-learned flow estimation model 310. Other model architectures can be used instead of the example shown in FIG. 7. The example network architecture shown in FIG. 7 can be fully convolutional and can work in LR space. In some implementations, all convolutions can use 3×3 kernels with stride 1. The notation 2× indicates that the corresponding block is duplicated and the leaky ReLU units can use a leakage factor of 0.2; Paragraph [0090]: FIG. 8 provides one example network architecture that can be used to implement the machine-learned frame estimation model 320. Other model architectures can be used instead of the example shown in FIG. 8. The example network architecture shown in FIG. 8 can be fully convolutional and can work in LR space. In some implementations, all convolutions can use 3×3 kernels with stride 1 except the transposed convolutions which can use stride 2). 
Regarding claim 26, the limitations of this claim substantially correspond to the limitations of claim 20 (except for the one or more memories, which are disclosed by Vemulapalli, Fig. 1A; Paragraph [0057]: server computing system 130 includes one or more processors 132 and a memory 134. The one or more processors 132 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, a FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected. The memory 134 can include one or more non-transitory computer-readable storage mediums, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof); thus they are rejected on similar grounds.
Regarding claim 28, the limitations of this claim substantially correspond to the limitations of claim 22; thus they are rejected on similar grounds. 
Regarding claim 29, the limitations of this claim substantially correspond to the limitations of claim 23; thus they are rejected on similar grounds.
Regarding claim 31, the limitations of this claim substantially correspond to the limitations of claim 25; thus they are rejected on similar grounds.
Regarding claim 32, the limitations of this claim substantially correspond to the limitations of claim 1 (except for the machine-readable medium, which is disclosed by Vemulapalli, Fig. 1A; Paragraph [0057]: server computing system 130 includes one or more processors 132 and a memory 134. The one or more processors 132 can be any suitable processing device (e.g., a processor core, a microprocessor, an ASIC, a FPGA, a controller, a microcontroller, etc.) and can be one processor or a plurality of processors that are operatively connected. The memory 134 can include one or more non-transitory computer-readable storage mediums, such as RAM, ROM, EEPROM, EPROM, flash memory devices, magnetic disks, etc., and combinations thereof); thus they are rejected on similar grounds.
Regarding claim 33, the limitations of this claim substantially correspond to the limitations of claim 21; thus they are rejected on similar grounds.
Regarding claim 34, the limitations of this claim substantially correspond to the limitations of claim 22; thus they are rejected on similar grounds.
Regarding claim 35, the limitations of this claim substantially correspond to the limitations of claim 23; thus they are rejected on similar grounds.
Regarding claim 37, the limitations of this claim substantially correspond to the limitations of claim 25; thus they are rejected on similar grounds.
Regarding claim 38, Vemulapalli discloses the processor of claim 1, wherein the one or more circuits are to generate the higher resolution video using data from the previously-inferred frames (Fig. 2; Fig. 6; Paragraphs [0031]-[0035]: the computing system can upsample the current low-resolution image frame to a high-resolution space of the warped previous estimated high-resolution image frame to map the warped previous estimated high-resolution image frame to the current low-resolution image-frame…some implementations, the computing system can input the warped previous estimated high-resolution image frame and the current low-resolution image frame into a machine-learned frame estimation model…the current estimated high-resolution image frame can be passed back for use as an input in the next iteration. That is, the current estimated high-resolution image frame can be used as the previous estimated high-resolution image frame at the next iteration, in which a next subsequent low resolution image frame is super-resolved…the machine-learned recurrent super-resolution model can include one or more neural networks (e.g., deep neural networks), support vector machines, decision trees, ensemble models, k-nearest neighbors models, Bayesian networks, or other types of models including linear models and/or non-linear models. Example neural networks can include feed-forward neural networks, convolutional neural networks, recurrent neural networks (e.g., long short-term memory (LSTM) recurrent neural networks, gated recurrent unit (GRU) neural networks), or other forms of neural networks; Paragraph [0044]: use of a recurrent super-resolution model in the systems and methods of the present disclosure provide a number of technical advantages. For example, by using a previously inferred high-resolution (HR) estimate to super-resolve the subsequent low-resolution (LR) frame, this framework can naturally encourage temporally consistent results. That is, by iteratively providing past inferences as inputs for the next iteration, data can be retained across iterations, thereby providing more temporally consistent outputs. Thus, by propagating information from past image frames to later image frames via an estimated high-resolution image frame that is recurrently passed through time, the systems and methods can recreate fine details and produce temporally consistent results without increasing computations; Paragraph [0067]: model trainer 160 includes computer logic utilized to provide desired functionality. The model trainer 160 can be implemented in hardware, firmware, and/or software controlling a general purpose processor. For example, in some implementations, the model trainer 160 includes program files stored on a storage device, loaded into a memory, and executed by one or more processors. In other implementations, the model trainer 160 includes one or more sets of computer-executable instructions that are stored in a tangible computer-readable storage medium such as RAM hard disk or optical or magnetic media).
Regarding claim 39, Vemulapalli discloses the processor of claim 1, wherein the upsampled frames of a lower resolution video are blended with previously-inferred frames of the higher resolution video by blending pixel values of an upsampled frame with pixel values of a previously-inferred frame (Fig. 2; Fig. 6; Paragraphs [0031]-[0035]: the computing system can upsample the current low-resolution image frame to a high-resolution space of the warped previous estimated high-resolution image frame to map the warped previous estimated high-resolution image frame to the current low-resolution image-frame…some implementations, the computing system can input the warped previous estimated high-resolution image frame and the current low-resolution image frame into a machine-learned frame estimation model…the current estimated high-resolution image frame can be passed back for use as an input in the next iteration. That is, the current estimated high-resolution image frame can be used as the previous estimated high-resolution image frame at the next iteration, in which a next subsequent low resolution image frame is super-resolved…the machine-learned recurrent super-resolution model can include one or more neural networks (e.g., deep neural networks), support vector machines, decision trees, ensemble models, k-nearest neighbors models, Bayesian networks, or other types of models including linear models and/or non-linear models. Example neural networks can include feed-forward neural networks, convolutional neural networks, recurrent neural networks (e.g., long short-term memory (LSTM) recurrent neural networks, gated recurrent unit (GRU) neural networks), or other forms of neural networks; Paragraph [0044]: use of a recurrent super-resolution model in the systems and methods of the present disclosure provide a number of technical advantages. For example, by using a previously inferred high-resolution (HR) estimate to super-resolve the subsequent low-resolution (LR) frame, this framework can naturally encourage temporally consistent results. That is, by iteratively providing past inferences as inputs for the next iteration, data can be retained across iterations, thereby providing more temporally consistent outputs. Thus, by propagating information from past image frames to later image frames via an estimated high-resolution image frame that is recurrently passed through time, the systems and methods can recreate fine details and produce temporally consistent results without increasing computations; Paragraph [0067]: model trainer 160 includes computer logic utilized to provide desired functionality. The model trainer 160 can be implemented in hardware, firmware, and/or software controlling a general purpose processor. For example, in some implementations, the model trainer 160 includes program files stored on a storage device, loaded into a memory, and executed by one or more processors. In other implementations, the model trainer 160 includes one or more sets of computer-executable instructions that are stored in a tangible computer-readable storage medium such as RAM hard disk or optical or magnetic media).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 5-7, 12, 13, 18, 19, 24, 25, 27, 30, and 36 are rejected under 35 U.S.C. 103 as being unpatentable over Vemulapalli, in view of Wang et al. (US Pub. 2020/0327702), hereinafter Wang.
Regarding claim 5, Vemulapalli discloses the processor of claim 1.
	Vemulapalli does not explicitly disclose wherein the one or more circuits are to be configured to convert the upsampled frames and the previously-inferred frames to a single channel of a target color space before providing as input to the one or more neural networks, the higher resolution video being generated in a full color space.
	However, Wang teaches enhanced resolution of a video using a neural network to upsample frames (Paragraphs [0022]-[0026]), further comprising wherein the one or more circuits are to be configured to convert the upsampled frames and the previously-inferred frames to a single channel of a target color space before providing as input to the one or more neural networks, the higher resolution video being generated in a full color space (Fig. 1; Paragraphs [0027]-[0028]: Input video bitstream 111 may be any suitable bitstream such as a standards compliant bitstream. For example, bitstream 115 may be any of Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC), VP9, or AV1 compliant. Video decoder 101 receives input video bitstream 111 and decodes input video bitstream 111 (by implementing the corresponding codec) to generate decoded video as video frames 113. Video frames 113 may include any suitable video frames, video pictures, sequence of video frames, group of pictures, groups of pictures, video data, or the like in any suitable resolution. For example, the video may be video graphics array (VGA), high definition (HD), Full-HD (e.g., 1080p), 4K resolution video, or the like, and the video may include any number of video frames, sequences of video frames, pictures, groups of pictures, or the like. Techniques discussed herein are discussed with respect to video frames and blocks thereof for the sake of clarity of presentation. However, such video frames may be characterized as pictures, etc. and the blocks may be characterized as coding units, largest coding units, macroblocks, coding blocks, etc. For example, a picture or frame of color video data may include a luminance plane or component and two chrominance planes or components at the same or different resolutions with respect to the luminance plane. Video frames 113 (during encode) may be divided into blocks of any size, which contain data corresponding to blocks of pixels.
Such blocks may include data from one or more planes or color channels of pixel data…video decoder 101, provides metadata including quantization parameters 112 (QPs) and motion vectors and coding modes 114. Coding modes 114 may include frame level coding information (e.g., frame type including I-frame, P-frame, etc.) and block level coding information including the coding mode of a block and, if applicable, a motion vector and residual information of a block for use as discussed herein. For example, the metadata may include all such information or only information pertinent to selective deep learning network application as discussed herein. For example, in contrast to a typical video decoder that only outputs video frames 113, video decoder 101 generates the metadata via a request of targeted information, via a request of a metadata dump of all such information, or the like). Wang is in the same field of endeavor as Vemulapalli (enhanced resolution of a video using a neural network to upsample frames) and teaches that this will provide deep learning based super-resolution for video that improves super-resolution quality and/or provides acceleration, reduced computational complexity, and memory cost (Paragraph [0004]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Vemulapalli with the features wherein the one or more circuits are to be configured to convert the upsampled frames and the previously-inferred frames to a single channel of a target color space before providing as input to the one or more neural networks, the higher resolution video being generated in a full color space as taught by Wang so as to improve quality and reduce computational complexity as presented by Wang.
Regarding claim 6, Vemulapalli discloses the processor of claim 1.
	Vemulapalli does not explicitly disclose wherein the previously-inferred frames are bicubically motion warped, and wherein the one or more circuits are to be configured to temporally anti-alias the frames of the upsampled lower resolution video being upsampled.
	However, Wang teaches enhanced resolution of a video using a neural network to upsample frames (Paragraphs [0022]-[0026]), further comprising wherein the previously-inferred frames are bicubically motion warped, and wherein the one or more circuits are to be configured to temporally anti-alias the frames of the upsampled lower resolution video being upsampled (Fig. 1; Fig. 6; Paragraph [0025]: FIG. 1 is an illustrative diagram of an example system 100 for processing via selective application of a deep learning network, arranged in accordance with at least some implementations of the present disclosure. As shown in FIG. 1, system 100 includes a video decoder 101, a QP comparison module 102 (labeled QP<TH?), residual values comparison module 103 (labeled Residuals=0?), a hardware upsampler 104, a pixel value transfer module 105, a deep learning super-resolution network 106 (labeled AI Based Super Resolution), and a frame merge module 107. For example, system 100 receives an input video bitstream 111 (representative of video at a lower resolution) for super-resolution processing and system 100 provides upsampling to output video 118 at a higher resolution. Although illustrated with respect to super-resolution processing, system 100 may be employed in any suitable video deep learning applications such as anti-aliasing, noise reduction, or the like as discussed further herein below with respect to FIG. 6; Paragraph [0032]: Such interpolation based upsampling may include any suitable technique or techniques such as bilinear interpolation, bicubic interpolation, Lanczos interpolation, or others. It is noted that such techniques may be differentiated with respect to upsampling provided by deep learning super-resolution network 106 in that no deep learning layers, convolutional layers, connected layers, etc. are applied.
Although, super-resolution blocks 115 are illustrated as being output from hardware upsampler 104 to frame merge module 107, in some embodiments, hardware upsampler 104 outputs full super-resolution frames that need not be merged with other blocks. Notably, hardware upsampler 104 may be applied on a frame-wise basis based on frame level QP with the interpolation based upsampling being performed regardless of coding mode of the blocks of the frame; Paragraph [0058]: the video frames to be decoded are to be enhanced in some manner such as super-resolution processing (e.g., upsampling), anti-aliasing (e.g., to smooth edges), noise reduction (e.g., to remove undesirable noise), or other image processing. The techniques discussed with respect to process 600 adaptively apply such processing by applying a deep learning network to decoded video frame samples, by retrieving prior processed samples, or by applying a non-deep learning network based approach (optionally in hardware) to the decoded video frame samples. In any such case, the output block for a processed block is provided for merge into output frame. Such anti-aliasing, noise reduction, or other video frame enhancement processing is performed in analogy with the techniques discussed herein above with respect to super-resolution processing). Wang is in the same field of endeavor as Vemulapalli (enhanced resolution of a video using a neural network to upsample frames) and teaches that this will provide deep learning based super-resolution for video that improves super-resolution quality and/or provides acceleration, reduced computational complexity, and memory cost (Paragraph [0004]). 
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Vemulapalli with the features wherein the previously-inferred frames are bicubically motion warped, and wherein the one or more circuits are to be configured to temporally anti-alias the frames of the upsampled lower resolution video being upsampled as taught by Wang so as to improve quality and reduce computational complexity as presented by Wang.
Regarding claim 7, Vemulapalli discloses the processor of claim 1.
	Vemulapalli does not explicitly disclose wherein the lower resolution video is received from a game engine, and wherein the higher resolution video is output for display to a player during gameplay of a game executing on the game engine.
	However, Wang teaches enhanced resolution of a video using a neural network to upsample frames (Paragraphs [0022]-[0026]), further comprising wherein the lower resolution video is received from a game engine, and wherein the higher resolution video is output for display to a player during gameplay of a game executing on the game engine (Fig. 1; Paragraph [0026]: System 100 may be implemented via any suitable device such as, for example, a personal computer, a laptop computer, a tablet, a phablet, a smart phone, a digital camera, a gaming console, a wearable device, an all-in-one device, a two-in-one device, or the like or a platform such as a mobile platform or the like; Paragraphs [0037]-[0038]: frame merge module 107 receives super-resolution blocks 116, super-resolution blocks 117, and, if applicable, to super-resolution blocks 115 to generate a frame of output video 118 by merging the super-resolution blocks into a full super-resolution frame of output video 118. In some embodiments, frame merge module 107 also applies filtering such as de-block filtering or other image enhancement techniques. As discussed, output video 118 may be employed in a variety of contexts such as display to a user or use by other applications. Output video 118 may be sent to a display of system 100, to a memory of system 100, to another device via a communications link, etc…techniques discussed herein provide a variety of advantages including low overhead (as all information used to control super-resolution processing is required by the video decoder), acceleration of super-resolution processing using characteristics of the video content (with acceleration depending on content type: with no skip blocks, performance is about the same as application of a network with selectivity, with gaming and screen content performance increase of 45%-90% may be achieved), no conflict with other performance enhancements made via the network, and support for a wide range of codecs). 
Wang is in the same field of endeavor as Vemulapalli (enhanced resolution of a video using a neural network to upsample frames) and teaches that this will provide deep learning based super-resolution for video that improves super-resolution quality and/or provides acceleration, reduced computational complexity, and memory cost (Paragraph [0004]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Vemulapalli with the features wherein the lower resolution video is received from a game engine, and wherein the higher resolution video is output for display to a player during gameplay of a game executing on the game engine as taught by Wang so as to improve quality and reduce computational complexity as presented by Wang.
Regarding claim 12, the limitations of this claim substantially correspond to the limitations of claim 5; thus they are rejected on similar grounds.
Regarding claim 13, the limitations of this claim substantially correspond to the limitations of claim 7; thus they are rejected on similar grounds.
Regarding claim 18, the limitations of this claim substantially correspond to the limitations of claim 5; thus they are rejected on similar grounds.
Regarding claim 19, the limitations of this claim substantially correspond to the limitations of claim 7; thus they are rejected on similar grounds.
Regarding claim 24, Vemulapalli discloses the processor of claim 21.
	Vemulapalli does not explicitly disclose wherein the one or more circuits are to be configured to convert the upsampled frames and the previously-inferred frames to a single channel of a target color space before providing as input to the one or more neural networks, the higher resolution video being generated in a full color space.
	However, Wang teaches enhanced resolution of a video using a neural network to upsample frames (Paragraphs [0022]-[0026]), further comprising wherein the one or more circuits are to be configured to convert the upsampled frames and the previously-inferred frames to a single channel of a target color space before providing as input to the one or more neural networks, the higher resolution video being generated in a full color space (Fig. 1; Paragraphs [0027]-[0028]: Input video bitstream 111 may be any suitable bitstream such as a standards compliant bitstream. For example, bitstream 115 may be any of Advanced Video Coding (AVC), High Efficiency Video Coding (HEVC), Versatile Video Coding (VVC), VP9, or AV1 compliant. Video decoder 101 receives input video bitstream 111 and decodes input video bitstream 111 (by implementing the corresponding codec) to generate decoded video as video frames 113. Video frames 113 may include any suitable video frames, video pictures, sequence of video frames, group of pictures, groups of pictures, video data, or the like in any suitable resolution. For example, the video may be video graphics array (VGA), high definition (HD), Full-HD (e.g., 1080p), 4K resolution video, or the like, and the video may include any number of video frames, sequences of video frames, pictures, groups of pictures, or the like. Techniques discussed herein are discussed with respect to video frames and blocks thereof for the sake of clarity of presentation. However, such video frames may be characterized as pictures, etc. and the blocks may be characterized as coding units, largest coding units, macroblocks, coding blocks, etc. For example, a picture or frame of color video data may include a luminance plane or component and two chrominance planes or components at the same or different resolutions with respect to the luminance plane. Video frames 113 (during encode) may be divided into blocks of any size, which contain data corresponding to blocks of pixels.
Such blocks may include data from one or more planes or color channels of pixel data…video decoder 101, provides metadata including quantization parameters 112 (QPs) and motion vectors and coding modes 114. Coding modes 114 may include frame level coding information (e.g., frame type including I-frame, P-frame, etc.) and block level coding information including the coding mode of a block and, if applicable, a motion vector and residual information of a block for use as discussed herein. For example, the metadata may include all such information or only information pertinent to selective deep learning network application as discussed herein. For example, in contrast to a typical video decoder that only outputs video frames 113, video decoder 101 generates the metadata via a request of targeted information, via a request of a metadata dump of all such information, or the like). Wang is in the same field of endeavor as Vemulapalli (enhanced resolution of a video using a neural network to upsample frames) and teaches that this will provide deep learning based super-resolution for video that improves super-resolution quality and/or provides acceleration, reduced computational complexity, and memory cost (Paragraph [0004]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Vemulapalli with the features wherein the one or more circuits are to be configured to convert the upsampled frames and the previously-inferred frames to a single channel of a target color space before providing as input to the one or more neural networks, the higher resolution video being generated in a full color space as taught by Wang so as to improve quality and reduce computational complexity as presented by Wang.
Regarding claim 25, Vemulapalli discloses the processor of claim 20.
	Vemulapalli does not explicitly disclose wherein the lower resolution video is received from a game engine, and wherein the higher resolution video is output for display to a player during gameplay of a game executing on the game engine.
	However, Wang teaches enhanced resolution of a video using a neural network to upsample frames (Paragraphs [0022]-[0026]), further comprising wherein the lower resolution video is received from a game engine, and wherein the higher resolution video is output for display to a player during gameplay of a game executing on the game engine (Fig. 1; Paragraph [0026]: System 100 may be implemented via any suitable device such as, for example, a personal computer, a laptop computer, a tablet, a phablet, a smart phone, a digital camera, a gaming console, a wearable device, an all-in-one device, a two-in-one device, or the like or a platform such as a mobile platform or the like; Paragraphs [0037]-[0038]: frame merge module 107 receives super-resolution blocks 116, super-resolution blocks 117, and, if applicable, to super-resolution blocks 115 to generate a frame of output video 118 by merging the super-resolution blocks into a full super-resolution frame of output video 118. In some embodiments, frame merge module 107 also applies filtering such as de-block filtering or other image enhancement techniques. As discussed, output video 118 may be employed in a variety of contexts such as display to a user or use by other applications. Output video 118 may be sent to a display of system 100, to a memory of system 100, to another device via a communications link, etc…techniques discussed herein provide a variety of advantages including low overhead (as all information used to control super-resolution processing is required by the video decoder), acceleration of super-resolution processing using characteristics of the video content (with acceleration depending on content type: with no skip blocks, performance is about the same as application of a network with selectivity, with gaming and screen content performance increase of 45%-90% may be achieved), no conflict with other performance enhancements made via the network, and support for a wide range of codecs). 
Wang is in the same field of endeavor as Vemulapalli (enhanced resolution of a video using a neural network to upsample frames) and teaches that this will provide deep learning based super-resolution for video that improves super-resolution quality and/or provides acceleration, reduced computational complexity, and memory cost (Paragraph [0004]). Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Vemulapalli with the features wherein the lower resolution video is received from a game engine, and wherein the higher resolution video is output for display to a player during gameplay of a game executing on the game engine as taught by Wang so as to improve quality and reduce computational complexity as presented by Wang.
Regarding claim 27, the limitations of this claim substantially correspond to the limitations of claim 24; thus they are rejected on similar grounds.
Regarding claim 30, the limitations of this claim substantially correspond to the limitations of claim 24; thus they are rejected on similar grounds.
Regarding claim 36, the limitations of this claim substantially correspond to the limitations of claim 24; thus they are rejected on similar grounds.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MATTHEW D SALVUCCI whose telephone number is (571)270-5748. The examiner can normally be reached M-F, 7:30-4:00 PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, XIAO WU, can be reached at (571) 272-7761. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MATTHEW SALVUCCI/Primary Examiner, Art Unit 2613