DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.

Claim(s) 30-32 and 37 is/are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Kalchbrenner et al. (pub. no. 20210027425).
Regarding claim 30, Kalchbrenner discloses a computer system for training a neural network to transform images from a first resolution into a second resolution, the computer system comprising: non-transitory computer readable storage configured to store a plurality of target images, the plurality of images including a first image in the first resolution; and a processing system that includes at least one hardware processor, the processing system configured to: (a) divide the first image into a first plurality of pixel blocks ([0028]; [0031]; [0063] & [0064]), 

(b) split each one of the first plurality of pixel blocks into a plurality of separate output channels to form target output data, (c) generate, based on one of the plurality of separate output channels, a second image that is of the second resolution  ([0048] & [0049]), 

(d) generate, from the second image, a plurality of context blocks, (e) split the plurality of context blocks into a plurality of separate input channels, and (f) train a neural network by using the plurality of separate input channels until convergence of the neural network to the target output data (“During training, the system 100 trains the autoregressive model 102 on a training dataset by adjusting the values of parameters θ of the autoregressive model 102 to maximize log P (x; θ). Since the joint distribution factorizes over pixel groups and scales, the training can be effectively parallelized, i.e., processing of the convolutional neural networks in the autoregressive model 102 can be parallelized during training. Therefore, the convolutional neural networks in the model 102 can be trained in a resource and time-efficient manner.

Once trained, the autoregressive model 102 upscales the low-resolution image 108, for example by iteratively performing the following operations: obtaining a current version of the output image having a current K×K resolution, i.e., the version of the image from the previous iteration, and processing the current version of the output image using a set of CNNs and a predefined grouping and ordering rule that are specific to the current resolution to generate an updated version of the output image having a 2K×2K resolution. The above operations are repeatedly performed until a desirable resolution (e.g., N×N) is obtained”, [0039] – [0040] see also [0060] - [0062]).
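For illustration only, the iterative doubling described in the quoted passage of Kalchbrenner ([0040]) can be sketched as follows. This is a minimal sketch and is not code from the reference: the function name upscale_schedule and the example sizes 4 and 128 are assumptions chosen for the example, and the sketch tracks only the resolution reached at each iteration, not the CNN processing itself.

```python
def upscale_schedule(start_k, target_n):
    """Return the resolutions visited when doubling from start_k up to target_n.

    Hypothetical helper: models only the K x K -> 2K x 2K progression,
    not the convolutional networks that produce each version.
    """
    if start_k <= 0 or target_n < start_k:
        raise ValueError("start_k must be positive and no larger than target_n")
    sizes = [start_k]
    while sizes[-1] < target_n:
        # Each iteration produces a version with twice the current resolution.
        sizes.append(sizes[-1] * 2)
    if sizes[-1] != target_n:
        raise ValueError("target_n must equal start_k times a power of two")
    return sizes


schedule = upscale_schedule(4, 128)
print(schedule)           # [4, 8, 16, 32, 64, 128]
print(len(schedule) - 1)  # 5 doubling iterations
```

Under these assumptions, a 4×4 starting image reaches 128×128 after five iterations of the K×K to 2K×2K step.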
Regarding claim 31, Kalchbrenner discloses each of the plurality of separate channels is composed of a plurality of sub-channels ([0043]).
Regarding claim 32, Kalchbrenner discloses each one of the plurality of sub-channels that compose a channel corresponds to a different color value that makes up individual pixels within first image ([0043]).
Regarding claim 37, Kalchbrenner discloses the processing system is further configured to:  select a second plurality of pixel blocks from the generated second image, each one of the plurality of pixel blocks including data for multiple pixels from the generated second image, wherein each one of the plurality of context blocks is based on a corresponding one of the second plurality of pixel blocks ([0039] – [0040] see also [0060] - [0062]).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 3, 7, 10, 16, 24 and 27-29 is/are rejected under 35 U.S.C. 103 as being unpatentable over Kalchbrenner et al. (pub. no. 20210027425) in view of Cox et al. (pub. no. 20210092462).
Regarding claim 1, Kalchbrenner discloses a computer system for converting images into another resolution, the computer system comprising: a processing system that includes at least one hardware processor, the processing system configured to: (a) generate, in connection with execution of a video game, a first image that is at a first resolution (”In particular, to generate the output image 110, the autoregressive model 102 first generates an initial low-resolution image 108 of the output image 110. In some implementations, the autoregressive model 102 can randomly sample the initial low-resolution image 108 from a set of low-resolution image”, [0028]; “In some implementations, instead of generating the low-resolution 108, the autoregressive model 102 can obtain the low-resolution version 108 as an input, e.g., from another system”, [0031]; “ FIG. 4 is a flowchart of an example process for processing a current version of an output image to generate an updated version of the output image. For convenience, the process 400 will be described as being performed by a system of one or more computers located in one or more locations. For example, an image generation system, e.g., the image 

The system obtains a current version of the output image having a current K×K resolution (step 402)”, [0063] & [0064]); 

(b) select a first block of pixels from the first image; (c) generate a first plurality of input channels based on the first block of pixels (“Generally, pixels in the image 200 are grouped in a way that no two adjacent pixels of the image 200 are in the same group, thus allowing adjacent pixels to be generated in parallel, which could greatly accelerate the generation of higher-resolution images during training and inference.

In this example, the image 200 has a resolution of 4 pixels×4 pixels. The image 200 can be divided into disjoint group of pixels using the following rule. To create groups, the image 200 is tiled with 2×2 blocks. The corners of these 2×2 blocks form the four pixel groups at a given scale, i.e., upper-left, upper-right, lower-left, and lower-right. In particular, the upper left corner pixels form group 1 pixels (202). The upper right corner pixels form group 2 pixels (204). The lower left corner pixels form group 3 pixels (206). The lower right corner pixels form group 4 pixels (208). Each group of pixels corresponds to a factor in the joint distribution of Eq. 2”, [0048] & [0049]); 

(d) insert values from each of the first plurality of input channels into a first activation matrix; (e) apply the first activation matrix against a trained neural network to generate a second activation matrix (“A predefined grouping and ordering rule specifies how pixels are grouped in a predetermined way so as to exploit spatial locality at each resolution, i.e., no two adjacent pixels of an image are in the same group. FIG. 2 shows an example grouping and ordering rule in which an image is divided into disjoint group of pixels, with autoregressive structure among the groups, i.e., each group of pixels can be successively generated conditioned on the previously generated groups of pixels.

To upscale an image from a given K×K resolution to 2K×2K resolution, the autoregressive model 102 processes the current version having the given resolution using a first CNN in the set of CNNs and a set of pixel groups specific to the given resolution. The set of pixel groups is formed according to the predefined grouping and ordering rule. The first CNN is configured to generate a first output image, which corresponds to a new pixel group, based on previous pixel groups included in the current image. The autoregressive model 102 then generates an intermediate version (e.g., a K×2K version or a 2K×K version of the output image) by merging the current version and the first output image according to the predefined grouping and ordering rule. The autoregressive model 102 processes the intermediate version using a second CNN in the set of CNNs to generate a second output image in a similar manner. The autoregressive model 102 generates a 2K×2K version by merging the intermediate version and the second output image according to the predefined grouping and ordering rule”, [0041] & [0042]; “In contrast, the autoregressive model 102 described in this description reduces computational costs and accelerates training and inference by factorizing the joint distribution of images into pixel groups factors. This approach can be viewed as a way to merge per-pixel factors, thus cutting some spatial dependencies relied on by existing autoregressive image generation models and 2 mentioned elsewhere) are divided into G groups of T pixels each, the autoregressive model 102 computes the joint distribution of T pixels over an image as a product of the corresponding G factors”, [0037]); “FIG. 3 is a block diagram of an example process for processing a current version of an output image (K×K) to generate an updated version of the output image (2K×2K). 
For convenience, the process 300 will be described as being performed by a system of one or more computers located in one or more locations. For example, an image generation system, e.g., the image generation system 100 of FIG. 1, appropriately programmed in accordance with this specification, can perform the process 300.

The system obtains a current version 306 of the output image. The current version 306 has a current K×K resolution. In some cases, the current version 306 can be an initial low resolution image of the output image (e.g., the initial low resolution image 108 of FIG. 1) that the system generates using an autoregressive image generation machine learning model. In some other cases, the current version 306 of the output image can be an image generated during the previous iteration of the process 300.

The system processes the current version 306 using a first CNN 302 to generate the first output image 308. The first CNN 302 is configured to receive the current version 306 of the output image and to generate a first output image 308 conditioned on the current version 306. The first output image 308 includes columns of pixels or rows of pixels used by the system to generate a subsequent version of the output image having a higher resolution than the current version.



In some other implementations, the first CNN 302 may include one or more residual neural network layers and one or more convolutional neural network layers. The one or more residual neural network layers extract features of the current version to form a feature map and splits the feature map into spatially contiguous feature blocks 318 which, in implementations, are non-overlapping. The one or more convolutional neural network layers then provides these blocks in parallel (i.e., by generating respective pixel values and/or color values for each of the feature block 318) to form the first output image 308. An example CNN that can be used to provide these blocks is a shallow PixelCNN through which the feature map blocks may be fed. A shallow PixelCNN is a standard PixelCNN with a small number of layers (e.g. less than 5), which can result in faster sampling compared to deeper neural networks.

As an illustrative example, the current version 306 may include the 4 upper left corner pixels that formed group 1 pixels in FIG. 2. The first CNN 302 may use group 1 pixels to generate the first output image 310 that includes group 2 pixels, i.e., the 4 upper-right corner pixels. The first CNN 302 may include one or more residual neural network layers.

The system splits the first output image 308 into K columns of pixels 310. The system then alternates K columns of pixels from the current version 306 with K columns of pixels 310 from the first output image 308 and merges them to create the K×2K version 312.

The system processes the K×2K version 312 using a second convolutional neural network 304 to generate the second output image 314. The second CNN 304 is configured to receive the K×2K version 312 and to generate a second output image 314 that includes rows of pixels to be used to generate the 2K×2K version of the output image. The second CNN 304 may include one or more residual neural network layers.

The system generates the updated output image 316 (i.e. the 2K×2K version) by merging the K×2K version 312 and the second output image 314. In particular, the system generates the 2K×2K image 316 that includes K rows of pixels from the K×2K version 312 and K rows of pixels from the second output image 314 by alternating rows of pixels from the K×2K version with rows of pixels from the second output image”. [0052] – [0060]); 

(f) generate, based on the second activation matrix, a second image that is in a second resolution; wherein (b)-(g) are performed for each of a plurality of pixel blocks that are derived from the first image (“After generating or obtaining the low-resolution image 108 of the output image 110, the autoregressive model 102 upscales the low-resolution version 108 using the sets of CNNs 104 in order to generate the output image 110 having a final desired output resolution (e.g., N pixels×N pixels). For example, in some implementations, the initial low-resolution image 108 has a resolution of 4 pixels×4 pixels and the output image 110 has a resolution of 128 pixels×128 pixels. Generally, the autoregressive model 102 upscales the low-resolution image 108 by generating higher-resolution images following a “coarse-to-fine ordering of pixels” 
Regarding claim 1, it is noted that Kalchbrenner does not explicitly disclose a display device configured to display images of a video game or output, for the video game, the second image to a display.   Cox however,  teaches a display device configured to display images of a video game and outputting the second image to a display(“A thin-cloud system for distributing content, for example, live streaming video content, from a broadcaster to a viewer is provided herein. The computing devices of the broadcaster can provide the multi-bitrate transcoding, of the two or more bitstreams, sent to a file server, which alleviates the need for the file server to encode the streams for a viewer. These multiple streams are received by a file server for provision to one or more viewers. The viewers can receive the streams at one of the two or more bitrates. If the viewer receives the content at a lower bitrate, the viewers can employ a machine learning (ML) co-processor that can operate as an accelerator to improve the inbound content, if that content is provided at a lower bitrate, and thus, a lower resolution. The file server can train and provide the ML models used for the acceleration “, abstract; “The model library 120 can include information or machine learning models, associated with content provided to the file server 108, which may be provided to the viewer 112 to allow the viewer 112 to provide resolution upscaling (e.g., with a hardware-accelerated super-resolution CNN) to improve the visual quality of reduced-resolution video at the viewer 112. For example, the model library 120 can include one or more ML models generated on similar content to that provided to the viewer 112. The provided model from the model library 120 can allow the viewer 112 to compensate for lower resolution content, likely at lower bitrates, either provided due to limitations of the viewer's network or the broadcaster's network. Further, the model library 120 may store metadata 
Exemplary rationales that may support a conclusion of obviousness include combining prior art elements according to known methods to yield predictable results.  Here both Kalchbrenner and Cox are directed to systems using machine learning models to perform image processing.  To use the Kalchbrenner scaling in a thin-cloud system of streaming game content as taught by Cox would be to combine prior art elements according to known methods to yield predictable results.  Therefore, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the claimed invention to modify Kalchbrenner so as to use it as part of a game streaming service as taught by Cox.  To do so would allow increased quality for reduced bandwidth, thereby increasing the perceived entertainment value of the service and reducing operating costs.
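For illustration only, the pixel grouping rule quoted from Kalchbrenner in the claim 1 mapping above ([0048] & [0049]) can be sketched as follows. This is a minimal sketch and is not code from the reference: the function names pixel_groups and no_adjacent_share_group are hypothetical, and the sketch assumes pixels are addressed by zero-based (row, column) coordinates.

```python
def pixel_groups(height, width):
    """Map group number (1-4) to the (row, col) coordinates of its pixels.

    The image is tiled with 2x2 blocks; the position of a pixel inside its
    tile decides the group: 1 upper-left, 2 upper-right, 3 lower-left,
    4 lower-right.
    """
    groups = {1: [], 2: [], 3: [], 4: []}
    for r in range(height):
        for c in range(width):
            group = 1 + 2 * (r % 2) + (c % 2)
            groups[group].append((r, c))
    return groups


def no_adjacent_share_group(height, width):
    """Check that horizontally or vertically adjacent pixels never share a group."""
    group_of = {}
    for g, coords in pixel_groups(height, width).items():
        for rc in coords:
            group_of[rc] = g
    for (r, c), g in group_of.items():
        for dr, dc in ((0, 1), (1, 0)):
            neighbour = (r + dr, c + dc)
            if neighbour in group_of and group_of[neighbour] == g:
                return False
    return True


groups = pixel_groups(4, 4)
print(groups[1])                      # [(0, 0), (0, 2), (2, 0), (2, 2)]
print(no_adjacent_share_group(4, 4))  # True
```

For the 4 pixel×4 pixel example, each of the four groups holds four pixels, and no two adjacent pixels fall in the same group.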
Regarding claim 3, the combination of Kalchbrenner and Cox discloses the second image is output to the display in real-time relative to generation of the first image (“To mitigate lapses in video quality, the viewers can be provided with a type of resolution upscaling (e.g., a hardware-accelerated super-resolution convolutional neural network (CNN)) to improve the visual quality of reduced-resolution video”, Cox: [0031]).
Regarding claims 7 & 10, it is noted that Kalchbrenner does not explicitly disclose that results are maintained within registers of the GPU during processing.   However, a person of ordinary skill in the art as of the effective filing date of the claimed invention would be aware of 
Regarding claim 16, Kalchbrenner discloses each pixel in the first image is represented in the activation matrix by separate RGB values for the color of the corresponding pixel (“In some implementations, each pixel in a higher resolution image generated by the CNNs (e.g., the first output image and second output image in each iteration) has a respective color value for each channel in a set of multiple color channels. For example, the set of color channels may include {red, green, blue} or {cyan, magenta, yellow, black}. The color channels in the set are ordered according to a channel order, for example, RGB order or CMYK order. The first and second convolutional networks take into account the channel order when generating the first output image and the second output image. The process for generating color values for color channels for pixels in the first output image and second output image is described in more detail below with reference to FIG. 4”, [0043]).
Regarding claim 24, the combination of Kalchbrenner and Cox discloses a computer readable storage medium configured to store a plurality of different trained neural networks, with at least a first trained neural network trained to convert images of the first resolution into the second resolution and a second trained neural network trained to convert images of the first resolution into a third resolution that is different from the second resolution (“The set of convolutional neural networks that are specific to the current resolution may comprise a set of convolutional neural networks (CNNs) that includes two or more CNNS that are used to quadruple the resolution. The set of convolutional neural networks that are specific to the current resolution may include: a first convolutional neural network that is configured to receive a first input comprising the current version of the image and to generate a first output image that includes columns of pixels from a K×2K version of the output image, and a second convolutional neural network that is configured to receive a second input comprising the K×2K version of the output image and to generate a second output image that includes rows of pixels from the 2K×2K version of the output image”, Kalchbrenner: [0008]); 

and a transceiver configured to communicate with another computer system to receive the trained neural network therefrom (“The model library 120 can include information or machine learning models, associated with content provided to the file server 108, which may be provided to the viewer 112 to allow the viewer 112 to provide resolution upscaling (e.g., with a hardware-accelerated super-resolution CNN) to improve the visual quality of reduced-resolution video at the viewer 112”, Cox: [0038]).
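For illustration only, the two-stage merge described in the Kalchbrenner passage quoted for claim 24 ([0008]), in which a K×K version becomes a K×2K version and then a 2K×2K version, can be sketched as follows. This is a minimal sketch and is not code from the reference: the function names interleave_columns and interleave_rows are hypothetical, images are modeled as lists of rows, and placing the current version's columns and rows first in each alternation is an assumption.

```python
def interleave_columns(current, new_columns):
    """Alternate columns of two K x K grids to form a K x 2K grid.

    Assumption: columns of `current` take the even positions.
    """
    merged = []
    for current_row, new_row in zip(current, new_columns):
        row = []
        for a, b in zip(current_row, new_row):
            row.extend([a, b])
        merged.append(row)
    return merged


def interleave_rows(wide, new_rows):
    """Alternate rows of two K x 2K grids to form a 2K x 2K grid.

    Assumption: rows of `wide` take the even positions.
    """
    merged = []
    for a, b in zip(wide, new_rows):
        merged.append(list(a))
        merged.append(list(b))
    return merged


current = [[1, 2], [3, 4]]                                  # K x K, K = 2
first_output = [["a", "b"], ["c", "d"]]                     # K columns from the first CNN
second_output = [["e", "f", "g", "h"], ["i", "j", "k", "l"]]  # K rows from the second CNN

wide = interleave_columns(current, first_output)   # K x 2K version
full = interleave_rows(wide, second_output)        # 2K x 2K version
for row in full:
    print(row)
```

In this sketch the placeholder letters stand in for pixel values that the first and second convolutional neural networks would generate.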
Regarding claim 27, the combination of Kalchbrenner and Cox discloses first and second processing systems communicating with one another via a computer network, wherein the first processing system is configured to: perform (a), and communicate data that is based on the first image to the second processing system; and wherein the second processing system is configured to: perform (e), (f), and (g), wherein the display is located proximate to the second processing system.
Claim 28 is directed to an article of manufacture containing code that is implemented by the system of claim 1 and is rejected for the same reasons as claim 1.
Claim 29 is directed to the method implemented by the system of claim 1 and is rejected for the same reasons as claim 1.
Claims 4, 14 and 110 is/are rejected under 35 U.S.C. 103 as being unpatentable over Kalchbrenner et al. (pub. no. 20210027425) in view of Cox et al. (pub. no. 20210092462) as applied respectively to claims 1 & 28 above, and further in view of Arthur (pub. no. 20200117981).
Regarding claims 4, 14 and 110, it is noted that Kalchbrenner does not disclose applying a separable block transform using first and second matrices of learned coefficients of the trained neural network or that the block of pixels is based on a hardware accelerated matrix multiplication size of a graphical processing unit.  Arthur however, teaches applying a separable block transform using first and second matrices of learned coefficients of the trained neural network and the block of pixels is based on a hardware accelerated matrix multiplication size of a graphical processing unit (“In a convolution layer, the 6-dimensional weight tensor contains many repeated blocks, since all elements of the same output feature share the same filter weights that are replicated at each output location. The shared filter weights can be described more compactly by a dense 4-dimensional filter tensor F that contains all of the filters that compute output features of the layer, and is indexed by the output feature dimension (output feature k) and 3 filter input dimensions (filter row r, filter column s, filter feature t)”, [0024]; “With reference now to FIG. 1, a neural core according to embodiments of the present disclosure is depicted. A neural core 100 is a tileable computational unit that computes one block of an output tensor. A neural core 100 has M inputs and N outputs. In various embodiments, M=N. To compute an i), applied to the inputs, where the result is added to a bias and run through a nonlinear activation function, σ. For example, to compute a single neuron activation, Y=σ(b+Σxiwi)”, [0056]; “Referring now to FIG. 5, a method of operating a neural network is illustrated according to embodiments of the present disclosure. At 501, an input data tensor is received at a neural network processor comprising a plurality of neural cores. The input data tensor has feature dimensions at an input bit precision. 
The neural network processor is configured for one or more processor feature dimensions at one or more processor bit precisions. At 502, the input data tensor is transformed from the input bit precision to one of the processor bit precisions. At 503, the input data tensor is divided into a plurality of blocks, each block conforming to one of the processor feature dimensions. At 504, each of the plurality of blocks is provided to one of the plurality of neural cores. At 505, the plurality of neural cores computes output of one or more neural network layers”, [0065]).
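For illustration only, the single neuron activation Y=σ(b+Σxiwi) and the division of an input into blocks described in the quoted passages of Arthur ([0056]; [0065]) can be sketched as follows. This is a minimal sketch and is not code from the reference: the function names neuron_activation and split_into_blocks are hypothetical, the logistic sigmoid is assumed for the nonlinear activation function σ, and a flat list stands in for the input data tensor.

```python
import math


def neuron_activation(inputs, weights, bias):
    """Compute Y = sigma(b + sum_i x_i * w_i), assuming sigma is the logistic sigmoid."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    z = bias + sum(x * w for x, w in zip(inputs, weights))
    return 1.0 / (1.0 + math.exp(-z))


def split_into_blocks(values, block_size):
    """Divide a flat list into consecutive blocks of at most block_size elements."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [values[i:i + block_size] for i in range(0, len(values), block_size)]


print(neuron_activation([1.0, 2.0], [0.5, -0.25], 0.0))  # 0.5
print(split_into_blocks([0, 1, 2, 3, 4, 5], 2))           # [[0, 1], [2, 3], [4, 5]]
```

Each block produced by split_into_blocks could then be handed to a separate computational unit, which corresponds to the parallel processing of blocks by neural cores described in the quoted passage.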
Exemplary rationales that may support a conclusion of obviousness include use of known technique to improve similar devices (methods, or products) in the same way. Here both .
Claim 48 is/are rejected under 35 U.S.C. 103 as being unpatentable over Kalchbrenner et al. (pub. no. 20210027425) in view of Arthur (pub. no. 20200117981).
Regarding claim 48, it is noted that Kalchbrenner does not disclose applying a separable block transform using first and second matrices of learned coefficients of the trained neural network or that the block of pixels is based on a hardware accelerated matrix multiplication size of a graphical processing unit.  Arthur, however, teaches applying a separable block transform using first and second matrices of learned coefficients of the trained neural network and the block of pixels is based on a hardware accelerated matrix multiplication size of a graphical processing unit ([0024]; [0031]; [0056]; [0065]).
Exemplary rationales that may support a conclusion of obviousness include use of known technique to improve similar devices (methods, or products) in the same way. Here both Kalchbrenner and Arthur are directed to systems of neural networks for computation.  To use the separable block transforms of Arthur in the Kalchbrenner invention would be to use a known technique to improve a similar device in the same way.   Therefore, it would have been obvious to a person having ordinary skill in the art as of the effective filing date of the claimed invention to modify Kalchbrenner to use the separable block transforms of Arthur.  To do so would allow parallel processing for faster processing as suggested by Arthur ([0065]).

Allowable Subject Matter
Claims 5, 20, 38, 41, 49, 50 and 111 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LAWRENCE STEFAN GALKA whose telephone number is (571)270-1386. The examiner can normally be reached M-F 6-9 & 12-5.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, David Lewis can be reached on 571-272-7673. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) 





/LAWRENCE S GALKA/Primary Examiner, Art Unit 3715