United States Patent and Trademark Office

Commissioner for Patents
United States Patent and Trademark Office
P.O. Box 1450
Alexandria, VA 22313-1450
www.uspto.gov

BEFORE THE PATENT TRIAL AND APPEAL BOARD


Application Number: 15/439,893
Filing Date: 22 Feb 2017
Appellant(s): Rippel et al.



__________________
Jae Yeon Baek 
Reg. No. 78,258
For Appellant


EXAMINER’S ANSWER

This is in response to the appeal brief filed 3/17/21.

(1) Grounds of Rejection to be Reviewed on Appeal

Every ground of rejection set forth in the Office action dated 11/17/20 from which the appeal is taken is being maintained by the examiner except for the grounds of rejection (if any) listed under the subheading “WITHDRAWN REJECTIONS.”  New grounds of rejection (if any) are provided under the subheading “NEW GROUNDS OF REJECTION.”

The following ground(s) of rejection are applicable to the appealed claims.


1.	Claims 1, 12 and 27 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor(s), at the time the application was filed, had possession of the claimed invention. MPEP 2161.01(I) and 2163.05(I)(3)(ii) give guidance. Generic claim language in the original disclosure does not satisfy the written description requirement if it fails to support the scope of the genus claimed.

In the instant case, applicant does not disclose an algorithm for achieving the functionality “where the encoder portion is trained in conjunction with a decoder portion of the autoencoder, the decoder portion coupled to receive the combined tensor". A statement that one could achieve a functional result is not sufficient to describe an algorithm for achieving it. Notably, the method disclosed in specification para. 31-32 and .


Claims 1-6, 8, 12-17, 19 and 23-27 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou (U.S. Pub. No. 20190205606 A1), in view of Wang (U.S. Pub. No. 20180139458 A1), further in view of Mathieu (U.S. Pub. No. 20180137389 A1) and Suter (Tensor Approximation Multiresolution Hierarchy for Interactive Volume Visualization - DOI: 10.1111/cgf.12102 - Eurographics Conference on Visualization (EuroVis) 2013, Volume 32 (2013), Number 3).

Regarding claims 1 and 12:

1. Zhou teaches a non-transitory computer-readable storage medium comprising code that, when executed by a processor, causes the processor to perform steps including: (Zhou [0036] FIG. 1 The computer system 100 can be implemented using any type of computer device and includes computer processors, memory units, storage devices, computer software, and other computer components. Claim 23. A non-transitory computer readable medium storing computer program instructions for autonomous artificial intelligence based medical image segmentation, the computer program instructions when executed by a processor perform operations comprising)
generating a combined tensor (Zhou [0102] a joint tensor representation is constructed to combine the original image and the partial segmentation results, and deep learning is used to learn a mapping between this joint tensor representation and the target segmentation mask) for the input image by applying an encoder portion (Zhou FIG. 10 [0090] ) including one or more layers of a neural network (Zhou [0104] FIG. 12, in both CED_Init 1204 and CE_PI 1206, five convolutional layers and followed by five de-convolutional layers) to the plurality of scaled images, (Zhou [0090] At step 906, a deep image-to-image network (DI2IN) is trained based on the multi-scale ground truths generated for the training images)
where the encoder portion is trained (Zhou FIG. 9. [0103] FIG. 12 illustrates a framework for deep learning partial inference based medical image segmentation according to an embodiment of the present invention. In the first stage (Stage 1), a first deep convolutional encoder decoder (CED) 1204 is used to learn [autoencoder] a mapping from an input medical image 1202 (e.g., MR image) to a segmentation mask) in conjunction with a decoder portion an autoencoding process, (Zhou [0090] For each training image the raw image 1002 is input to the encoder 1004. The output of the encoder 1004 is input to each of the decoders 1006a, 1006b, and 1006c, and each decoder 1006a, 1006b, and 1006c estimates a respective one of the multi-scale ground truth probability functions 1008a, 1008b, and 1008c. The loss function to be minimized in the training of the DI2IN 1000 can be considered as a summation of the loss from all of the decoders 1006a, 1006b, and 1006c) the decoder portion coupled to receive the combined tensor (Zhou [0102] a joint tensor representation is constructed to combine the original image and the partial segmentation results, and deep learning is 

Zhou does not explicitly teach receiving the input image to be compressed; obtaining a plurality of scaled images, each scaled image corresponding to the input image at a different scale; where the encoder portion includes a parameterized function for each scaled image, and where the generating comprises: for each scaled image, generating an intermediate tensor for the scaled image by applying the parameterized function for the scaled image thereto, where each intermediate tensor for a scaled image includes information extracted from the scale of the scaled image; mapping the plurality of intermediate tensors to a target output dimensionality to generate a plurality of tensors; and combining the plurality of tensors for the set of scaled images to generate a combined tensor for the input image, and generate a reconstructed version of the input image; and compressing the combined tensor into a code.

However, Wang teaches receiving the input image to be compressed; (Wang [0347] FIG. 28 illustrates the method steps of pre-processing visual data frames prior to encoding [compressed] for transmission, and decoding received data then post-processing to obtain decoded visual data frames. Wang [0369] the original video data 70 is then split into single full-resolution frames at step 80 (or step 190), i.e. into a sequence of images at the full resolution and/or quality of the original video data 70. For some video codecs, this will involve "uncompressing" or restoring the video data as, for 
where each intermediate tensor for a scaled image includes information extracted from the scale of the scaled image; (Wang [0404] Referring to FIG. 7, there is shown an efficient sub-pixel convolutional neural network (ESPCN) 700 having a low-resolution input image 710 with two feature map extraction layers 720, 730 built with convolutional neural networks and a sub-pixel convolution layer 740 that aggregates the feature maps [720, 730, 740 intermediate tensor] from low-resolution space and builds the super resolution image 750 in a single step. [0405] as shown in FIG. 7, the high-resolution data is super resolved from the low-resolution feature maps only at the very end of the network)
and generate a reconstructed version of the input image; (Wang Fig. 11 1140 [0029] Super resolution techniques allow for the creation of one or more high-resolution images, typically from one or more low-resolution images. Typically, super resolution is applied to a set or series of low-resolution images of the same scene and the technique attempts to reconstruct a higher-resolution image of the same scene from these images. [0112] the algorithms used are hierarchical algorithms. It should be noted that algorithms could also be referred to as models, representations, parameters or functions. In some of these embodiments, hierarchical algorithms can enable substantially accurate reconstruction of visual data, e.g. produce a higher quality high-resolution video from the low-resolution video that is transmitted, for example where quality can be measured by a low error rate in comparison to the original high-resolution video)
and compressing the combined tensor into a code. (Wang [0165] this allows for the individual sections to be down-sampled thus reducing the size of the visual data, thereby allowing for lower-quality sections to be transmitted as re-encoded visual data in the original or optionally a more optimal codec but at a lower quality. [0406] video compression, the down sampling operation can be deterministic and known: to produce the low resolution image from the high resolution image [image coding] … both the low and high resolution image have C colour channels, thus can be represented as real-valued tensors [combined tensor] of size H.times.W.times.C and rH.times.rW.times.C respectively)

It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Zhou by further incorporating Wang in video/camera technology. One would have been motivated to do so in order to incorporate generating a reconstructed version of the input image, which would improve coding efficiency.

The combination of Zhou and Wang does not explicitly teach obtaining a plurality of scaled images, each scaled image corresponding to the input image at a different scale; where the encoder portion includes a parameterized function for each scaled image, and where the generating comprises: for each scaled image, generating an intermediate tensor for the scaled image by applying the parameterized function for the scaled image thereto, mapping the plurality of intermediate tensors to a target output dimensionality to generate a plurality of tensors; and combining the plurality of tensors for the set of scaled images to generate a combined tensor for the input image.

However, Mathieu teaches where the encoder portion includes (Mathieu [0021] In particular embodiments, unsupervised learning may be achieved by use of a convolutional model that may be trained to predict sets of future possible actions, or by use of a convolutional network that may be trained to learn to linearize motion in the code space. Besides unsupervised learning, a video predictive system may find applications in robotics, video compression [encoder]) a parameterized function for each scaled image, (Mathieu [0026] FIG. 2 Let $u_k$ be the upscaling operator [parameterized function] toward size $s_k$.)
mapping the plurality of intermediate tensors to a target output dimensionality to generate a plurality of tensors; and (Mathieu [0027] FIG. 2, at step 210, $X_{k/2}^1, X_{k/2}^1, \ldots, X_{k/2}^i$ are input into network $G'_{k/2}$ at size k/2 [intermediate tensors], which outputs a frame 220, which is input at step 230, along with $X_k^1$, $X_k^2$, into network $G'_k$ at size k. [target output dimensionality] This results in output frame 240)
combining the plurality of tensors for the set of scaled images to generate a combined tensor for the input image, (Mathieu [0027] FIG. 2, at step 210, $X_{k/2}^1, X_{k/2}^1, \ldots, X_{k/2}^i$ are input into network $G'_{k/2}$ at size k/2, which outputs a frame 220, which is input at step 230, along with $X_k^1$, $X_k^2$, into network $G'_k$ at size k. This results in output frame 240 [combining])

It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Zhou by further incorporating Wang and Mathieu in video/camera technology. One would have been motivated to do so in order to incorporate mapping the plurality of intermediate tensors to a target output dimensionality to generate a plurality of tensors, which would improve coding efficiency.

The combination of Zhou, Wang and Mathieu does not explicitly teach obtaining a plurality of scaled images, each scaled image corresponding to the input image at a different scale; and where the generating comprises: for each scaled image, generating an intermediate tensor for the scaled image by applying the parameterized function for the scaled image thereto.

However, Suter teaches obtaining a plurality of scaled images, each scaled image corresponding to the input image at a different scale; (Suter Fig. 9 page 152 col 1 para 3 we use the TA framework [KB09], as previously used individually for multiscale volume visualization [SZP10] and for multiresolution volume rendering [SIGM_11]. So far, no combined tensor-based multiscale and multiresolution model has been proposed)
and where the generating comprises: (Suter page 153 col. 2 para 2 Fig. 7 shows the factor matrix averaging as used for a hierarchical tensor representation and its effects on the visual reconstruction)
for each scaled image, generating an intermediate tensor for the scaled image by applying the parameterized function for the scaled image thereto, (Suter Figure 4: Factor-matrix subsampling by pair-wise row averaging generates a mipmapped factor matrix hierarchy. Suter page 153 col. 2 para 1-2 rows correspond to halving the reconstructed volume resolution. This downsampling [parameterized function] of factor matrices is illustrated in Fig. 4 and corresponds to the principle of mipmapping)

It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Zhou by further incorporating Wang, Mathieu and Suter in video/camera technology. One would have been motivated to do so in order to incorporate, for each scaled image, generating an intermediate tensor for the scaled image by applying the parameterized function for the scaled image thereto, which would improve coding efficiency.
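
For orientation only, the multi-scale steps recited in claims 1 and 12 (obtaining scaled images, applying a parameterized function per scale, mapping to a target output dimensionality, and combining) can be sketched as below. This is a minimal illustration under stated assumptions and is not taken from the application or from any cited reference: the function names, the average-pooling downsampler, the affine per-scale function, and the channel-stacking combination are all hypothetical choices.

```python
# Illustrative sketch only. All names and design choices here are assumptions,
# not the appellant's implementation and not any cited reference's method.

def downsample(image, factor):
    """Average-pool a 2-D list-of-lists image by an integer factor."""
    h, w = len(image) // factor, len(image[0]) // factor
    return [
        [
            sum(image[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor)) / factor ** 2
            for c in range(w)
        ]
        for r in range(h)
    ]

def scaled_images(image, num_scales):
    """Return the input image at scales 1, 1/2, 1/4, ..."""
    return [downsample(image, 2 ** s) for s in range(num_scales)]

def apply_parameterized(image, params):
    """Hypothetical per-scale function: an affine map (gain, bias) on each pixel."""
    gain, bias = params
    return [[gain * v + bias for v in row] for row in image]

def map_to_target(tensor, target_size):
    """Align an intermediate tensor to target_size x target_size by pooling."""
    return downsample(tensor, len(tensor) // target_size)

def encode(image, per_scale_params, target_size):
    """Build a combined tensor with one channel per scale."""
    scales = scaled_images(image, len(per_scale_params))
    intermediates = [apply_parameterized(img, p)
                     for img, p in zip(scales, per_scale_params)]
    aligned = [map_to_target(t, target_size) for t in intermediates]
    # Combine by stacking: result[r][c] holds one value per scale.
    return [
        [[t[r][c] for t in aligned] for c in range(target_size)]
        for r in range(target_size)
    ]
```

In this sketch each scale keeps its own parameters, and the alignment step exists only so that tensors from different scales share one spatial size before they are stacked.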

Regarding claims 2 and 13:

2. Zhou teaches the computer-implemented method of claim 1. Zhou does not explicitly teach wherein obtaining the plurality of scaled images comprises applying one or more downsampling operators to the input image and wherein mapping the plurality of intermediate tensors comprises applying, for each intermediate tensor, another parameterized function for the intermediate tensor to align the extracted information for the intermediate tensor to the target output dimensionality.

However, Wang teaches wherein obtaining the plurality of scaled images comprises applying one or more downsampling operators to the input image (Wang [0406] The task of the example based model, in some embodiments a single image super resolution network, is to estimate a high resolution image given a low resolution image that is downscaled or downsampled from a corresponding high resolution image)

However, Mathieu teaches wherein mapping the plurality of intermediate tensors comprises applying, for each intermediate tensor, another parameterized function for the intermediate tensor to align the extracted information for the intermediate tensor to the target output dimensionality. (Mathieu [0027] FIG. 2, at step 210, $X_{k/2}^1, X_{k/2}^1, \ldots, X_{k/2}^i$ are input into network $G'_{k/2}$ at size k/2 [intermediate tensors], which outputs a frame 220, which is input at step 230, along with $X_k^1$ [first parameterized function], $X_k^2$ [second parameterized function], into network $G'_k$ at size k. [target output dimensionality] This results in output frame 240)

Regarding claims 3 and 14:

3. Zhou teaches the computer-implemented method of claim 2. Zhou does not explicitly teach wherein the downsampling operators are trained to maximize reconstruction at a given compression rate.

However, Wang teaches wherein the downsampling operators are trained (Wang [0129] down-sampling is used to reduce the quality of one or more sections of higher-quality visual data to one or more sections of lower-quality visual data. [0148] by transmitting a section of lower-quality visual data over a network together with an example based model to aid reconstruction of high-quality visual data. [0474] Embodiments can use dictionary learning reconstruction models or convolutional neural network reconstruction models for up-scaling, or a mixture of these two techniques. In some embodiments, a library of reconstruction models is stored that can be generated from example, or training, video data where both the original and reduced-resolution video can be compared) to maximize reconstruction (Wang [0012] These compression techniques make a trade-off between the quality and the bitrate of video data streams when providing inter-frame and intra-frame compression, but the amount of compression possible is largely dependent on the image resolution of each frame and the complexity of the image sequences. [0029] super resolution is applied to a set or series of low-resolution images of the same scene and the technique attempts to reconstruct a higher-resolution image of the same scene from these images) at a given compression rate. (Wang [0069] knowledge of the original visual data can allow the hierarchical algorithm to be trained (and/or developed) based on knowledge of both the original visual data and the low-quality visual data in order to train a hierarchical algorithm to substantially reproduce the original visual data from the low-quality visual data. [0148] quality can relate to the resolution of visual data and/or other attributes such as a higher or lower frame rate)

Regarding claims 4 and 15:

4. Zhou teaches the computer-implemented method of claim 1. Zhou does not explicitly teach wherein the parameterized functions for the set of scaled images are trained to maximize reconstruction quality at a given compression rate.

However, Wang teaches wherein the parameterized functions (Wang [0129] down-sampling [parameterized functions] is used to reduce the quality of one or more sections of higher-quality visual data to one or more sections of lower-quality visual data. [0148] by transmitting a section of lower-quality visual data over a network together with an example based model to aid reconstruction of high-quality visual data. [0474] Embodiments can use dictionary learning reconstruction models or convolutional neural network reconstruction models for up-scaling, or a mixture of these two techniques. In some embodiments, a library of reconstruction models is stored that can be generated from example, or training, video data where both the original and reduced-resolution video can be compared) for the set of scaled images are trained to maximize reconstruction quality at a given compression rate. (Wang [0012] these compression techniques make a trade-off between the quality and the bitrate of video data streams. [0069] knowledge of the original visual data can allow the hierarchical algorithm [parameterized functions] to be trained (and/or developed) based on knowledge of both the original visual data and the low-quality visual data in order to train a hierarchical algorithm to substantially reproduce the original visual data from the 

Regarding claims 5 and 16:

5. Zhou teaches the computer-implemented method of claim 1. Zhou does not explicitly teach wherein the input image is a residual frame of a video predicted from a plurality of video frames of the video.

However, Wang teaches wherein the input image is a residual frame of a video predicted from a plurality of video frames of the video. (Wang [0425] the training process for the example based models takes place exclusively for the reconstruction of the high-frequency components of the higher-resolution section of video. The results may then be added as a residue to a section of video reconstructed using bi-cubic interpolation)

Regarding claims 6 and 17:

6. Zhou teaches the computer-implemented method of claim 1. Zhou does not explicitly teach wherein the parameterized functions for the set of scaled images are machine-learned.

However, Wang teaches wherein the parameterized functions (Wang [0129] down-sampling [parameterized functions] is used to reduce the quality of one or more sections of higher-quality visual data to one or more sections of lower-quality visual data. [0148] by transmitting a section of lower-quality visual data over a network together with an example based model to aid reconstruction of high-quality visual data. [0474] Embodiments can use dictionary learning reconstruction models or convolutional neural network reconstruction models for up-scaling, or a mixture of these two techniques. In some embodiments, a library of reconstruction models is stored that can be generated from example, or training, video data where both the original and reduced-resolution video can be compared) for the set of scaled images are machine-learned. (Wang [0069] knowledge of the original visual data can allow the hierarchical algorithm to be trained (and/or developed) based on knowledge of both the original visual data and the low-quality visual data in order to train a hierarchical algorithm to substantially reproduce the original visual data from the low-quality visual data. [0157] the example based model comprises any of: a generative model; a non-linear hierarchical algorithm; or a convolutional neural network; or a recurrent neural network; or a deep belief network; or a dictionary learning algorithm; or a parameter; or a mapping function)

Regarding claims 8 and 19:

8. Zhou teaches the computer-implemented method of claim 1. Zhou does not explicitly teach wherein combining the plurality of tensors further comprises: combining the plurality of tensors into a common space; and transforming the combined tensor in the common space to identify structures across scales and to obtain feature coefficients for compression.

However, Wang teaches transforming the combined tensor in the common space to identify structures across scales (Wang [0406] a single image super resolution network, is to estimate a high resolution image given a low resolution image that is downscaled or downsampled from a corresponding high resolution image)
and to obtain feature coefficients for compression. (Wang [0116] libraries can be provided at both nodes, and/or in centralized or distributed databases, and optionally can use common or synchronized reference identifiers for the same algorithms)

However, Mathieu teaches wherein combining the plurality of tensors further comprises: combining the plurality of tensors into a common space; (Mathieu [0027] FIG. 2, at step 210, $X_{k/2}^1, X_{k/2}^1, \ldots, X_{k/2}^i$ are input into network $G'_{k/2}$ at size k/2, which outputs a frame 220, which is input at step 230, along with $X_k^1$, $X_k^2$, into network $G'_k$ at size k [common space]. This results in output frame 240 [combining])

Regarding claims 23 and 25:

23. Zhou teaches the method of claim 1. Zhou does not explicitly teach wherein obtaining the plurality of scaled images comprises downsampling the input image at one or more different scales.

However, Mathieu teaches wherein obtaining the plurality of scaled images comprises downsampling the input image at one or more different scales. (Mathieu [0026] Let $X_k^i$, $Y_k^i$ denote the downscaled versions of $X^i$ and $Y^i$ of size $s_k$, and $G'_k$ be a network that learns to predict $Y_k - u_k(Y_{k-1})$ from $X_k$ and a coarse guess of $Y_k$. Particular embodiments may recursively define the network $G_k$, that makes a prediction $\hat{Y}_k$ of size $s_k$, by:
$\hat{Y}_k = G_k(X) = u_k(\hat{Y}_{k-1}) + G'_k(X_k, u_k(\hat{Y}_{k-1}))$ (Equation 2))
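
The coarse-to-fine recursion quoted from Mathieu [0026] (Equation 2) can be illustrated with a small sketch: at each size, the previous prediction is upscaled and a residual computed at the current size is added to it. The sketch below is an assumption-laden toy, not Mathieu's implementation: it works on 1-D lists, uses nearest-neighbour upscaling for the upscaling operator, replaces the trained residual network with a hypothetical fixed function `residual_net`, and assumes an all-zero guess at the coarsest size.

```python
# Toy illustration of a coarse-to-fine residual recursion.
# residual_net is a hypothetical stand-in for a trained network.

def upscale(signal, size):
    """Nearest-neighbour upscaling of a 1-D list to `size` samples."""
    n = len(signal)
    return [signal[i * n // size] for i in range(size)]

def downscale(signal, size):
    """Average-pool a 1-D list to `size` samples (length must divide evenly)."""
    step = len(signal) // size
    return [sum(signal[i * step:(i + 1) * step]) / step for i in range(size)]

def residual_net(x_k, coarse):
    """Hypothetical residual: move the coarse guess halfway toward x_k."""
    return [0.5 * (x - c) for x, c in zip(x_k, coarse)]

def predict(x, sizes):
    """Apply prediction = upscaled previous prediction + residual, coarse to fine."""
    y_hat = [0.0] * sizes[0]  # assumed base case: all-zero coarsest guess
    for size in sizes:
        x_k = downscale(x, size)       # input at the current size
        coarse = upscale(y_hat, size)  # upscaled previous prediction
        y_hat = [c + r for c, r in zip(coarse, residual_net(x_k, coarse))]
    return y_hat
```

For example, `predict([4.0] * 4, [2, 4])` first produces `[2.0, 2.0]` at size 2 and then refines it to `[3.0, 3.0, 3.0, 3.0]` at size 4.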

Regarding claims 24 and 26:

24. Zhou teaches the method of claim 1. Zhou does not explicitly teach wherein obtaining the plurality of scaled images comprises adding the input image to the plurality of scaled images.

However, Mathieu teaches wherein obtaining the plurality of scaled images comprises adding the input image to the plurality of scaled images. (Mathieu [0027] FIG. 2, at step 210, $X_{k/2}^1, X_{k/2}^1, \ldots, X_{k/2}^i$ are input into network $G'_{k/2}$ at size k/2, which outputs a frame 220, which is input at step 230, along with $X_k^1$, $X_k^2$, into network $G'_k$ at size k. This results in output frame 240 [combining])

Regarding claim 27:

27. Zhou teaches an encoder stored on a non-transitory computer-readable storage medium, wherein the encoder is manufactured by a process comprising: (Zhou [0036] FIG. 1 The computer system 100 can be implemented using any type of computer device and includes computer processors, memory units, storage devices, computer software, and other computer components. Claim 23. A non-transitory computer readable medium storing computer program instructions for autonomous artificial intelligence based medical image segmentation, the computer program instructions when executed by a processor perform operations comprising)
accessing a machine-learned model including: (Zhou [0039] The segmentation algorithms stored in the segmentation algorithm database 108 can include a plurality of deep learning based medical image segmentation methods, each of which including a respective trained deep neural network architecture for performing medical image segmentation. For example, the segmentation algorithms can include the deep learning based segmentation algorithms described below, including segmentation using a deep neural network (DNN) that integrates shape priors through joint training, non-rigid shape segmentation method using deep reinforcement learning, segmentation using deep learning based partial inference modeling under domain shift, segmentation using a deep-image-to-image network and multi-scale probability maps, and active shape model based segmentation using a recurrent neural network (RNN))
and generate a combined tensor for the input image, (Zhou [0102] a joint tensor representation is constructed to combine the original image and the partial  
a decoder portion coupled to receive the combined tensor for the input image and (Zhou [0090] For each training image the raw image 1002 is input to the encoder 1004. The output of the encoder 1004 is input to each of the decoders 1006a, 1006b, and 1006c, and each decoder 1006a, 1006b, and 1006c estimates a respective one of the multi-scale ground truth probability functions 1008a, 1008b, and 1008c. The loss function to be minimized in the training of the DI2IN 1000 can be considered as a summation of the loss from all of the decoders 1006a, 1006b, and 1006c. Zhou [0102] a joint tensor representation is constructed to combine the original image and the partial segmentation results, and deep learning is used to learn a mapping between this joint tensor representation and the target segmentation mask) 
generating a combined tensor (Zhou [0102] a joint tensor representation is constructed to combine the original image and the partial segmentation results, and deep learning is used to learn a mapping between this joint tensor representation and the target segmentation mask) for the training image by applying the encoder portion of the machine-learned model (Zhou FIG. 9. [0103] FIG. 12 illustrates a framework for deep learning partial inference based medical image segmentation according to an embodiment of the present invention. As shown in FIG. 12, the segmentation framework 1200 performs medical image segmentation in a two stage workflow. In the first stage (Stage 1), a first deep convolutional encoder decoder (CED) 1204 is used to learn [autoencoder] a mapping from an input medical image 1202 (e.g., MR image) to a segmentation mask) to the plurality of scaled images of the training image, (Zhou [0090] At step 906, a deep image-to-image network (DI2IN) is trained based on the multi-scale ground truths generated for the training images)
determining one or more error terms from a loss function that indicates a (Zhou [0060] Network 1 inputs a medical image and estimates a segmentation mask, and the loss function for Network 1 (Loss1) is an error between the estimated segmentation masks and the ground truth segmentation masks over the set of training samples) difference between the training image and the reconstructed version, and (Zhou [0059] At step 306, a deep neural network (DNN) architecture is jointly trained based on the ground truth segmentations (segmentation masks) [reconstructed version] and the priors generated for the training images)
updating the set of parameters in the encoder portion of the machine-learned model by backpropagating the one or more error terms obtained from the loss function; and (Zhou FIG. 4, [0059] a DNN architecture 400 includes multiple component networks (i=1, 2, . . . N) and a fusion network (i=0), and the weights $w_i$ of the component networks and the fusion network are learned using joint training to minimize a final loss function that is a combination of the individual loss functions of all the networks: $\mathrm{loss}_{\mathrm{final}} = \sum_i w_i \, \mathrm{loss}_i$. Through error back-propagation during joint training, these component networks will influence and regularize each other)
storing the set of parameters of the encoder portion of the machine-learned model (Zhou [0039] The segmentation algorithms stored in the segmentation algorithm database 108 can include a plurality of deep learning based medical image segmentation methods, each of which including a respective trained deep neural as a set of parameters of the encoder. 
(Zhou Fig. 10 [0090] For each training image [parameters] the raw image 1002 is input to the encoder 1004. The output of the encoder 1004 is input to each of the decoders 1006a, 1006b, and 1006c, and each decoder 1006a, 1006b, and 1006c estimates a respective one of the multi-scale ground truth probability functions 1008a, 1008b, and 1008c)
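
As a reading aid, the training steps recited in claim 27 (reconstructing a training image, determining an error term from a loss on the difference between the image and its reconstruction, updating encoder parameters by backpropagation, and storing the parameters) can be sketched with a deliberately tiny model. Everything below is an assumption made for illustration: the encoder and decoder are single scalar weights, the decoder weight is held fixed, the loss is mean squared error, and `train_encoder` is a hypothetical name. It is not drawn from the application or from Zhou.

```python
# Toy sketch: a one-weight encoder and a fixed one-weight decoder.
# All names and hyperparameters are hypothetical.

def train_encoder(training_images, enc_w=0.1, dec_w=1.0, lr=0.05, epochs=200):
    """Fit enc_w so that dec_w * enc_w * image reconstructs each image."""
    for _ in range(epochs):
        for image in training_images:
            code = [enc_w * v for v in image]            # encoder output
            recon = [dec_w * c for c in code]            # reconstructed version
            errors = [r - v for r, v in zip(recon, image)]
            # Gradient of mean squared error with respect to enc_w (chain rule
            # through the decoder): mean(2 * error * dec_w * pixel).
            grad = sum(2 * e * dec_w * v
                       for e, v in zip(errors, image)) / len(image)
            enc_w -= lr * grad                           # parameter update
    # "Storing" is represented here by returning the learned parameter.
    return {"encoder_weight": enc_w}
```

With the decoder weight fixed at 1.0, the encoder weight converges toward 1.0, the value at which the reconstruction equals the input.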

Zhou does not explicitly teach an encoder portion coupled to receive a plurality of scaled images of an input image, where the plurality of scaled images corresponds to the input image at a plurality of scales, and where the encoder portion includes a set of parameters for each scale, and generate a reconstructed version of the input image; repeatedly performing, for each training image in a set of training images: obtaining a plurality of scaled images of the training image, where the generating comprises: for each scaled image, generating an intermediate tensor for the scaled image by applying the set of parameters for the scale of the scaled image thereto, where each intermediate tensor for a scaled image includes information extracted from the corresponding scale of the training image, mapping the plurality of intermediate tensors to a target output dimensionality to generate a plurality of tensors for the training image, combining the plurality of tensors to generate the combined tensor for the training image, generating a reconstructed version of the training image by applying the decoder portion of the machine-learned model to the combined tensor for the training image.

However, Wang teaches generating a reconstructed version of the input image; (Wang Fig. 11 1140 [0029] Super resolution techniques allow for the creation of one or more high-resolution images, typically from one or more low-resolution images. Typically, super resolution is applied to a set or series of low-resolution images of the same scene and the technique attempts to reconstruct a higher-resolution image of the same scene from these images. [0112] the algorithms used are hierarchical algorithms. It should be noted that algorithms could also be referred to as models, representations, parameters or functions. In some of these embodiments, hierarchical algorithms can enable substantially accurate reconstruction of visual data, e.g. produce a higher quality high-resolution video from the low-resolution video that is transmitted, for example where quality can be measured by a low error rate in comparison to the original high-resolution video)
repeatedly performing, for each training image in a set of training images: (Wang FIG. 9 [0492] These metrics are then used to select a pre-trained model from a library 942 in step 930. The selected pre-trained model is then developed in step 940 so as to 
where each intermediate tensor for a scaled image includes information extracted from the corresponding scale of the training image, (Wang [0404] Referring to FIG. 7, there is shown an efficient sub-pixel convolutional neural network (ESPCN) 700 having a low-resolution input image 710 with two feature map extraction layers 720, 730 built with convolutional neural networks and a sub-pixel convolution layer 740 that aggregates the feature maps [720, 730, 740 intermediate tensor] from low-resolution space and builds the super resolution image 750 in a single step. [0405] as shown in FIG. 7, the high-resolution data is super resolved from the low-resolution feature maps only at the very end of the network)
generating a reconstructed version of the training image by applying the decoder portion of the machine-learned model (Wang [0168] where this further analysis can be performed once the visual data has been decoded, and in some embodiments this analysis can allow a reduction of the number of models considered for use with the visual data for enhancement or as starting points for training a model. [0518] FIG. 13 shows a further method of decoding the received information at the second node to reproduce substantially the higher resolution video)

The combination of Zhou and Wang does not explicitly teach an encoder portion coupled to receive a plurality of scaled images of an input image, where the plurality of scaled images corresponds to the input image at a plurality of scales, and where the encoder portion includes a set of parameters for each scale, and obtaining a plurality of scaled images of the training image, where the generating comprises: for each scaled image, generating an intermediate tensor for the scaled image by applying the set of parameters for the scale of the scaled image thereto, mapping the plurality of intermediate tensors to a target output dimensionality to generate a plurality of tensors for the training image, and combining the plurality of tensors to generate the combined tensor for the training image, to the combined tensor for the training image.

However Mathieu teach and where the encoder portion includes (Mathieu [0021] In particular embodiments, unsupervised learning may be achieved by use of a convolutional model that may be trained to predict sets of future possible actions, or by use of a convolutional network that may be trained to learn to linearize motion in the code space. Besides unsupervised learning, a video predictive system may find applications in robotics, video compression [encoder]) a set of parameters for each scale, and (Mathieu [0026] FIG. 2 Let u.sub.k be the upscaling operator [parameterized function] toward size s.sub.k.)
mapping the plurality of intermediate tensors to a target output dimensionality to generate a plurality of tensors (Mathieu [0027] FIG. 2, at step 210, X.sub.k/2.sup.1, X.sub.k/2.sup.1, . . . X.sub.k/2.sup.i are input into network G'.sub.k/2 at size k/2 [intermediate tensors], which outputs a frame 220, which is input at step 230,
along with X.sub.k.sup.1, X.sub.k.sup.2, into network G'.sub.k at size k. [target output
 for the training image, and (Mathieu [0012] FIGS. 4A-4F illustrate video clips from Sport1m for training video predictions)
combining the plurality of tensors to generate the combined tensor (Mathieu [0027] FIG. 2, at step 210, X.sub.k/2.sup.1, X.sub.k/2.sup.1, . . . X.sub.k/2.sup.i are input into network G'.sub.k/2 at size k/2, which outputs a frame 220, which is input at step 230, along with X.sub.k.sup.1, X.sub.k.sup.2, into network G'.sub.k at size k. This results in output frame 240 [combining]) for the training image, (Mathieu [0012] FIGS. 4A-4F illustrate video clips from Sport1m for training video predictions)
to the combined tensor (Mathieu [0027] FIG. 2, at step 210, X.sub.k/2.sup.1, X.sub.k/2.sup.1, . . . X.sub.k/2.sup.i are input into network G'.sub.k/2 at size k/2, which outputs a frame 220, which is input at step 230, along with X.sub.k.sup.1, X.sub.k.sup.2, into network G'.sub.k at size k. This results in output frame 240 [combining]) for the training image, (Mathieu [0012] FIGS. 4A-4F illustrate video clips from Sport1m for training video predictions)

The combination of Zhou, Wang and Mathieu does not explicitly teach an encoder portion coupled to receive a plurality of scaled images of an input image, where the plurality of scaled images corresponds to the input image at a plurality of scales, obtaining a plurality of scaled images of the training image, where the generating comprises: for each scaled image, generating an intermediate tensor for the scaled image by applying the set of parameters for the scale of the scaled image thereto.

However, Suter teaches an encoder portion (Suter page 156 col. 2 para 5 Incorporating quantization, the storage costs are affected differently. Both store 8-bit core tensor values (logarithmic encoding). [SIGM_11] use a 16-bit linear factor matrix encoding) coupled to receive a plurality of scaled images of an input image, (Suter Fig. 9 page 152 col 1 para 3 we use the TA framework [KB09], as previously used individually for multiscale volume visualization [SZP10] and for multiresolution volume rendering [SIGM_11]. So far, no combined tensor-based multiscale and multiresolution model has been proposed)
where the plurality of scaled images corresponds to the input image at a plurality of scales, (Suter Fig. 9 page 152 col 1 para 3 we use the TA framework [KB09], as previously used individually for multiscale volume visualization [SZP10] and for multiresolution volume rendering [SIGM_11]. So far, no combined tensor-based multiscale and multiresolution model has been proposed.)
obtaining a plurality of scaled images (Suter Fig. 9 page 152 col 1 para 3 we use the TA framework [KB09], as previously used individually for multiscale volume visualization [SZP10] and for multiresolution volume rendering [SIGM_11]. So far, no combined tensor-based multiscale and multiresolution model has been proposed) of the training image, (Suter page 151 col. 2 para 1-2 higher-order tensor approximation (TA) (see [KB09]) methods derive learned basis decompositions, which may capture more compact data-specific structures and patterns)
where the generating comprises: (Suter page 153 col. 2 para 2 Fig. 7 shows the
factor matrix averaging as used for a hierarchical tensor representation and its effects
 for each scaled image, generating an intermediate tensor for the scaled image by applying the set of parameters for the scale of the scaled image thereto, (Suter Figure 4: Factor-matrix subsampling by pair-wise row averaging generates a mipmapped factor matrix hierarchy. Suter page 153 col. 2 para 1-2 rows correspond to halving the reconstructed volume resolution. This downsampling [parameterized function] of factor matrices is illustrated in Fig. 4 and corresponds to the principle of mipmapping)
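
For illustration only, the pair-wise row averaging quoted from Suter Figure 4 (each averaging pass halves the number of factor-matrix rows, analogous to mipmapping) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not Suter's implementation: the function names are hypothetical, and keeping an unpaired trailing row unchanged is an assumption made here because the quoted passage does not address odd row counts.

```python
# Hypothetical sketch of a mipmapped factor-matrix hierarchy built by
# pair-wise row averaging. Names and odd-row handling are assumptions.

def halve_rows(matrix):
    """Average consecutive row pairs; an unpaired last row is kept as-is."""
    out = []
    for i in range(0, len(matrix) - 1, 2):
        out.append([(a + b) / 2.0 for a, b in zip(matrix[i], matrix[i + 1])])
    if len(matrix) % 2:
        out.append(list(matrix[-1]))
    return out


def factor_matrix_hierarchy(matrix):
    """Return all levels, from the full matrix down to a single row."""
    levels = [matrix]
    while len(levels[-1]) > 1:
        levels.append(halve_rows(levels[-1]))
    return levels
```

Each level of the returned list has roughly half the rows of the previous level, which corresponds to the halving of reconstructed volume resolution described in the quoted passage.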

Claims 7 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou (U.S. Pub. No. 20190205606 A1), in view of Wang (U.S. Pub. No. 20180139458 A1), further in view of Mathieu (U.S. Pub. No. 20180137389 A1), Suter (Tensor Approximation Multiresolution Hierarchy for Interactive Volume Visualization - DOI: 10.1111/cgf.12102 - Eurographics Conference on Visualization (EuroVis) 2013, Volume 32 (2013), Number 3) and Taubman (U.S. Pub. No. 6778709 B1).

Regarding claims 7 and 18:

7. Zhou teaches the computer-implemented method of claim 1. Zhou does not explicitly teach wherein compressing the combined tensor comprises: quantizing information included in the combined tensor; decomposing each of the quantized coefficients into a plurality of bitplanes; applying a trained adaptive arithmetic coder model to the decomposed plurality of bitplanes to generate compressed codes of the input image.

However Wang teach wherein compressing the combined tensor comprises: quantizing information included in the combined tensor; (Wang [0076] the one or more sections of lower-quality visual data are generated from the one or more sections of higher-quality visual data using a process comprising compression and/or quantization) 
decomposing each of the quantized coefficients (Wang [0076] the one or more sections of lower-quality visual data are generated from the one or more sections of higher-quality visual data using a process comprising compression and/or quantization) into a plurality of bitplanes; (Wang [0404] Referring to FIG. 7, there is shown an efficient sub-pixel convolutional neural network (ESPCN) 700 having a low-resolution input image 710 with two feature map extraction layers 720 [bitplanes], 730 built with convolutional neural networks and a sub-pixel convolution layer 740 that aggregates the feature maps from low-resolution space and builds the super resolution image 750 in a single step)

However Taubman teach applying a trained adaptive arithmetic coder model to the decomposed plurality of bitplanes to generate compressed codes of the input image. (Taubman col. 5 line 52-60 Sub-blocks are formed until the adaptive arithmetic coding has had an opportunity to learn the probability with which insignificant symbols become significant in any given bit-plane. For each bit-plane, information concerning sub-blocks that contain one or more significant samples is encoded first; all other sub-blocks are by-passed in the remaining coding phases for that bit-plane)

The motivation for combining Zhou, Wang, Mathieu and Suter as set forth in claim 1 is equally applicable to claim 7. It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Zhou, further incorporating Wang, Mathieu, Suter and Taubman in video/camera technology. One would be motivated to do so in order to incorporate applying a trained adaptive arithmetic coder model to the decomposed plurality of bitplanes to generate compressed codes of the input image. This will improve the coding efficiency.
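
For illustration only, the first two steps recited in claim 7 (quantizing, then decomposing each quantized coefficient into bitplanes) can be sketched in Python as follows. This is a minimal sketch under stated assumptions and is not drawn from the specification, Wang, or Taubman: the function names, the uniform quantizer over the range [0, 1], and the most-significant-bit-first ordering are hypothetical choices. The trained adaptive arithmetic coder recited in the claim is deliberately not modeled.

```python
# Hypothetical sketch: uniform quantization followed by bitplane
# decomposition. The arithmetic coding stage is intentionally omitted.

def quantize(values, num_bits):
    """Uniformly quantize values in [0, 1] to integers in [0, 2**num_bits - 1]."""
    levels = (1 << num_bits) - 1
    return [min(levels, max(0, int(round(v * levels)))) for v in values]


def to_bitplanes(coeffs, num_bits):
    """Return num_bits binary planes, most significant bit first."""
    return [[(c >> b) & 1 for c in coeffs]
            for b in range(num_bits - 1, -1, -1)]


def from_bitplanes(planes):
    """Rebuild the integer coefficients from MSB-first bitplanes."""
    coeffs = [0] * len(planes[0])
    for plane in planes:
        coeffs = [(c << 1) | bit for c, bit in zip(coeffs, plane)]
    return coeffs
```

Each plane holds one bit position across all coefficients, so the decomposition is lossless with respect to the quantized values: `from_bitplanes(to_bitplanes(q, n))` returns `q` for any coefficients that fit in `n` bits.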

Claims 9-11 and 20-22 are rejected under 35 U.S.C. 103 as being unpatentable over Zhou (U.S. Pub. No. 20190205606 A1), in view of Wang (U.S. Pub. No. 20180139458 A1), further in view of Mathieu (U.S. Pub. No. 20180137389 A1), Suter (Tensor Approximation Multiresolution Hierarchy for Interactive Volume Visualization - DOI: 10.1111/cgf.12102 - Eurographics Conference on Visualization (EuroVis) 2013, Volume 32 (2013), Number 3) and Rodriguez (U.S. Pub. No. 20170083792 A1).

Regarding claims 9 and 20:

9. Zhou teaches the computer-implemented method of claim 1. Zhou does not explicitly teach wherein the parameterized function for each scaled image is a portion of a neural network trained based on backpropagated loss between the input image and a previously reconstructed input image.

wherein the parameterized function for each scaled image (Mathieu [0041] Generative model G architecture is presented in Table 1 shows different model for different scale. [0040] multi-scale architectures may employed in generating the results discussed herein. The baseline models may use l.sub.1 and l.sub.2 losses. The GDL-l.sub.1 (respectively GDL-l.sub.2) model may use a combination of the GDL with .alpha.=1 (respectively .alpha.=2) and p=1 (respectively p=2) loss; the relative weights .lamda..sub.gdl and .lamda..sub.lp are both 1. The adversarial (Adv) model uses the adversarial loss, with p=2 weighted by .lamda..sub.adv=0.05 and .lamda..sub.lp=1) is a portion of a neural network (Mathieu [0019] train a convolutional neural network ( CNN) to generate future frames given an input sequence. As an example and not by way of limitation, to deal with blurry predictions obtained from a standard Mean Squared Error (MSE) loss function) 

However Rodriguez teach trained based on backpropagated loss between the input image and a previously reconstructed input image. (Rodriguez [0088] The vector of these derivatives of the loss is backpropagated through the model by first multiplying the vector of the loss 104 by the layer 98 weights, and computing the derivative of the Rodriguez using the new loss to update the weights of C5, then to C4, and so on. [0072] Given the neural network model 42, as described above with reference to FIG. 3, then at S106, each of the N images 38 of the annotated set 40, [I.sub.i].sub.i=1.sup.N, is encoded by inputting the RGB image I.sub.i of fixed pixel dimensions to the network, and computing t.sub.i=p.sub.5(I.sub.i), i.e., a real-valued vector 96 containing all the output values of the pool5 layer.)

The motivation for combining Zhou, Wang, Mathieu and Suter as set forth in claim 1 is equally applicable to claim 9. It would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Zhou, further incorporating Wang, Mathieu, Suter and Rodriguez in video/camera technology. One would be motivated to do so, to incorporate wherein the trained feature model is a neural network trained based on backpropagated loss between the input image and a previously reconstructed input image. This will improve the coding efficiency.

Regarding claims 10 and 21:

10. Zhou teaches the computer-implemented method of claim 9. Zhou does not explicitly teach wherein the backpropagated loss is calculated based on a quality metric.

However Rodriguez teach wherein the backpropagated loss (Rodriguez [0088] The vector of these derivatives of the loss is backpropagated through the model by first multiplying the vector of the loss 104 by the layer 98 weights, and computing the derivative of the Rodriguez using the new loss to update the weights of C5, then to C4, and so on) is calculated based on a quality metric. (Rodriguez [0079] this results in a matrix where the annotated images with a higher bounding box overlap are likely to be considered more similar in the new feature space. [0078] a metric learning algorithm 50 and a set of annotated training images, analogous to images 38, are used to obtain an 

Regarding claims 11 and 22:

11. Zhou teaches the computer-implemented method of claim 10. Zhou does not explicitly teach wherein the quality metric is one of peak signal-to-noise ratio, structural similarity index, or multi-scale structural similarity index.

However Rodriguez teach wherein the quality metric is one of peak signal-to-noise ratio, structural similarity index, or multi-scale structural similarity index. (Rodriguez [0070] the third, a model as shown in FIG. 4. In each of the methods, the similarity is computed, at S112, between a representation 46 of the query image and representations 48 of the annotated images, and the bounding box annotations 102 of the top-ranked annotated images 56 are then used (at S114) to compute a bounding box 34 for the query image 12, as graphically illustrated in FIG. 5.)
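
For reference only, peak signal-to-noise ratio, the first of the quality metrics recited in claim 11, is conventionally defined as 10·log10(MAX² / MSE) between an original and a reconstructed image. The following Python sketch shows that standard definition under stated assumptions: images are given as flat lists of pixel values, the peak value defaults to 255, and the function names are hypothetical. It does not represent how the specification or any cited reference computes a loss.

```python
import math

# Hypothetical sketch of the conventional PSNR definition.
# Images are assumed to be flat lists of pixel values of equal length.

def mean_squared_error(original, reconstructed):
    """Average squared per-pixel difference."""
    if len(original) != len(reconstructed) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - b) ** 2 for a, b in zip(original, reconstructed)) / len(original)


def psnr(original, reconstructed, max_value=255.0):
    """Peak signal-to-noise ratio in decibels; infinite for identical inputs."""
    err = mean_squared_error(original, reconstructed)
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / err)
```

A higher value indicates a reconstruction closer to the original, so a training loss based on this metric would typically decrease as PSNR increases.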

(2) Response to Argument

In essence the Appellants argue the following points, and each point is addressed individually by the examiner.


Appellant argued in pages 4-6:

	Appellant argued in pages 4-6 related to limitation “where the encoder portion is trained in conjunction with a decoder portion of the autoencoder, the decoder portion coupled to receive the combined tensor” of claim 1, 12 and 27 for the rejection under §112.	

	Office respectfully disagrees for the following reason:

Examiner disagrees, because appellant pointed to specification PGPUB paragraphs [0005], [0006], [0008], [0036], [0046], [0053], [0058-0062] and [0064] for this one simple limitation. None of these paragraphs, nor any other part of the specification, discloses the term “combined tensor”, the decoder portion coupled to receive the combined tensor, or the decoder portion coupled to receive any other form of tensor. Examiner notes that the term “combined tensor” is not well known in the art. Specification paragraph 46 discloses: as depicted in FIG. 3A, the feature extraction module 205 generates a summed tensor 340, hereafter denoted as tensor $y \in \mathbb{R}^{C \times H \times W}$, which is quantized and encoded. Examiner notes that the decoder is not receiving the “combined tensor” or any other form of tensor.

Appellant also argued on page 6 of the brief that the quantized version of the summed tensor 340, i.e., a "combined tensor" is subsequently encoded to a compressed code. Examiner disagree because, compressed code is encoded data and is not combined tensor. Sending encoded data to decoder is done by all encoder and is 

Appellant also argued on page 6 that paragraphs [0058-0061] of the specification describe that the bitplane composition module 235 of the decoder module 150 receives the compressed code to regenerate the quantized tensor. The feature synthesizer module 240 of the decoder module 150 generates the reconstructed input image from the quantized tensor. Because the feature synthesizer module 240 of the decoder module 150 is described as receiving a combined tensor to generate a reconstructed input image, the specification clearly describes the feature of a decoder portion that is coupled to receive a combined tensor to generate a reconstructed version of the input image. Examiner disagrees, because paragraphs [0058-0061] teach that the “quantized tensor” is reconstructed or generated at the decoder, which is different from “the decoder portion coupled to receive the combined tensor”. In these paragraphs there is no evidence that the decoder received the combined tensor.

Specification [0077] teaches, with reference to FIG. 2A, that additionally the ACR module 160 receives the quantized tensor from the quantization module 215. Examiner notes this is an example of receiving a “quantized tensor”. However, please note that the ACR module 160, as per Fig. 2A and paragraph [0030], is not part of decoder 150.

So the decoder is not receiving a “quantized tensor” [“combined tensor”]; instead, the “quantized tensor” is reconstructed or generated at the decoder. Examiner did not find any support in the specification for the limitation of the decoder portion coupled to receive any version of the “combined tensor” or “quantized tensor”.

Examiner also notes that scope of the limitation “the decoder portion coupled to receive the combined tensor” is different from “the decoder portion coupled to receive the encoded compressed code”, because in a subsequent claim limitation of claim 1 and claim 12, appellant claimed “compressing the combined tensor into a code” which is a separate limitation after claiming “the decoder portion coupled to receive the combined tensor”. 

Appellant argued in page 8:

	Appellant argued in page 8 that the Office Action's rejection is piecemeal treatment of the claims and is improper. Rather than considering the claim limitations as a whole, the Office Action cites to isolated disclosures that dissect the claimed invention 

	Office respectfully disagrees for the following reason:

Examiner disagrees with this characterization of the rejection as piecemeal. In response to applicant's arguments against piecemeal treatment of the references individually, examiner notes that all the references are in the same field of invention and teach the invention as a whole. The invention would be obvious to one of ordinary skill in the art from the combined teaching of the references. One cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).

In KSR, the Supreme Court particularly emphasized "the need for caution in granting a patent based on the combination of elements found in the prior art," Id. at 415, 82 USPQ2d at 1395, and discussed circumstances in which a patent might be determined to be obvious. According to MPEP 2141, combining prior art elements according to known methods to yield predictable results is considered a reason supporting a conclusion of obviousness.

In re McLaughlin, 443 F.2d 1392, 170 USPQ 209 (CCPA 1971).

Appellant argued in page 7-8:

	Appellant argued in page 7-8 that the Office Action's rejection is improper for "generating a combined tensor for the input image by applying an encoder portion including one or more layers of a neural network to the plurality of scaled images," where "the encoder portion is trained in conjunction with a decoder portion of an autoencoding process ... coupled to receive the combined tensor and generate a reconstructed version of the input image."	

	Office respectfully disagrees for the following reason:

Zhou Fig. 12:


[Image: media_image2.png]


Examiner disagrees, because Zhou teach generating a combined tensor (Zhou [0102] a joint tensor [combined tensor] representation is constructed to combine the original image and the partial segmentation results, and deep learning is used to learn a mapping between this joint tensor representation and the target segmentation mask. Zhou [0103] FIG. 12 a multi-channel representation is used to embed the input medical image 1202 and the previous segmentation results into a unified tensor, which is fed into a second deep CED 1206 to generate an updated segmentation mask) for the input image by applying an encoder portion (Zhou FIG. 10 [0090] a common encoder 1004 is shared across all output scales, while multiple decoders 1006a, 1006b, and 1006c are used, with a respective decoder 1006a, 1006b, and 1006c for each scale of the ground truth probability maps 1008a, 1008b, and 1008c) including one or more layers of a neural network (Zhou [0011] FIG. 5 illustrates a method of segmenting a target anatomical structure using a deep neural network. [0101] a deep convolutional encoder-decoder (CED) can be used to generate segmentation results with excellent myocardium continuity. [0104] FIG. 12, in both CED_Init 1204 and CE_PI 1206, five to the plurality of scaled images, (Zhou [0090] At step 906, a deep image-to-image network (DI2IN) is trained based on the multi-scale ground truths generated for the training images) where the encoder portion is trained (Zhou FIG. 9. [0103] FIG. 12 illustrates a framework for deep learning partial inference based medical image segmentation according to an embodiment of the present invention. 
In the first stage (Stage 1), a first deep convolutional encoder decoder (CED) 1204 is used to learn [autoencoder] a mapping from an input medical image 1202 (e.g., MR image) to a segmentation mask) in conjunction with a decoder portion of an autoencoding process, (According to applicant's specification, autoencoding means joint encoder-decoder training/learning in a neural network. Zhou [0090] for each training image the raw image 1002 is input to the encoder 1004. The output of the encoder 1004 is input to each of the decoders 1006a, 1006b, and 1006c, and each decoder 1006a, 1006b, and 1006c estimates a respective one of the multi-scale ground truth probability functions 1008a, 1008b, and 1008c. The loss function to be minimized in the training of the DI2IN 1000 can be considered as a summation of the loss from all of the decoders 1006a, 1006b, and 1006c) coupled to receive the combined tensor and generate a reconstructed version of the input image. (Zhou [0102] a joint tensor representation is constructed to combine the original image and the partial segmentation results, and deep learning is used to learn a mapping between this joint tensor representation and the target segmentation mask)

Appellant argued in page 8:

	Appellant argued in page 8 that the encoder 1004 and decoders 1006a, 1006b, 1006c of Zhou are not configured to encode an input image and decode a reconstructed version of the input image, nor does it disclose applying the encoder 1004 to a plurality of scaled images to generate a combined tensor that can be used to generate the reconstructed image. Moreover, the decoders 1006a, 1006b, 1006c of Zhou are not configured to generate a reconstructed version of the input image, but output probability maps that surround the boundary of a target object in the input medical image.

	Office respectfully disagrees for the following reason:
Zhou Fig. 12

[Image: media_image3.png]


Examiner disagrees, because an encoder by definition encodes and a decoder by definition decodes to reconstruct an image. No specific way of encoding and reconstructing has been claimed. Zhou Fig. 10 shows the raw image going through the encoder and decoder as a multi-scale image and generating ground truth. These ground truths are representations of the reconstructed image, because per Zhou [0087], by constructing [reconstructed image] the ground truth output image for landmark detection this way, the landmark detection is treated as a regression problem while focusing around the target region. Please note that Fig. 12 shows a deep convolutional encoder-decoder (CED) with multiple stages.

Appellant argued in page 8-9:

	Appellant argued in page 8-9 that Wang does not teach generating intermediate tensors, where each intermediate tensor for a scaled image includes information extracted from the scale of the scaled image.

	Office respectfully disagrees for the following reason:

Examiner disagrees. First of all, Zhou FIG. 10 and FIG. 12 show multi-scaled images based on the dimension/size of the image. Zhou teaches the combined tensor, as explained above, in Zhou [0102] [0103] as a joint/unified tensor. Intermediate tensors are individual components of the combined tensor. Zhou [0103] FIG. 12 a multi-channel 

However, different resolution of image also known in the art as image of different scale. Appellant claim did not specify meaning of multi scale in the claim language. Examiner has also cited Wang to reject following limitation - where each intermediate tensor for a scaled image includes information extracted from the scale of the scaled image; (Wang [0427] a neural network can have multi-stage upscaling (or other function) where an earlier layer upscales and then a later layer upscales, for example a middle layer upscales by 2× [different scales] and then the last layer upscales by 2× [different scales]. This type of “chained” approach can allow for neural networks to be trained in a long network (or “chain”) of functional layers. As such Wang teach encoding and decoding of multi scale [each layer of multiple layers upscaling by 2x factor] image of multiple layers of neural network encoder/decoder. Details are shown in Wang [0404] FIG. 7, there is shown an efficient sub-pixel convolutional neural network (ESPCN) 700 having a low-resolution input image 710 with two feature map extraction layers 720, 730 built with convolutional neural networks and a sub-pixel convolution layer 740 that aggregates the feature maps [720, 730, 740 intermediate tensor] from low-resolution space and builds the super resolution image 750 in a single step. [0405] as shown in FIG. 7, the high-resolution data is super resolved from the low-resolution feature maps 
$f^{1}(I^{LR}; W_{1}, b_{1}) = \varphi(W_{1} * I^{LR} + b_{1}),$
$f^{l}(I^{LR}; W_{1:l}, b_{1:l}) = \varphi(W_{l} * f^{l-1}(I^{LR}) + b_{l}),$
where $W_l$, $b_l$, $l \in (1, L-1)$ are learnable network weights and biases respectively. $W_l$ is a 2D convolution tensor of size $n_{l-1} \times n_l \times k_L \times k_L$, where $n_l$ is the number of features at level $l$, $n_0 = C$, and $k_L$ is the filter size at level $l$. The biases $b_l$ are vectors of length $n_l$. The non-linearity function $\varphi$ applies element-wise and is fixed. The last layer $f^{l}$ has to convert the low resolution feature maps to a high resolution image $I^{SR}$). Wang FIG. 6 [0389] the reconstruction, or decoding, process in most embodiments involves applying the optimised super resolution convolutional neural network model, or reconstruction model, for each scene in order to restore the lower-resolution video to its original resolution having substantially the same quality as the original high-resolution video)

Appellant argued in page 9:

	Appellant argued in page 9 that Wang also fails to disclose an encoder portion and a decoder portion of an autoencoding process trained in conjunction with the encoder portion.	

	Office respectfully disagrees for the following reason:

Examiner notes that Wang was not cited for rejecting this limitation. As explained above at pages 38-42 of this Examiner's Answer, Zhou FIG. 9-10, FIG. 12, [0090] and [0103] teach the argued limitation.

In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).

Appellant argued in page 9:

	Appellant argued in page 9 that the proposed modification of Zhou based on Wang is erroneous.	

Office respectfully disagrees for the following reason:

Examiner disagrees, because it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Zhou, further incorporating Wang in video/image technology, as both are in the field of neural networks describing encoding and decoding involving tensors. Zhou [0104] the strengths of the CED model include its great modeling capacity from a large annotated training image set and its built in regularization mechanism, both due to the deep hierarchical feature network representation and pooling-upsampling structures. Wang [0206] separating the visual data into a series of sections allows for the individual sections to be down-sampled thus reducing the visual data size, thereby allowing for lower quality sections to be transmitted as re-encoded visual data in the original or optionally a more optimal codec but at a lower resolution. So these are analogous art discussing the same invention of neural network multi-scale image up-sampling/down-sampling. One would be motivated to do so in order to incorporate generating a reconstructed version from a scaled image. This will improve the coding efficiency by providing both multi-dimension [multi scale] and multi-resolution [multi scale] encoding/decoding with predictable results, while the combined feature set improves user experience.

In KSR, the Supreme Court particularly emphasized "the need for caution in granting a patent based on the combination of elements found in the prior art," Id. at 415, 82 USPQ2d at 1395, and discussed circumstances in which a patent might be determined to be obvious. According to MPEP 2141 combining prior art elements 

Appellant argued in page 9:

	Appellant argued in page 9 that Mathieu was erroneously cited for the limitations of applying a parameterized function for each scaled image, mapping a plurality of intermediate tensors to a target output dimensionality to generate a plurality of tensors, and combining the plurality of tensors for the set of scaled images to generate a combined tensor for the input image, because Mathieu does not disclose applying an encoder portion to a plurality of scaled images of an input image to generate a combined tensor that can be used to generate a reconstructed version of the input image.

	Office respectfully disagrees for the following reason:

Examiner disagrees, because Mathieu was not cited for rejecting the limitation “applying an encoder portion to a plurality of scaled images of an input image to generate a combined tensor that can be used to generate a reconstructed version of the input image”. As explained above at pages 38-42 of this Examiner's Answer, Zhou FIG. 9-10, FIG. 12, [0090] and [0103] teach the argued limitation. However, Mathieu Fig. 2 shows that a plurality of scaled images are combined to form a combined tensor, and Mathieu [0024] another method is to combine multiple scales 
[00006]ℒgdl(X,Y)=Lgdl(Y^,Y)=.Math.i,j.Math..Math..Math.Yi,j-Yi-1,j.Math.-.Math.Y^i,j-Y^i-1,j.Math..Math.α+.Math..Math.Yi,j-1-Yi,j.Math.-.Math.Y^i,j-1-Y^i,j.Math..Math.α (Equation 6), where α is an integer greater or equal to 1, and ∥ denotes the absolute value function. While a total variation regularization approach takes only the reconstructed frame in input, in particular embodiments, an approach may be taken in which the loss penalizes gradient differences between the prediction [from input image] and the true output [reconstructed image]. Examiner notes that Mathieu [0036] teach ground truth image Y are reconstructed image from the combined tensor of multiple scaled image from Mathieu Fig. 2.

In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).

Appellant argued on page 10:

	Appellant argued on page 10 that Mathieu also fails to disclose an encoder portion and a decoder portion of an autoencoding process trained in conjunction with the encoder portion.

	Office respectfully disagrees for the following reason:

Examiner notes that Mathieu was not cited for this limitation. As explained above on pages 38-42 of this Examiner’s Answer, Zhou FIGS. 9-10 and 12 and paragraphs [0090] and [0103] teach the argued limitation.

In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).

Appellant argued on page 10:

	Appellant argued on page 10 that the proposed combination of Zhou, Wang, and Mathieu is also erroneous.

	Office respectfully disagrees for the following reason:

Examiner disagrees, because it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Zhou by further incorporating Wang and Mathieu in video/image technology. Mathieu [0019] discloses a convolutional neural network (CNN) to generate future frames given an input sequence, with a multi-scale architecture, an adversarial training method, and an image gradient difference loss function. Mathieu [0024] teaches that another method is to combine multiple scales linearly as in the reconstruction process. Mathieu’s disclosure is therefore in the field of appellant’s claimed invention. Zhou Fig. 10 shows multi-scale images but does not disclose a parameterized function for each scaled image, whereas Mathieu’s neural network provides the details of the parameterized function. One would have been motivated to incorporate the parameterized function and the mapping of the plurality of intermediate tensors to a target output dimensionality to generate a plurality of tensors, because doing so would improve coding efficiency and image quality with predictable results.
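
For illustration only, the sequence of limitations at issue (obtaining scaled images, applying a parameterized function for each scaled image, mapping the intermediate tensors to a target output dimensionality, and combining the resulting tensors) can be sketched as follows. This sketch is not taken from Zhou, Wang, Mathieu, or appellant's specification. The function names, average pooling as the scaling and mapping operation, an affine function with a ReLU as the parameterized function, and element-wise summation as the combining step are all hypothetical choices, and the sketch assumes a square image whose side is divisible by every factor used.

```python
def downscale(image, factor):
    """Average-pool a 2-D image (list of rows) by an integer factor."""
    rows, cols = len(image) // factor, len(image[0]) // factor
    return [[sum(image[r * factor + dr][c * factor + dc]
                 for dr in range(factor) for dc in range(factor)) / factor ** 2
             for c in range(cols)]
            for r in range(rows)]


def encode_multiscale(image, factors, params, target_size):
    """Return one combined tensor of shape target_size x target_size.

    factors -- one downscaling factor per scaled image
    params  -- one hypothetical (weight, bias) pair per scaled image
    """
    tensors = []
    for factor, (weight, bias) in zip(factors, params):
        scaled = downscale(image, factor)
        # Parameterized function for this scale (stand-in: affine + ReLU).
        intermediate = [[max(0.0, weight * v + bias) for v in row]
                        for row in scaled]
        # Map the intermediate tensor to the target output dimensionality.
        tensors.append(downscale(intermediate, len(intermediate) // target_size))
    # Combine the per-scale tensors element-wise into a single tensor.
    return [[sum(t[r][c] for t in tensors) for c in range(target_size)]
            for r in range(target_size)]


image = [[1.0] * 4 for _ in range(4)]
combined = encode_multiscale(image, factors=[1, 2],
                             params=[(1.0, 0.0), (2.0, 0.0)], target_size=2)
print(combined)
```

The sketch covers only the encoder-side steps named above; it includes no decoder and no training, and it is not offered as a construction of any claim term.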

In KSR, the Supreme Court particularly emphasized "the need for caution in granting a patent based on the combination of elements found in the prior art," Id. at 415, 82 USPQ2d at 1395, and discussed circumstances in which a patent might be determined to be obvious. According to MPEP 2141, combining prior art elements according to known methods to yield predictable results is a rationale supporting a conclusion of obviousness.

Appellant argued on page 10:

	Appellant argued on page 10 that Suter was erroneously cited for the limitations of obtaining a plurality of scaled images, and for each scaled image, generating an intermediate tensor for the scaled image by applying a parameterized function for the scaled image. According to appellant, Suter discloses visual analysis of volume datasets using downsampling of factor matrices, not reconstruction of an input image.

	Office respectfully disagrees for the following reason:

Examiner disagrees. All three references discussed above (Zhou, Wang and Mathieu) show reconstruction of an input image, and so does Suter. Suter page 151 col. 2 para 2 teaches to combine a TA [Tensor Approximation] based volume representation for data reduction and multiscale feature reconstruction with a hierarchical view-dependent variable resolution rendering, eventually supporting independent control of data reconstruction at different features scales as well as spatial resolutions. Suter page 153 col. 2 para 2 teaches: Fig. 7 shows the factor matrix averaging as used for a hierarchical tensor representation and its effects on the visual reconstruction. The top row uses standard scalar value averaging directly on the input volume, while in the middle we show the direct TA of these subsampled datasets. In the third row we demonstrate the tensor reconstruction based on the subsampled and averaged factor matrices as proposed. As can be seen the reconstructions are extremely close. Suter Figure 7 therefore clearly shows reconstruction of the input image.

Appellant argued on page 10:

	Appellant argued on page 10 that Suter fails to disclose or suggest applying an encoder portion to a plurality of scaled images to generate a combined tensor that is used to generate a reconstructed version of the input image by applying a decoder portion trained in conjunction with an encoder portion.

	Office respectfully disagrees for the following reason:

Examiner disagrees. Suter was not cited for the limitation “applying an encoder portion to a plurality of scaled images of an input image to generate a combined tensor that can be used to generate a reconstructed version of the input image”. As explained above on pages 38-42 of this Examiner’s Answer, Zhou FIGS. 9-10 and 12 and paragraphs [0090] and [0103] teach the argued limitation.

In response to applicant's arguments against the references individually, one cannot show nonobviousness by attacking references individually where the rejections are based on combinations of references.  See In re Keller, 642 F.2d 413, 208 USPQ 871 (CCPA 1981); In re Merck & Co., 800 F.2d 1091, 231 USPQ 375 (Fed. Cir. 1986).

Appellant argued on page 10:

	Appellant argued on page 10 that the proposed combination of Zhou, Wang, Mathieu, and Suter is erroneous.

	Office respectfully disagrees for the following reason:

Examiner disagrees, because it would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to modify Zhou by further incorporating Wang, Mathieu and Suter in video/camera technology. Please note the title of Suter, Tensor Approximation Multiresolution …. Suter page 152 col. 1 para 3 teaches a TA [Tensor Approximation] framework, as previously used individually for multiscale volume visualization and for multiresolution volume rendering. As such, Suter discloses the claimed subject matter. One would have been motivated to incorporate, for each scaled image, generating an intermediate tensor for the scaled image by applying the parameterized function for the scaled image thereto, because doing so would improve coding efficiency with predictable results.

In KSR, the Supreme Court particularly emphasized "the need for caution in granting a patent based on the combination of elements found in the prior art," Id. at 415, 82 USPQ2d at 1395, and discussed circumstances in which a patent might be determined to be obvious. According to MPEP 2141, combining prior art elements according to known methods to yield predictable results is a rationale supporting a conclusion of obviousness.

Examiner notes that appellant’s arguments are not commensurate with the claim language supported by the specification. 


/NASIM N NIRJHAR/ Primary Examiner, Art Unit 2482

Conferees:
	/MATTHEW K KWAN/ Primary Examiner, Art Unit 2482

	/CHRISTOPHER S KELLEY/ Supervisory Patent Examiner, Art Unit 2482



Requirement to pay appeal forwarding fee.  
In order to avoid dismissal of the instant appeal in any application or ex parte reexamination proceeding, 37 CFR 41.45 requires payment of an appeal forwarding fee within the time permitted by 37 CFR 41.45(a), unless appellant had timely paid the fee for filing a brief required by 37 CFR 41.20(b) in effect on March 18, 2013.