DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Status of Claims
The following claims are pending in this office action: 1-20
The following claims are amended: 1-19
The following claims are new: None
The following claims are cancelled: None
The following claims are rejected: 1-20
Response to Arguments
Applicant filed amendments on 08/13/2021 to address the claim objections. In response to the Applicant’s amendments, the claim objections have been withdrawn.
Applicant filed amendments on 08/13/2021 to address the 35 U.S.C. 112(b) rejection. In response to the Applicant’s amendments, the 35 U.S.C. 112(b) rejection has been withdrawn.
Applicant filed amendments and arguments on 08/13/2021 to address the 35 U.S.C. 102 and 103 rejections. Applicant’s arguments have been fully considered but they are not persuasive. Applicant argues “Claim 8, as amended, recites the limitations of "generating, with the one or more physical computer processors, a conditioned network by training the initial network using the training content, the conditioned network comprising the one or more See Office Action at pages 4-5. Based on these claim mappings, in order to teach the above limitations of amended claim 8, Theis would have to disclose that that model is trained to receive target content and generated encoded target content comprising a quantized latent space. Importantly, Theis contains no such teachings. As discussed above, Theis only discloses that the selected model and the trained model encode an image by convolving or filtering different subsets of pixels within the image, adding the different convolved pixel values, and rounding the sum of the convolved pixel values. See Theis at col. 8, lines 34-56. The encoded image, therefore, comprises a set of pixel values rather than a quantized latent space. Notably, Theis is silent with respect to a model that generates a latent space. In view of at least these distinctions, Applicant submits that Theis cannot be properly interpreted as teaching or suggesting the above limitations of amended claim 8. A careful review of the other references cited by the Examiner shows that those references also fail to teach or suggest the above limitations of amended claim 8.” Examiner respectfully disagrees. Theis teaches training a conditioned network to receive target content and generate encoded target content, as noted in the Abstract, specifically: “The encoder generates compressed video data using a lossy compression algorithm, the lossy compression algorithm being implemented using a trained neural network with at least one convolution…”
Applicant also argues “Claim 1, as amended, recites the limitations of "apply, with the one or more 
See Office Action at pages 8-9. Based on these claim mappings, in order to teach the above limitations of amended claim 1, He would have to disclose the idea of quantizing the structured latent state based on a plurality of distributions corresponding to the structured latent state. Importantly, He contains no such teachings. As discussed above, He only discloses that the structured latent state includes a set of hierarchical approximate posterior distributions. See He at page 7. Notably, however, He does not disclose quantizing the structured latent state based on any distributions included in the set of hierarchical approximate posterior distributions. In view of at least these distinctions, Applicant submits that He cannot be properly interpreted as teaching or suggesting the above limitations of amended claim 1. A careful review of the other references cited by the Examiner shows that those references also fail to teach or suggest the above limitations of amended claim 1.” Examiner respectfully disagrees. Theis discloses quantization and the generation of encoded target content (Col. 4, Line 62 discloses “The quantized output of the encoder (e.g., compressed video data 10) is the code used to represent an image and is stored losslessly” and Figs. 6A/6B), and He discloses a latent space comprising nd Para. and Fig. 2, and Holistic Attribute Control section).
Examiner respectfully asserts that the combination of Theis and He teaches the limitation of “apply, with the one or more physical computer processors, the conditioned network to the target content to generate a latent space of the target content comprising one or more local variables, one or more global variables, and a plurality of distributions corresponding to the latent space; and quantize the latent space based on the plurality of distributions corresponding to the latent space to generate encoded target content” because quantizing the latent space would still yield a latent space that serves to “provide probabilistic methods of encoding/decoding full frames into/from a compact latent state” (He, Page 4, Contributions section). Further, quantization in compression is used to reduce the number of bits needed to store a value, and quantization may use rounding to achieve its intended function. Quantization of the latent space would result in a further compressed and compact representation of the latent space. Further, it is known in the art to perform quantization before encoding (see Fig. 4 of Ballé, et al., “Variational image compression with a scale hyperprior”).
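For illustration only, the general notion discussed above (rounding latent values to reduce the bits needed to store them, with per-variable distributions informing the coding cost) can be sketched as follows. This sketch is not taken from Theis, He, or Ballé; the function names, the Gaussian assumption, and the numeric values are all hypothetical.

```python
import math

def gaussian_cdf(x, mean, std):
    # Cumulative distribution function of a Gaussian with the given mean and std.
    return 0.5 * (1.0 + math.erf((x - mean) / (std * math.sqrt(2.0))))

def quantize(latents):
    # Quantization by rounding each latent value to the nearest integer.
    return [math.floor(v + 0.5) for v in latents]

def estimated_bits(symbols, means, stds):
    # Approximate coding cost: -log2 of the probability mass that each
    # variable's (assumed Gaussian) distribution assigns to the rounding bin.
    total = 0.0
    for q, m, s in zip(symbols, means, stds):
        p = gaussian_cdf(q + 0.5, m, s) - gaussian_cdf(q - 0.5, m, s)
        total += -math.log2(max(p, 1e-12))
    return total

# Hypothetical latent values and per-variable distribution parameters.
latents = [0.2, -1.7, 3.4]
means = [0.0, -2.0, 3.0]
stds = [1.0, 1.0, 2.0]

symbols = quantize(latents)
bits = estimated_bits(symbols, means, stds)
```

In this sketch the rounded symbols are integers, and a narrower distribution centered near a symbol yields a lower estimated bit cost for that symbol.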
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-5, 8, 14, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. Patent No. US 10623775 B1 to Theis, et al. (hereinafter, “Theis”), in view of “Probabilistic Video Generation using Holistic Attribute Control” to He, et al. (hereinafter, “He”).
As per claim 1, Theis teaches a system configured for compressing target content, the system comprising:
non-transient electronic storage; (Theis, Fig. 6b discloses at least one memory 660)
one or more physical computer processors configured by machine-readable instructions to: (Theis, Col. 2, Line 45 discloses “In yet another general aspect, a non-transitory computer readable medium includes code segments that when executed by a processor cause the processor to perform steps…”)
obtain, from the non-transient electronic storage, the target content comprising one or more frames, wherein a given frame comprises one or more features; (Theis, Col. 4, Line 25 discloses “The encoder 110 can be configured to receive video data 5…” (Video data consists of multiple frames where those frames contain features. Video data is stored in electronic storage))
(Theis, Col. 5, Line 60 discloses “As shown in FIG. 2, in step S205 a model for encoding an image or frame of a video is selected.” (Models are stored within a storage or memory))
the conditioned network having been trained by training an initial network using training content, (Theis, Col. 7, Lines 17-27 discloses “In an example implementation, a video is streamed from device at a live event to a plurality of remote devices using the techniques described herein. For example, a sporting event (e.g., baseball, football, swimming, and/or the like) is streamed from a device located at the venue holding the sporting event to a plurality of subscriber devices. In this example implementation, the model is trained in real-time. As a result, the quality of the decoded streaming video should improve during the course of the sporting event. In addition, the trained model can be used (or selected for use during) a future sporting event.” and the Abstract discloses “The encoder generates compressed video data using a lossy compression algorithm, the lossy compression algorithm being implemented using a trained neural network with at least one convolution…” (The models which are selected for future events are stored in memory for future use; thus the memory contains trained models. Additionally, the system consists of a trained neural network))
wherein the conditioned network comprises one or more encoders, one or more quantizers, and one or more decoders, (Theis, Fig. 1 discloses Encoder 110 and Decoder 120, and Col. 4, Line 62 discloses “The quantized output of the encoder (e.g., compressed video data 10) is the code used to represent an image and is stored losslessly.” (Network consisting of encoder, decoder, and quantizer))
(Theis, Col. 6, Lines 59-65 discloses “In step S230 the model is trained based on the decoded image or frame of the video. For example, the model can be trained using at least one Independent Gaussian scale mixture (GSM) as described below with regard to FIG. 4. The GSM can use at least one error value associated with decoding the image or frame of the video to train the neural networks on which the model is based.” (Training content consists of frames which consist of features))
apply, with the one or more physical computer processors, the conditioned network to the target content to [[generate a latent space of the target content comprising one or more local variables and one or more global variables, and a plurality of distributions corresponding to the latent space]] (Theis, Fig. 6A discloses applying an encoder to the target content which produces a compressed format)
quantize [[the latent space based on the plurality of distributions corresponding to the latent space]] to generate encoded target content (Theis, Col. 4, Line 62 discloses “The quantized output of the encoder (e.g., compressed video data 10) is the code used to represent an image and is stored losslessly” (Note that He additionally produces encoded target content as seen in Fig. 2 involving an encoder))
	Theis fails to explicitly teach:
generate a latent space of the target content comprising one or more local variables and one or more global variables, and a plurality of distributions corresponding to the latent space
the latent space based on the plurality of distributions corresponding to the latent space
However, He (He addresses the issue of probabilistic video generation) teaches:
generate a latent space of the target content comprising one or more local variables and one or more global variables, and a plurality of distributions corresponding to the latent space (He, Fig. 2, and Holistic Attribute Control section discloses “Since the VAE encoder φenc already maps the input images x (1:T) to a set of latent features φenc x (1:T)  , we infer the attributes ai from those representation…” and Page 7 discloses “Based on these observations, we propose the following structured latent space, which comprises a set of hierarchical approximate posterior distributions…” (Latent space comprising global and local variables. It is to be noted that He also discloses the limitations taught by Theis above. Theis discloses the conditioned network above))
the latent space based on the plurality of distributions corresponding to the latent space (He, Fig. 2, and Holistic Attribute Control section discloses “Since the VAE encoder φenc already maps the input images x (1:T) to a set of latent features φenc x (1:T)  , we infer the attributes ai from those representation…” and Page 7 discloses “Based on these observations, we propose the following structured latent space, which comprises a set of hierarchical approximate posterior distributions…” (Latent space comprising of global and local variables. It is to be noted that He also discloses the limitations taught by Theis above. Theis discloses the conditioned network above))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the conditioned network for image and video compression as disclosed by Theis to use the generation of a latent space consisting of local and global variables as disclosed by He. The combination would have been obvious because a person of ordinary skill in the art would be motivated to “provide probabilistic methods of encoding/decoding full frames into/from a compact latent state” (He, Page 4, Contributions section).

As per claim 3, the combination of Theis and He as shown above teaches the system of claim 1, Theis further teaches wherein the one or more physical computer processors are further configured by machine-readable instructions to:
apply, with the one or more physical computer processors, a plurality of convolutional layers to generate convolved target content; (Theis, Fig. 3A discloses applying multiple convolutional layers to a target content)
	He further teaches:
apply, with the one or more physical computer processors, a global model to the [[convolved target content]] to generate the one or more global variables; (He, Fig. 2 discloses a structured latent representation of a variational autoencoder (the variational autoencoder being the global model which the paper uses as the proposed model which generates the latent space consisting of global variables. Convolved target content is disclosed by Theis above))
and apply, with the one or more physical computer processors, a multilayer perceptron model to [[the convolved target content]] to generate the one or more local variables (He, Fig. 2b discloses a neural network within the structured latent space that is applied thus generating local variables (A multilayer perceptron model is a vanilla neural network. Convolved target content disclosed by Theis above))


As per claim 4, the combination of Theis, and He as shown above teaches the system of claim 3, He further teaches:
wherein the global model comprises one or more of a long short-term memory model or a Kalman filter (He, Fig. 2A discloses the global model comprising a long short-term memory model)
	Same motivation to combine Theis and He as claim 1

As per claim 5, the combination of Theis, and He as shown above teaches the system of claim 1, Theis further teaches wherein applying the conditioned network further comprises:
 [[latent space]] using [[the plurality of distributions]] (Theis, Fig. 6A discloses a video encoder system which is to encode the quantized latent space which was disclosed earlier above)
and decoding, with the one or more physical computer processors, encoded quantized [[latent space]]. (Theis, Fig. 6B discloses video decoder system which is to decode the encoded latent space as shown above)
	He further teaches:
	Latent space (He, Fig. 2, and Holistic Attribute Control section discloses “Since the VAE encoder φenc already maps the input images x (1:T) to a set of latent features φenc x (1:T)  , we infer the attributes ai from those representation…” and Page 7 discloses “Based on these observations, we propose the following structured latent space, which comprises a set of hierarchical approximate posterior distributions…” (Latent space comprising of global and local variables))
the plurality of distributions (He, Fig. 2, and Holistic Attribute Control section discloses “Since the VAE encoder φenc already maps the input images x (1:T) to a set of latent features φenc x (1:T)  , we infer the attributes ai from those representation…” and Page 7 discloses “Based on these observations, we propose the following structured latent space, which comprises a set of hierarchical approximate posterior distributions…” (Latent space comprising of global and local variables and plurality of distributions))
Same motivation to combine Theis and He as claim 1

As per claim 8, Theis teaches a computer-implemented method for training an initial network to simultaneously learn how to refine a latent space using training content and how to refine a plurality of distributions of the latent space using the training content (The preamble of the claim is stating intended use of the invention),
 the method being implemented in a computer system that comprises non-transient electronic storage and one or more physical computer processors, comprising (Theis, Fig. 6b discloses at least one memory 660, and Col. 2, Line 45 discloses “In yet another general aspect, a non-transitory computer readable medium includes code segments that when executed by a processor cause the processor to perform steps…” (Electronic storage being a memory)):
obtaining, from the non-transient electronic storage, training content comprising one or more training frames, wherein a given training frame comprises one or more training features; (Theis, Col. 6, Lines 59-65 discloses “In step S230 the model is trained based on the decoded image or frame of the video. For example, the model can be trained using at least one Independent Gaussian scale mixture (GSM) as described below with regard to FIG. 4. The GSM can use at least one error value associated with decoding the image or frame of the video to train the neural networks on which the model is based.” (Obtaining training content, training content consists of frames which consist of features))
obtaining, from the non-transient electronic storage, the initial network, the initial network comprising one or more encoders, one or more quantizers, and one or more decoders; (Theis, Fig. 1 discloses Encoder 110 and Decoder 120, and Col. 4, Line 62 discloses “The quantized output of the encoder (e.g., compressed video data 10) is the code used to represent an image and is stored losslessly.” and Fig. 2 discloses selecting an initial model (Network consisting of encoder, decoder, and quantizer))
and generating, with the one or more physical computer processors, a conditioned network by training the initial network using the training content, the conditioned network comprising the one or more encoders, the one or more quantizers, and the one or more decoders; wherein the conditioned network is trained to receive target content and generate encoded target content comprising a quantized [[latent space]] (Theis, Col. 6, Lines 59-65 discloses “In step S230 the model is trained based on the decoded image or frame of the video. For example, the model can be trained using at least one Independent Gaussian scale mixture (GSM) as described below with regard to FIG. 4” and Fig. 1 discloses Encoder 110 and Decoder 120, and Col. 4, Line 62 discloses “The quantized output of the encoder (e.g., compressed video data 10) is the code used to represent an image and is stored losslessly.”)
	Theis fails to explicitly teach:
	latent space
	However, He teaches:
latent space (He, Fig. 2, and Holistic Attribute Control section discloses “Since the VAE encoder φenc already maps the input images x (1:T) to a set of latent features φenc x (1:T)  , we infer the attributes ai from those representation…” and Page 7 discloses “Based on these observations, we propose the following structured latent space, which comprises a set of hierarchical approximate posterior distributions…” (Latent space comprising of global and local variables))
	Same motivation to combine Theis and He as claim 1

	As per claim 14, Theis teaches a computer-implemented method for compressing target content, the method being implemented in a computer system that comprises non-transient electronic storage and one or more physical computer processors, comprising:
obtaining, from the non-transient electronic storage, the target content comprising one or more frames, wherein a given frame comprises one or more features (Theis, Col. 4, Line 25 discloses “The encoder 110 can be configured to receive video data 5…” (Video data consists of multiple frames where those frames contain features. Video data is stored in electronic storage))
quantizing [[the latent space based on the plurality of distributions corresponding to the latent space]] (Theis, Col. 4, Line 62 discloses “The quantized output of the encoder (e.g., compressed video data 10) is the code used to represent an image and is stored losslessly” (Note that He additionally produces encoded target content as seen in Fig. 2 involving an encoder))
	Theis fails to explicitly teach:
encoding, with the one or more physical computer processors, the target content to generate one or more local variables and one or more global variables; 
and generating, with the one or more physical computer processors, a latent space, the latent space comprising the one or more local variables and the one or more global variables, wherein the one or more local variables are based on the one or more features in the given frame, and wherein the one or more global variables are based on one or more features common to a plurality of frames of the target content.

the latent space based on the plurality of distributions corresponding to the latent space
	However, He teaches:
	encoding, with the one or more physical computer processors, the target content to generate one or more local variables and one or more global variables (He, Fig. 2A discloses an encoder generating a latent space (Latent space comprises global and local variables))
and generating, with the one or more physical computer processors, a latent space, the latent space comprising the one or more local variables and the one or more global variables, wherein the one or more local variables are based on the one or more features in the given frame, and wherein the one or more global variables are based on one or more features common to a plurality frames of the target content (He, Fig. 2, and Holistic Attribute Control section discloses “Since the VAE encoder φenc already maps the input images x (1:T) to a set of latent features φenc x (1:T)  , we infer the attributes ai from those representation…” (The latent space is comprised of global and local variables. The global and local variables encode visual input data within the latent space whereby the global and local variables within the latent space contain information relating to features))
generating, with the one or more physical computer processors, a plurality of distributions corresponding to the latent space (He, Fig. 2, and Holistic Attribute Control section discloses “Since the VAE encoder φenc already maps the input images x (1:T) to a set of latent features φenc x (1:T)  , we infer the attributes ai from those representation…” and Page 7 discloses “Based on these observations, we propose the following structured latent space, which comprises a set of hierarchical approximate posterior distributions…” (Latent space comprising of global and local variables))
the latent space based on the plurality of distributions corresponding to the latent space (He, Fig. 2, and Holistic Attribute Control section discloses “Since the VAE encoder φenc already maps the input images x (1:T) to a set of latent features φenc x (1:T)  , we infer the attributes ai from those representation…” and Page 7 discloses “Based on these observations, we propose the following structured latent space, which comprises a set of hierarchical approximate posterior distributions…” (Latent space comprising of global and local variables))
Same motivation to combine Theis and He as claim 1

As per claim 16, the combination of Theis, and He as shown above teaches the computer-implemented method of claim 14, Theis further teaches wherein encoding the target content comprises:
	applying, with the one or more physical computer processors, a plurality of  convolutional layers to the target content to generate convolved target content (Theis, Fig. 3A discloses applying multiple convolutional layers to a target content)
	He further teaches:
	applying, with the one or more physical computer processors, a long short-term memory model to [[convolved target content]] to generate the one or more global variables (He, Fig. 2 discloses a structured latent representation of a variational autoencoder (the variational autoencoder consists of a long short-term memory model. Latent space is generated comprising global variables. Convolved target content is disclosed by Theis above))
and applying, with the one or more physical computer processors, a multilayer perceptron model to [[the convolved target]] content to generate the one or more local variables (He, Fig. 2b discloses a neural network within the structured latent space that is applied thus generating local variables (A multilayer perceptron model is a vanilla neural network. Convolved target content disclosed by Theis above))
Same motivation to combine Theis and He as claim 3

Claims 2, 9-11, 15, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Theis, in view of He, and further in view of U.S. Pub. No. US 20170230675 A1 to Wierstra, et al. (hereinafter, “Wierstra”)
As per claim 2, the combination of Theis and He as shown above teaches the system of claim 1, Theis further teaches wherein applying the conditioned network comprises:
He further teaches:
wherein the one or more local variables are based on the one or more features in the given frame, wherein the one or more global variables are based on one or more features common to a plurality of frames of the target content; (He, Fig. 2, and Holistic Attribute Control section discloses “Since the VAE encoder φenc already maps the input images x (1:T) to a set of latent features φenc x (1:T)  , we infer the attributes ai from those representation…” (The latent space is comprised of global and local variables. The global and local variables encode visual input data within the latent space whereby the global and local variables within the latent space contain information relating to features))
	the latent space (He, Fig. 2, and Holistic Attribute Control section discloses “Since the VAE encoder φenc already maps the input images x (1:T) to a set of latent features φenc x (1:T)  , we infer the attributes ai from those representation…” (Latent space comprising of global and local variables))
	Same motivation to combine Theis and He as claim 1
The combination of Theis and He fails to explicitly teach:
wherein the plurality of distributions indicate a likelihood of values for [[the one or more local variables and the one or more global variables]]
However, Wierstra (Wierstra addresses the issue of compressing images using neural networks) teaches:
wherein the plurality of distributions indicate a likelihood of values for [[the one or more local variables and the one or more global variables]] (Wierstra, Para. [0021] discloses “In particular, the encoder neural network 110 has been trained as the encoder neural network of a variational auto encoder and is therefore configured to receive the image 102 and to process the image 102 to generate outputs defining values of a number of latent variables that each represent a different feature of the image 102. In some implementations, the outputs of the encoder neural network 110 define parameters, e.g., mean or log variance or both, of distributions, e.g., a Gaussian distribution, from which the latent variables are sampled. For example, in some of these implementations, a linear transformation can be applied to the outputs to generate the parameters of the distribution.” (Distributions are generated from which the latent variables in a latent space are sampled. The distributions indicate values of the variables. He discloses global and local variables above))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Theis as modified by He to use the generation of multiple distributions as disclosed by Wierstra. The combination would have been obvious because a person of ordinary skill in the art would be motivated to “generate outputs defining values of a number of latent variables that each represent a different feature of the image” (Wierstra, Para. [0021]), where the distributions can be further sampled to indicate values of the latent variables.
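For illustration only, the relationship described in the cited passage (encoder outputs defining a mean and log variance of a Gaussian distribution from which a latent variable is sampled, the distribution indicating how likely a given value is) can be sketched as follows. This is not Wierstra's implementation; the function names and parameter values are hypothetical.

```python
import math
import random

def sample_latent(mean, log_var, rng):
    # Draw a latent value from a Gaussian defined by mean and log variance.
    std = math.exp(0.5 * log_var)
    return mean + std * rng.gauss(0.0, 1.0)

def gaussian_pdf(z, mean, log_var):
    # Density the Gaussian assigns to the value z (how likely that value is).
    var = math.exp(log_var)
    return math.exp(-((z - mean) ** 2) / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)

rng = random.Random(0)

# Hypothetical encoder outputs: one (mean, log variance) pair per latent variable.
params = [(0.0, 0.0), (2.0, -2.0)]

samples = [sample_latent(m, lv, rng) for m, lv in params]
likelihoods = [gaussian_pdf(z, m, lv) for z, (m, lv) in zip(samples, params)]
```

Each distribution here assigns a higher density to values near its mean, which is one way a distribution can be said to indicate a likelihood of values for a latent variable.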

As per claim 9, the combination of Theis and He as shown above teaches the computer-implemented method of claim 8, and Theis further teaches the method further comprising:
obtaining, from the non-transient electronic storage, the target content comprising one or more frames, wherein a given frame comprises one or more features; (Theis, Col. 4, Line 25 discloses “The encoder 110 can be configured to receive video data 5…” (Video data consists of multiple frames where those frames contain features. Video data is stored in electronic storage))
and quantizing, with the one or more physical computer processors, [[the latent space based on the plurality of distributions]] using the conditioned network (Theis, Col. 4, line 62 discloses “The quantized output of the encoder (e.g., compressed video data 10) is the code used to represent an image and is stored losslessly” (Latent space and multiple distributions disclosed below))
	He further teaches:
encoding, with the one or more physical computer processors, the target content to generate one or more local variables and one or more global variables using [[the conditioned network]] (He, Fig. 2A discloses an encoder generating a latent space (Latent space comprises global and local variables. Conditioned network disclosed by Theis earlier))
generating, with the one or more physical computer processors, the latent space using [[the conditioned network]], the latent space comprising the one or more local variables and the one or more global variables, wherein the one or more local variables are based on the one or more features in the given frame, and wherein the one or more global variables are based on one or more features common to a plurality of frames of the target content; (He, Fig. 2, and Holistic Attribute Control section discloses “Since the VAE encoder φenc already maps the input images x (1:T) to a set of latent features φenc x (1:T)  , we infer the attributes ai from those representation…” (The latent space is comprised of global and local variables. The global and local variables encode visual input data within the latent space whereby the global and local variables within the latent space contain information relating to features. Theis discloses the conditioned network above. It is to be noted that He additionally contains a network that can encode input data))
the latent space based on the plurality of distributions (He, Fig. 2, and Holistic Attribute Control section discloses “Since the VAE encoder φenc already maps the input images x (1:T) to a set of latent features φenc x (1:T)  , we infer the attributes ai from those representation…” (Latent space comprising of global and local variables))
[[the conditioned network]] (He, Fig. 2, and Holistic Attribute Control section discloses “Since the VAE encoder φenc already maps the input images x (1:T) to a set of latent features φenc x (1:T)  , we infer the attributes ai from those representation…” and Page 7 discloses “Based on these observations, we propose the following structured latent space, which comprises a set of hierarchical approximate posterior distributions…” (Latent space comprising of global and local variables and plurality of distributions))
	Same motivation to combine Theis and He as claim 8
The combination of Theis and He fails to explicitly teach:
wherein the plurality of distributions indicate a likelihoods of values for [[the one or more local variables and the one or more global variables]]
	However, Wierstra teaches:
wherein the plurality of distributions indicate a likelihoods of values for [[the one or more local variables and the one or more global variables]] (Wierstra, Para. [0021] discloses “In particular, the encoder neural network 110 has been trained as the encoder neural network of a variational auto encoder and is therefore configured to receive the image 102 and to process the image 102 to generate outputs defining values of a number of latent variables that each represent a different feature of the image 102. In some implementations, the outputs of the encoder neural network 110 define parameters, e.g., mean or log variance or both, of distributions, e.g., a Gaussian distribution, from which the latent variables are sampled. For example, in some of these implementations, a linear transformation can be applied to the outputs to generate the parameters of the distribution.” (Distributions are generated from which the latent variables in a latent space are sampled. The distributions indicate values of the variables. He discloses global and local variables above))
	Same motivation to combine Theis and Wierstra as claim 2
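For illustration only, and not as part of any cited reference or of the claims, the arrangement quoted from Wierstra, Para. [0021] can be sketched roughly as follows: encoder outputs pass through a linear transformation to give a mean and a log variance per latent variable, and the resulting Gaussian assigns a likelihood to candidate values. Every function name, weight, and number below is a hypothetical stand-in chosen for the sketch; nothing here is taken from Wierstra, Theis, or He.

```python
import math
import random


def linear(weights, bias, features):
    """Apply a linear transformation: one output per weight row."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]


def gaussian_params(encoder_output, w_mean, b_mean, w_logvar, b_logvar):
    """Turn encoder outputs into one (mean, std) pair per latent variable."""
    means = linear(w_mean, b_mean, encoder_output)
    log_vars = linear(w_logvar, b_logvar, encoder_output)
    # std = exp(0.5 * log variance)
    stds = [math.exp(0.5 * lv) for lv in log_vars]
    return list(zip(means, stds))


def gaussian_density(value, mean, std):
    """Likelihood the Gaussian assigns to a candidate latent value."""
    z = (value - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def sample_latents(params, rng):
    """Draw one value per latent variable from its distribution."""
    return [rng.gauss(mean, std) for mean, std in params]


# Hypothetical encoder output and weights (assumptions for illustration).
encoder_output = [1.0, 2.0]
w_mean = [[0.5, 0.0], [0.0, 1.0]]
b_mean = [0.0, -1.0]
w_logvar = [[0.0, 0.0], [0.0, 0.0]]
b_logvar = [0.0, 0.0]

params = gaussian_params(encoder_output, w_mean, b_mean, w_logvar, b_logvar)
latents = sample_latents(params, random.Random(0))
```

In this sketch the density is highest at the mean and falls off with distance from it, which is the sense in which a distribution of this kind indicates a likelihood of values for a variable.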

	As per claim 10, the combination of Theis, He, and Wierstra as shown above teaches the computer-implemented method of claim 9, Theis further teaches wherein encoding the target content comprises:
	applying, with the one or more physical computer processors, a plurality of convolutional layers to the target content to generate convolved target content (Theis, Fig. 3A discloses applying multiple convolutional layers to a target content)
	He further teaches:
	applying, with the one or more physical computer processors, a long short-term memory model to [[convolved target content]] to generate the one or more global variables (He, Fig. 2 discloses a structured latent representation of a variational autoencoder (The variational autoencoder includes a long short-term memory model. A latent space comprising global variables is generated. Convolved target content is disclosed by Theis above))
and applying, with the one or more physical computer processors, a multilayer perceptron model to [[the convolved target]] content to generate the one or more local variables (He, Fig. 2b discloses a neural network within the structured latent space that is applied thus generating local variables (A multilayer perceptron model is a vanilla neural network. Convolved target content disclosed by Theis above))
Same motivation to combine Theis and He as claim 3

As per claim 11, the combination of Theis, Wierstra, and He as shown above teaches the computer-implemented method of claim 9, Theis further teaches:
encoding, with the one or more physical computer processors, the quantized [[latent space]]  (Theis, Fig. 6A discloses a video encoder system which is to encode the quantized latent space which was disclosed earlier above)
and decoding, with the one or more physical computer processors, encoded quantized [[latent space]]. (Theis, Fig. 6B discloses video decoder system which is to decode the encoded latent space as shown above)
He further teaches:
latent space (He, Fig. 2, and Holistic Attribute Control section discloses “Since the VAE encoder φ_enc already maps the input images x^(1:T) to a set of latent features φ_enc(x^(1:T)), we infer the attributes a_i from those representation…” and Page 7 discloses “Based on these observations, we propose the following structured latent space, which comprises a set of hierarchical approximate posterior distributions…” (Latent space comprising global and local variables))
Same motivation to combine Theis and He as claim 1

As per claim 15, the combination of Theis and He, as shown above, teaches the computer-implemented method of claim 14. The combination of Theis and He fails to explicitly teach:
wherein the plurality of distributions indicate a likelihood of values for [[the one or more local variables and the one or more global variables]] (He discloses the local and global variables earlier as seen in claim 1)
However, Wierstra teaches:
wherein the plurality of distributions indicate a likelihood of values for [[the one or more local variables and the one or more global variables]] (Wierstra, Para. [0021] discloses “In particular, the encoder neural network 110 has been trained as the encoder neural network of a variational auto encoder and is therefore configured to receive the image 102 and to process the image 102 to generate outputs defining values of a number of latent variables that each represent a different feature of the image 102. In some implementations, the outputs of the encoder neural network 110 define parameters, e.g., mean or log variance or both, of distributions, e.g., a Gaussian distribution, from which the latent variables are sampled. For example, in some of these implementations, a linear transformation can be applied to the outputs to generate the parameters of the distribution.” (The latent variables, which lie in a latent space, are sampled from the generated distributions. The distributions indicate values of the variables))
Same motivation to combine Theis and Wierstra as claim 9

As per claim 17, the combination of Theis, He and Wierstra as shown above teaches the computer-implemented method of claim 15, Theis further teaches:
encoding, with the one or more physical computer processors, quantized [[latent space]] (Theis, Fig. 6A discloses a video encoder system which is to encode the quantized latent space which was disclosed earlier above)
and decoding, with the one or more physical computer processors, encoded quantized [[latent space]]. (Theis, Fig. 6B discloses video decoder system which is to decode the encoded latent space as shown above)
	He further teaches:
	latent space (He, Fig. 2, and Holistic Attribute Control section discloses “Since the VAE encoder φ_enc already maps the input images x^(1:T) to a set of latent features φ_enc(x^(1:T)), we infer the attributes a_i from those representation…” (Latent space comprising global and local variables))
	Same motivation to combine Theis and He as claim 1

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Theis, in view of He, and further in view of An Experience in Image Compression Using Neural Networks to Vilovic (hereinafter, “Vilovic”)
As per claim 6, the combination of Theis, and He, as shown above teaches the system of claim 5, Theis further teaches wherein decoding the encoded quantized latent space comprises:
entropy decoding, with the one or more physical computer processors, the encoded latent space; (Theis, “For example, the decoder 120 can replace inverse transformation (e.g., arithmetic and/or discrete cosine transform (DCT)), inverse quantization and/or entropy decoding as typically performed by a video decoder.” (Decoder 120 being a substitute for entropy decoding which is to decode the encoded latent space as shown above earlier))
combining, with the one or more physical computer processors, entropy decoded latent space with [[a multilayer perceptron model]] (Theis, “For example, the decoder 120 can replace inverse transformation (e.g., arithmetic and/or discrete cosine transform (DCT)), inverse quantization and/or entropy decoding as typically performed by a video decoder.” (Entropy decoded latent space shown above))
and applying, with the one or more physical computer processors, a plurality of deconvolutions to a combination of the entropy decoded latent space with [[the multilayer perceptron model]] (Theis, Fig. 3B discloses the decoder 120 using multiple convolutions (The convolutions in the decoder act as deconvolutions for an input))
The combination of Theis, and He fails to explicitly teach:
a/the multilayer perceptron model
However, Vilovic (Vilovic addresses the issue of using multilayer perceptrons for image compression) teaches:
a/the multilayer perceptron model (Vilovic, Abstract discloses “In this paper, a direct solution method is used for image compression using the neural networks. An experience of using multilayer perceptron for image compression is presented. The multilayer perceptron is used for transform coding of the image.”)
.

Claims 12 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Theis, in view of He, further in view of Wierstra, and further in view of Vilovic
As per claim 12, the combination of Theis, He and Wierstra as shown above teaches the computer implemented method of claim 11, Theis further teaches wherein decoding the encoded quantized latent space comprises:
entropy decoding, with the one or more physical computer processors, the encoded latent space; (Theis, “For example, the decoder 120 can replace inverse transformation (e.g., arithmetic and/or discrete cosine transform (DCT)), inverse quantization and/or entropy decoding as typically performed by a video decoder.” (Decoder 120 being a substitute for entropy decoding which is to decode the encoded latent space as shown above earlier))
combining, with the one or more physical computer processors, entropy decoded latent space with [[a multilayer perceptron model]] (Theis, “For example, the decoder 120 can replace inverse transformation (e.g., arithmetic and/or discrete cosine transform (DCT)), inverse quantization and/or entropy decoding as typically performed by a video decoder.” (Entropy decoded latent space shown above))
[[the multilayer perceptron model]] (Theis, Fig. 3B discloses the decoder 120 using multiple convolutions (The convolutions in the decoder act as deconvolutions for an input))
The combination of Theis, He, and Wierstra fails to explicitly teach:
a/the multilayer perceptron model
However, Vilovic teaches:
a/the multilayer perceptron model (Vilovic, Abstract discloses “In this paper, a direct solution method is used for image compression using the neural networks. An experience of using multilayer perceptron for image compression is presented. The multilayer perceptron is used for transform coding of the image.”)
Same motivation to combine Theis and Vilovic as claim 6
As per claim 18, the combination of Theis, He, and Wierstra as shown above teaches the computer implemented method of claim 17, Theis further teaches wherein decoding the quantized latent space comprises:
entropy decoding, with the one or more physical computer processors, the encoded latent space; (Theis, “For example, the decoder 120 can replace inverse transformation (e.g., arithmetic and/or discrete cosine transform (DCT)), inverse quantization and/or entropy decoding as typically performed by a video decoder.” (Decoder 120 being a substitute for entropy decoding which is to decode the encoded latent space as shown above earlier))
combining, with the one or more physical computer processors, entropy decoded latent space with [[a multilayer perceptron model]] (Theis, “For example, the decoder 120 can replace inverse transformation (e.g., arithmetic and/or discrete cosine transform (DCT)), inverse quantization and/or entropy decoding as typically performed by a video decoder.” (Entropy decoded latent space shown above))
and applying, with the one or more physical computer processors, multiple deconvolutions to a combination of the entropy decoded latent space with [[the multilayer perceptron model]] (Theis, Fig. 3B discloses the decoder 120 using multiple convolutions (The convolutions in the decoder act as deconvolutions for an input))
The combination of Theis, He, and Wierstra fails to explicitly teach:
a/the multilayer perceptron model
However, Vilovic teaches:
a/the multilayer perceptron model (Vilovic, Abstract discloses “In this paper, a direct solution method is used for image compression using the neural networks. An experience of using multilayer perceptron for image compression is presented. The multilayer perceptron is used for transform coding of the image.”)
Same motivation to combine Theis, He, Wierstra, and Vilovic as claim 6

Claims 7, 13, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Theis, in view of He, further in view of Wierstra, and further in view of Lossy Image Compression With Compressive Autoencoders to Shi, et al. (hereinafter, “Shi”)
As per claim 7, the combination of Theis, and He, as shown above teaches the system of claim 1, He further teaches:
the one or more global variables and the one or more local variables (He, Fig. 2, and Holistic Attribute Control section discloses “Since the VAE encoder φ_enc already maps the input images x^(1:T) to a set of latent features φ_enc(x^(1:T)), we infer the attributes a_i from those representation…” (Latent space comprising global and local variables))
Wierstra further teaches: 
wherein the plurality of  distributions corresponding to the latent space [[comprise noise]], and are centered around the means of [[the one or more global variables and the one or more local variables]] (Wierstra, Para. [0051] discloses “To discretize a given compression latent variable, the system discretizes the latent variable to approximately the width of the distribution from which the latent variable is sampled, and then assigns the latent variable the discrete value that is closest to the mean of the distribution…This is, unlike conventional variational auto encoders, where a value for the latent variable would be sampled from the distribution, a discrete value is assigned to the latent variable based on the mean of the distribution.” (Distributions being centered around the means of the variables. He discloses global and local variables within a distribution as seen above))
Same motivation to combine Theis and Wierstra as claim 2
The combination of Theis, He, and Wierstra fails to explicitly teach:
comprise noise
However, Shi (Shi addresses the issue of lossy image compression) teaches:
comprise noise  (Shi, Fig. 2 discloses noise being added to the Gaussian scale mixture (GSM) whereby a GSM is a model for the distributions of quantized coefficients)
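For illustration only, and not as part of any cited reference or of the claims, the discretization quoted from Wierstra, Para. [0051] can be sketched roughly as follows. The sketch assumes that the candidate discrete values are integer multiples of a grid spacing and that the spacing equals the standard deviation of the distribution. Both are assumptions made here to make "approximately the width of the distribution" concrete, and the function name is hypothetical.

```python
import math


def discretize_to_mean(mean, std):
    """Return the grid value closest to the mean of the distribution.

    Assumption for this sketch: the grid consists of integer multiples
    of `std`, so the spacing tracks the width of the distribution.
    """
    width = std
    # floor(x + 0.5) rounds to the nearest integer without banker's rounding
    index = math.floor(mean / width + 0.5)
    return index * width


# A narrow distribution gets a fine grid, a wide one a coarse grid.
narrow = discretize_to_mean(1.3, 0.5)
wide = discretize_to_mean(-0.2, 1.0)
```

No value is sampled in this sketch: the discrete value is determined by the mean alone and is never more than half a grid step away from it.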


As per claim 13, the combination of Theis, and He, as shown above teaches the computer implemented method of claim 9, He further teaches:
the one or more global variables and the one or more local variables (He, Fig. 2, and Holistic Attribute Control section discloses “Since the VAE encoder φ_enc already maps the input images x^(1:T) to a set of latent features φ_enc(x^(1:T)), we infer the attributes a_i from those representation…” (Latent space comprising global and local variables))
 Wierstra further teaches: 
wherein the plurality of  distributions corresponding to the latent space [[comprise noise]], and are centered around the means of [[the one or more global variables and the one or more local variables]] (Wierstra, Para. [0051] discloses “To discretize a given compression latent variable, the system discretizes the latent variable to approximately the width of the distribution from which the latent variable is sampled, and then assigns the latent variable the discrete value that is closest to the mean of the distribution…This is, unlike conventional variational auto encoders, where a value for the latent variable would be sampled from the distribution, a discrete value is assigned to the latent variable based on the mean of the distribution.” (Distributions being centered around the means of the variables))
The combination of Theis, He, and Wierstra fails to explicitly teach:
comprise noise
However, Shi teaches:
comprise noise  (Shi, Fig. 2 discloses noise being added to the Gaussian scale mixture (GSM) whereby a GSM is a model for the distributions of quantized coefficients)
	Same motivation to combine Theis, He, Wierstra, and Shi as claim 9

As per claim 19, the combination of Theis, and He as shown above teaches the computer implemented method of claim 14, He further teaches:
the one or more global variables and the one or more local variables (He, Fig. 2, and Holistic Attribute Control section discloses “Since the VAE encoder φ_enc already maps the input images x^(1:T) to a set of latent features φ_enc(x^(1:T)), we infer the attributes a_i from those representation…” (Latent space comprising global and local variables))
Wierstra further teaches: 
wherein the plurality of distributions corresponding to the latent space are centered around the means of [[the one or more global variables and the one or more local variables]] and wherein [[random noise is added to the plurality of distributions]] (Wierstra, Para. [0051] discloses “To discretize a given compression latent variable, the system discretizes the latent variable to approximately the width of the distribution from which the latent variable is sampled, and then assigns the latent variable the discrete value that is closest to the mean of the distribution…This is, unlike conventional variational auto encoders, where a value for the latent variable would be sampled from the distribution, a discrete value is assigned to the latent variable based on the mean of the distribution.” (Distributions being centered around the means of the variables))
The combination of Theis, He, and Wierstra fails to explicitly teach:
random noise is added to the multiple distributions
However, Shi teaches:
random noise is added to the multiple distributions  (Shi, Fig. 2 discloses noise being added to the Gaussian scale mixture (GSM) whereby a GSM is a model for the distributions of quantized coefficients)
	Same motivation to combine Theis, He, Wierstra, and Shi as claim 9

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Theis, in view of He, and further in view of Variational Image Compression With A Scale Hyperprior to Balle, et al. (hereinafter, “Balle”)
	As per claim 20, the combination of Theis and He as shown above teaches the computer-implemented method of claim 14, the combination of Theis and He fails to explicitly teach:
	wherein the latent space comprises a global density model corresponding to the one or more global variables and a local density model corresponding to the one or more local variables. 
However, Balle (Balle addresses the issue of image compression) teaches:

wherein the latent space comprises a global density model corresponding to the one or more global variables (Balle, Eq. 6 discloses [equation image media_image1.png, not reproduced] (Eq. 6 being a density model for modeling prior))

and a local density model corresponding to the one or more local variables. (Balle, Eq. 9 discloses [equation image media_image2.png, not reproduced] (Eq. 9 being a density model))
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Theis as modified to use the density models as disclosed by Balle. The combination would have been obvious because a person of ordinary skill in the art would be motivated to fully encapsulate encoded information within the global and local variables by use of separate density models and “…capture the spatial dependencies…” (Balle, Introduction of a Scale Hyperprior section)
Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HAMZA RAZZAQ MUGHAL whose telephone number is 571-272-8833. The examiner can normally be reached on M-TR from 7:30 to 5:00.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, ALEXEY SHMATOV, can be reached at telephone number 571-270-3428. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://portal.uspto.gov/external/portal. Should you have questions about access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

/H.R.M./Examiner, Art Unit 2123