Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on February 4, 2019 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
The information disclosure statement (IDS) submitted on December 12, 2019 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1-7, 10-13, 20-23, and 25-29 are rejected under 35 U.S.C. 101.
Regarding claim 1, the claimed invention is directed to an abstract idea without significantly more. The claim recites generating a vector by encoding an input, and generating a response by decoding the vector, which is a mental process. This judicial exception is not integrated into a practical application because the collecting of input data does not add a meaningful limitation as it is an extra-solution activity. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because associating the vector and response with a region is merely flagging them to an area of storage.
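Regarding claim 2, the claimed invention is directed to an abstract idea without significantly more. The claim recites the mental process of claim 1, where the vector has multiple dimensions comprising variables to generate the response. This judicial exception is not integrated into a practical application because the claim does not provide a new application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because even though the vector has multiple variables, it still only serves to generate a response as before.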
Regarding claim 3, the claimed invention is directed to an abstract idea without significantly more. The claim recites the mental process of claim 1, where the regions correspond to responses. This judicial exception is not integrated into a practical application because the claim does not provide a new application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the region being able to correspond to multiple responses does not modify its function.
Regarding claim 4, the claimed invention is directed to an abstract idea without significantly more. The claim recites the mental process of claim 3, where the latent variable region space is partitioned by control inputs that help to generate the latent variable vector. This judicial exception is not integrated into a practical application because the control inputs only participate in the method without adding new functionality. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the control inputs serve purposes that were already part of the method.
Regarding claim 5, the claimed invention is directed to an abstract idea without significantly more. The claim recites the mental process of claim 1, where generating the vector includes generating a variable which corresponds to the vector. This judicial exception is not integrated into a practical application because the claim merely adds a step to generating the vector, which still only results in generating a vector. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because generating a variable that corresponds to the generated vector still results in generating a vector.

Regarding claim 7, the claimed invention is directed to an abstract idea without significantly more. The claim recites the mental process of claim 4, where the region of the vector is determined by a control input. This judicial exception is not integrated into a practical application because selecting a control input to determine the vector's region does not add a meaningful limitation, as it is an extra-solution activity. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the use of a probability distribution does not change the outcome of generating a vector.
Regarding claim 10, the claimed invention is directed to an abstract idea without significantly more. The claim recites the mental process of claim 3, where the input is a user utterance and the responses are different responses to the utterance. This judicial exception is not integrated into a practical application because the type of data the input comprises does not change that obtaining the input is an extra-solution activity that does not add a meaningful limitation. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the responses being different responses to the input does not change that they are still responses, and thus does not add a meaningful limitation.
Regarding claim 11, the claimed invention is directed to an abstract idea without significantly more. The claim recites the mental process of claim 1, where the encoding is done using a neural network.
Regarding claim 12, the claimed invention is directed to an abstract idea without significantly more. The claim recites the mental process of claim 1, where the decoding is done using a neural network. This judicial exception is not integrated into a practical application because the use of a generic neural network does not add a meaningful limitation as it amounts to simply implementing the mental process on a computer. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the fundamental aspects of the process are the same.
Regarding claim 13, the claimed invention is directed to an abstract idea without significantly more. The claim recites instructions that, when executed, generate a vector by encoding an input, and generate a response by decoding the vector, which is a mental process. This judicial exception is not integrated into a practical application because the storage medium holding the instructions is a generic computer component that does not add a meaningful limitation to the mental process; it amounts to merely implementing the idea on a computer. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because obtaining an input is an extra-solution activity, and partitioning the regions corresponding to responses does not modify their function.
Regarding claim 20, the claimed invention is directed to an abstract idea without significantly more. The claim recites a processor that generates a vector by encoding an input, and generates a response by decoding the vector, which is a mental process. This judicial exception is not integrated into a practical application because the processor is a generic computer component that does not add a meaningful limitation to the mental process; it amounts to merely implementing the idea on a computer. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because obtaining an input is an extra-solution activity, and partitioning the regions corresponding to a plurality of responses does not modify their function.
Regarding claim 21, the claimed invention is directed to an abstract idea without significantly more. The claim recites the mental process of claim 20, where the vector has multiple dimensions comprising variables to generate the response. This judicial exception is not integrated into a practical application because the claim does not provide a new application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because even though the vector has multiple variables, it still only serves to generate a response as before.
Regarding claim 22, the claimed invention is directed to an abstract idea without significantly more. The claim recites the mental process of claim 20, where the latent variable region space is partitioned by control inputs that help to generate the latent variable vector. This judicial exception is not integrated into a practical application because the control inputs only participate in the method without adding new functionality. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the control inputs serve purposes that were already part of the method.
Regarding claim 23, the claimed invention is directed to an abstract idea without significantly more. The claim recites the mental process of claim 20, where generating the vector includes generating a variable which corresponds to the vector. This judicial exception is not integrated into a practical application because the claim merely adds a step to generating the vector, which still only results in generating a vector. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because generating a variable that corresponds to the generated vector still results in generating a vector.

Regarding claim 26, the claimed invention is directed to an abstract idea without significantly more. The claim recites the mental process of claim 25, where the processor generates the vector based on a control input, which is based on a latent variable produced by encoding the input. This judicial exception is not integrated into a practical application because the processor is a generic computer component that does not add a meaningful limitation to the mental process; it amounts to merely implementing the idea on a computer. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because generating the vector based on a control input that is itself based on a latent variable still results only in generating a vector.
Regarding claim 27, the claimed invention is directed to an abstract idea without significantly more. The claim recites the mental process of claim 26, where the control input randomly corresponds to any region. This judicial exception is not integrated into a practical application because the claim does not provide a new application. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the control input corresponding to a random region does not modify its function.

Regarding claim 29, the claimed invention is directed to an abstract idea without significantly more. The claim recites the mental process of claim 26, where the processor comprises an encoder and a decoder implemented as two neural networks. This judicial exception is not integrated into a practical application because the use of two generic neural networks does not add a meaningful limitation as they amount to simply implementing the mental process on a computer. The claim does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the use of a probability distribution does not change the outcome of encoding the input.
Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

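(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention; or

(a)(2) the claimed invention was described in a patent issued under section 151, or in an application for patent published or deemed published under section 122(b), in which the patent or application, as the case may be, names another inventor and was effectively filed before the effective filing date of the claimed invention.
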
Claims 1-7, 11-13, and 20-23 are rejected under 35 U.S.C. 102(a)(2) as being anticipated by Stojevic (U.S. Patent Application Publication No. 2021/0081804 A1).
Regarding claim 1, Stojevic teaches:
obtaining an input ([0008]: "...a tensor network representation of molecular quantum states of a dataset of small, drug-like molecules is provided as an input to a machine learning system…"; Stojevic teaches a machine learning system that takes an input).
generating a latent variable vector in a latent variable region space partitioned into regions by encoding the input ([0026]: "...trained to encode the input to a small dimensional vector in the latent space…"; [0139]: "The tensor network used to represent interesting regions of the exponentially large space needs to be determined using an intelligent prior based on available data,"; Stojevic teaches encoding an input to a vector (i.e. generating a latent variable vector…by encoding the input) in a region of the latent space, and that this space has different interesting regions (i.e. generating…in a latent variable region space partitioned into regions)).
generating an output response corresponding to a region, from among the regions, of the latent variable vector by decoding the latent variable vector ([0026]: "The term ‘autoencoder’ preferably connotes an artificial neural network having an output in the same form as the input, trained to encode the input to a small dimensional vector in the latent space, and to decode this vector to reproduce the input as accurately as possible,"; [0139]: "The tensor network used to represent interesting regions of the exponentially large space needs to be determined using an intelligent prior based on available data,"; Stojevic teaches decoding the vector from a region of the latent space to produce (i.e. generate) an output that reproduces the input. The output is based on the decoded vector (i.e. corresponding to a region), and the region of the vector is one of several regions in the latent space (i.e. a region, from among the regions)).
Regarding claim 2, Stojevic teaches the method according to claim 1. Stojevic further teaches the latent variable vector is a multidimensional vector comprising latent information variables to generate a response to the input ([0014]: "The term ‘tensor’ preferably connotes a multidimensional or multi-rank array (a matrix and vector being examples of rank-2 or rank-1 tensors), where the components of the array are preferably functions of the coordinates of a space,"; [0149]: "...the latent space might be a tensorial object, or a simple vector (which is the usual setup in an autoencoder), or some other mathematical construct such as a graph. The output determined by a given element of the latent space (and in particular the optimal element of the latent space) will in general not be a part of the original dataset,"; Stojevic teaches a tensorial object which can have multiple dimensions (i.e. a multidimensional vector) that produces an output (i.e. response to an input) comprised of several elements (i.e. variables)).
Regarding claim 3, Stojevic teaches the method according to claim 1. Stojevic further teaches the regions correspond to a plurality of responses ([0139]: "The tensor network used to represent interesting regions of the exponentially large space needs to be determined using an intelligent prior based on available data,"; Stojevic teaches a tensor network and data (i.e. a plurality of responses), where the tensor network is used to represent a region (i.e. the network to which the region corresponds)).
Regarding claim 4, Stojevic teaches the method according to claim 3. Stojevic further teaches:
the latent variable region space is partitioned by control inputs corresponding to the plurality of responses ([0149]: "...the generative tensorial approach described here will explore regions of the huge space of possible compounds not accessible to other methods. The output data may alternatively or additionally be a filtered version of the input data, corresponding to a smaller number of data points,"; Stojevic teaches that the space has regions (i.e. the space is partitioned) and that the data that comes from it can be filtered based on data points (i.e. control inputs that correspond to responses))
a control input of the control inputs comprises information to generate the latent variable vector in the region of the latent variable region space ([0139]: "The tensor network used to represent interesting regions of the exponentially large space needs to be determined using an intelligent prior based on available data,"; Stojevic teaches an intelligent prior based on available data (i.e. a control input) that is used to determine a network representing a region of the latent variable space (i.e. information to generate the latent variable vector in the region of the latent variable region space)).
Regarding claim 5, Stojevic teaches the method according to claim 1. Stojevic further teaches:
generating a latent variable by encoding the input ([0026]: "...trained to encode the input to a small dimensional vector in the latent space…"; Stojevic teaches encoding the input to a vector in the latent space (i.e. generating a latent variable))
generating the latent variable vector belonging to one of the regions of the latent variable region space corresponding to the latent variable ([0139]: "The tensor network used to represent interesting regions of the exponentially large space needs to be determined using an intelligent prior based on available data,"; Stojevic teaches determining a tensor network representing an interesting region (i.e. generating the latent variable vector belonging to one of the regions) using (i.e. corresponding to) an intelligent prior (i.e. latent variable)).
Regarding claim 6, Stojevic teaches the method according to claim 4. Stojevic further teaches:
sampling a plurality of vectors based on a probability distribution representing the latent variable region space ([0163]: "…samples of real molecules are fed to Discriminator D; molecules are represented as tensor networks T..."; [0217]: "...a generative model G that captures the training dataset distribution and (b) a discriminative model D that estimates the probability that a sample came from the training dataset rather than G,"; Stojevic teaches sampling a plurality of molecules represented as tensor networks (i.e. vectors), using a distribution of the training dataset (i.e. based on a probability distribution representing the latent variable region space))
generating the latent variable vector based on the sampled vectors ([0222]: "...the machine learning system outputs tensor network representations of the molecular quantum states of small drug-like molecules to a predictive model,"; Stojevic teaches outputting a tensor network (i.e. generating the latent variable vector) that represents molecules (i.e. the vector is based on the sampled vectors)).
Regarding claim 7, Stojevic teaches the method according to claim 4. Stojevic further teaches:
selecting one of control inputs corresponding to the regions of the latent variable region space ([0134]: “Tensor networks enable intelligent priors to be picked that, in turn, restrict the search to the space of physically relevant elements…"; Stojevic teaches selecting a prior (i.e. control input) that restricts a search to an area of a space (i.e. corresponding to the region[s] of the latent variable region space))
generating the latent variable vector belonging to the region corresponding to the selected control input based on a probability distribution ([0134]: “Tensor networks enable intelligent priors to be picked that, in turn, restrict the search to the space of physically relevant elements…"; [0171]: “The standard VAE first encodes an input x into a set of latent variables μ(x), σ(x). The decoder network samples the latent space from a prior distribution p(z), usually a Gaussian, and decodes to an output x'. The network is optimised to reproduce the inputs,”; Stojevic teaches selecting a prior that restricts a search to an area of the space (i.e. region corresponding to the selected control input). Stojevic also teaches an autoencoder that encodes a set of latent variables (i.e. the latent variable vector), then samples the latent space from the prior Gaussian distribution (i.e. the region corresponding to the selected control input based on a probability distribution) and decodes an output (i.e. generating the latent variable vector belonging to the region)).
Regarding claim 11, Stojevic teaches the method according to claim 1. Stojevic further teaches:
encoding the input using an encoder ([0026]: "...trained to encode the input to a small dimensional vector in the latent space…"; Stojevic teaches encoding the input, which requires an encoder)
a neural network of the encoder comprises an input layer corresponding to the input and an output layer corresponding to a mean and a variance of a probability distribution modeling a latent variable ([0026]: "The term 'autoencoder' preferably connotes an artificial neural network having an output in the same form as the input, trained to encode the input to a small dimensional vector in the latent space, and to decode this vector to reproduce the input as accurately as possible,”; Fig. 21, [0171]: “The standard VAE first encodes an input x into a set of latent variables μ(x), σ(x). The decoder network samples the latent space from a prior distribution p(z), usually a Gaussian, and decodes to an output x'. The network is optimised to reproduce the inputs,”; Stojevic teaches a VAE neural network that accepts an input and produces an output that is a reproduction of the input (i.e. modeling a latent variable). The output is produced with the use of a Gaussian (i.e. corresponding to the mean and variance of a probability distribution)).
Regarding claim 12, Stojevic teaches the method according to claim 1. Stojevic further teaches:
decoding the latent variable vector using a decoder ([0026]: "...to decode this vector to reproduce the input as accurately as possible,"; Stojevic teaches decoding the vector, which requires a decoder)
a neural network of the decoder comprises an input layer corresponding to the latent variable vector and an output layer corresponding to the output response ([0026]: "The term 'autoencoder' preferably connotes an artificial neural network having an output in the same form as the input, trained to encode the input to a small dimensional vector in the latent space, and to decode this vector to reproduce the input as accurately as possible,”; [0171]: “The standard VAE first encodes an input x into a set of latent variables μ(x), σ(x). The decoder network samples the latent space from a prior distribution p(z), usually a Gaussian, and decodes to an output x'. The network is optimised to reproduce the inputs,”; Stojevic teaches a VAE neural network decoder that takes an input from the latent space (i.e. latent variable vector) and produces an output corresponding to the input to the encoder (i.e. output response)).
Regarding claim 13, Stojevic teaches the method according to claim 1. Stojevic further teaches a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the response inference method of claim 1 ([0274]: "...a computer readable medium having stored thereon a program for carrying out any of the methods described herein and/or for embodying any of the apparatus features described herein,"; "As used herein, means plus function features may be expressed alternatively in terms of their corresponding structure, such as a suitably programmed processor and associated memory,"; Stojevic teaches a computer readable storage holding instructions for carrying out methods, and a processor for carrying out those instructions).
Regarding claim 20, Stojevic teaches:
a processor configured to: ([0128]: "...a processor for processing the chemical compound dataset to determine a tensorial space for said chemical compound dataset…"; Stojevic teaches a processor)
obtain an input ([0008]: "...a tensor network representation of molecular quantum states of a dataset of small, drug-like molecules is provided as an input to a machine learning system…"; Stojevic teaches a machine learning system that takes an input)
generate a latent variable vector in a latent variable region space partitioned into regions corresponding to a plurality of responses by encoding the input ([0026]: "The term ‘autoencoder’ preferably connotes an artificial neural network having an output in the same form as the input, trained to encode the input to a small dimensional vector in the latent space, and to decode this vector to reproduce the input as accurately as possible,"; [0139]: "The tensor network used to represent interesting regions of the exponentially large space needs to be determined using an intelligent prior based on available data,"; Stojevic teaches encoding an input to a vector (i.e. generating a latent variable vector…by encoding the input) in a region of the latent space, that this space has different interesting regions (i.e. generating…in a latent variable region space partitioned into regions). Stojevic also teaches that the encoded vector, which belongs to a region, is decoded to generate an output response, and that these regions can contain multiple vectors (i.e. regions corresponding to a plurality of responses))
generate an output response corresponding to a region, from among the regions, of the latent variable vector by decoding the latent variable vector ([0026]: "The term ‘autoencoder’ preferably connotes an artificial neural network having an output in the same form as the input, trained to encode the input to a small dimensional vector in the latent space, and to decode this vector to reproduce the input as accurately as possible,"; [0139]: "The tensor network used to represent interesting regions of the exponentially large space needs to be determined using an intelligent prior based on available data,"; Stojevic teaches decoding the vector from a region of the latent space, producing (i.e. generating) an output that reproduces the input. The output is based on the decoded vector (i.e. corresponding to a region), and the region of the vector is one of several regions in the latent space (i.e. a region, from among the regions)).
Regarding claim 21, Stojevic teaches the apparatus according to claim 20. Stojevic further teaches: the latent variable vector is a multidimensional vector comprising latent information variables to generate a response to the input ([0014]: "The term ‘tensor’ preferably connotes a multidimensional or multi-rank array (a matrix and vector being examples of rank-2 or rank-1 tensors), where the components of the array are preferably functions of the coordinates of a space,"; [0149]: "...the latent space might be a tensorial object, or a simple vector (which is the usual setup in an autoencoder), or some other mathematical construct such as a graph. The output determined by a given element of the latent space (and in particular the optimal element of the latent space) will in general not be a part of the original dataset,"; Stojevic teaches a tensorial object which can have multiple dimensions (i.e. a multidimensional vector) that produces an output (i.e. response to an input) comprised of several elements (i.e. variables)).
Regarding claim 22, Stojevic teaches the apparatus according to claim 20. Stojevic further teaches:
the latent variable region space is partitioned by control inputs corresponding to the plurality of responses ([0149]: "...the generative tensorial approach described here will explore regions of the huge space of possible compounds not accessible to other methods. The output data may alternatively or additionally be a filtered version of the input data, corresponding to a smaller number of data points,"; Stojevic teaches that the space has regions (i.e. the space is partitioned) and that the data that comes from it can be filtered based on data points (i.e. control inputs that correspond to responses))
a control variable of the control inputs comprises information to generate the latent variable vector in the region of the latent variable region space ([0139]: "The tensor network used to represent interesting regions of the exponentially large space needs to be determined using an intelligent prior based on available data,"; Stojevic teaches an intelligent prior based on available data (i.e. a control input) that is used to determine a network representing a region of the latent variable space (i.e. information to generate the latent variable vector in the region of the latent variable region space)).
Regarding claim 23, Stojevic teaches the apparatus according to claim 20. Stojevic further teaches:
generate a latent variable by encoding the input ([0026]: "...trained to encode the input to a small dimensional vector in the latent space…"; Stojevic teaches encoding the input to a vector in the latent space (i.e. generating a latent variable))
generate the latent variable vector belonging to one of the regions of the latent variable region space corresponding to the latent variable ([0139]: "The tensor network used to represent interesting regions of the exponentially large space needs to be determined using an intelligent prior based on available data,"; Stojevic teaches determining a tensor network representing an interesting region (i.e. generating the latent variable vector belonging to one of the regions) using (i.e. corresponding to) an intelligent prior (i.e. latent variable)).
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

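Claims 8, 9, 14-19, and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Stojevic (U.S. Patent Application Publication No. 2021/0081804 A1) in view of Graves (Graves, Alex, Jacob Menick, and Aaron van den Oord. "Associative compression networks for representation learning." 2018).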
Regarding claim 8, Stojevic teaches the method according to claim 4. Stojevic further teaches sampling vectors based on a probability distribution representing the latent variable region space ([0163]: "…samples of real molecules are fed to Discriminator D; molecules are represented as tensor networks T..." ; [0217]: "...a generative model G that captures the training dataset distribution and (b) a discriminative model D that estimates the probability that a sample came from the training dataset rather than G,"; Stojevic teaches sampling a plurality of molecules represented as tensor networks (i.e. vectors), using a distribution of the training dataset (i.e. based on a probability distribution representing the latent variable region space)).
Stojevic does not teach:
generating an embedded control input by randomizing a control input comprising information to generate the latent variable vector in the region of the latent variable region space
applying the embedded control input to each of the sampling vectors
generating the latent variable vector using a weighted sum of the sampled vectors to which the embedded control input is applied.
Graves teaches:
generating an embedded control input by randomizing a control input comprising information to generate the latent variable vector in the region of the latent variable region space ((Section 3): "Associative compression networks (ACNs) are similar to VAEs, except the prior for each x is now conditioned on the distribution q(z|x̂) used to encode some neighbouring datum x̂. We used a unit variance, diagonal Gaussian for all encoding distributions, meaning that q(z|x) is entirely described by its mean vector E_{z∼q(z|x)}[z], which we refer to as the code c for x. Given c, we randomly pick ĉ, the code for x̂, from KNN(x)..."; Graves teaches randomly picking an input to a prior network (i.e. generating an embedded control input by randomizing a control input) to obtain a distribution (i.e. comprising information to generate the latent variable vector in the region of the latent variable region space))
applying the embedded control input to each of the sampling vectors ((Section 2): "The encoder receives observable data x as input and emits as output a data-conditional distribution q(zjx) over latent vectors z. A sample z q is drawn from this distribution..."; (Section 3): "We then pass ^c to the prior network to obtain the conditional prior distribution p(zj^c) and hence determine the KL cost,"; Graves teaches applying a value (i.e. embedded control input) to a distribution, from which sample vectors are pulled (i.e. applying…to each of the sampling vectors))
generating the latent variable vector using a weighted sum of the sampled vectors to which the embedded control input is applied ((Section 4): "The encoding distribution q(zjx) was always a unit variance Gaussian with mean specified by the output of the encoder network. The dimensionality of z was 16 for binarized MNIST..."; (Section 4.1): "For the binarized MNIST experiments the ACN encoder had five convolutional layers..."; (Section 2): "The encoder receives observable data x as input and emits as output a data-conditional distribution q(zjx) over latent vectors z. A sample z q is drawn from this distribution..."; (Section 3): "We then pass ^c to the prior network to obtain the conditional prior distribution p(zj^c) and hence determine the KL cost,"; Graves teaches a Gaussian (i.e. weighted sum of the sampled vectors) used in the encoding process (i.e. generating the latent variable vector). Graves also teaches applying a value to the distribution (i.e. to which the embedded control input is applied)).
Stojevic and Graves are analogous art because they are from the same field of endeavor in neural networks. Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teaching of Stojevic and Graves before him or her to modify the sampling of Stojevic to include the use of an embedded control input as in Graves, obtaining the advantage of reducing sampling noise (Graves; (Section 3.2): “Note that in order to reduce sampling noise we use the mean codes c as latents for the reconstructions, rather than samples from N(c; 1)…”).
Regarding claim 9, Stojevic and Graves teach the method of claim 8. Stojevic does not teach the control input comprises a vector having a dimension that is same as a dimension of the latent variable vector.
Graves teaches the control input comprises a vector having a dimension that is same as a dimension of the latent variable vector ((Section 3): "We used a unit variance, diagonal Gaussian for all encoding distributions, meaning that q(z|x) is entirely described by its mean vector E_{z∼q(z|x)}[z], which we refer to as the code c for x. Given c, we randomly pick ĉ, the code for x̂, from KNN(x)..."; Graves teaches a distribution (i.e. control input) that is described by the mean vector of a latent vector, which requires the distribution to share a dimension with the latent vector (i.e. a vector having a dimension that is same as a dimension of the latent variable vector)).
Stojevic and Graves are analogous art because they are from the same field of endeavor in neural networks. Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teaching of Stojevic and Graves before him or her to modify the control input of Stojevic to include the shared dimension as in Graves, obtaining the advantage of describing the distribution with a vector (Graves; (Section 3): “…meaning that q(z|x) is entirely described by its mean vector E_{z∼q(z|x)}[z]…”).
Regarding claim 14, Stojevic teaches:
obtaining a training input ([0008]: "...a tensor network representation of molecular quantum states of a dataset of small, drug-like molecules is provided as an input to a machine learning system…"; Stojevic teaches obtaining an input (i.e. training input))
generating a latent variable by applying the training input to an encoder ([0171]: "The standard VAE first encodes an input x into a set of latent variables μ(x), σ(x),"; Stojevic teaches using an autoencoder to encode an input (i.e. training input) into a set of latent variables (i.e. generating a latent variable))
generating a training latent variable vector of a region corresponding to the control input in a latent variable region space corresponding to the latent variable ([0171]: “The standard VAE first encodes an input x into a set of latent variables μ(x), σ(x). The decoder network samples the latent space from a prior distribution p(z), usually a Gaussian, and decodes to an output x'. The network is optimised to reproduce the inputs,”; Stojevic teaches encoding an input into a set of latent variables in the latent space (i.e. generating a training latent variable vector of a region corresponding to the control input). The space is associated with the encoded set such that it reproduces the input when decoded (i.e. a latent variable region space corresponding to the latent variable))
generating an output response by applying the training latent variable vector to a decoder ([0026]: "...to decode this vector to reproduce the input as accurately as possible,"; Stojevic teaches decoding a vector (i.e. applying the training latent variable vector to a decoder) to reproduce an input (i.e. generating an output response))
training neural networks of the encoder and the decoder based on the output response and the training response ([0147]: "The weights in the neural network, or constituent tensors in a tensor network, are optimised to minimise the difference between outputs 118 and inputs 114,"; Stojevic teaches updating neural network weights (i.e. training neural networks of the encoder and the decoder) to minimize difference between the output and input (i.e. based on the output response and the training response)).
Stojevic does not teach:
obtaining a training response from among training responses to the training input
obtaining a control input corresponding to the training response from among control inputs corresponding to the training responses, respectively
Graves teaches:
obtaining a training response from among training responses to the training input ((Section 2): "The encoder receives observable data x as input and emits as output a data-conditional distribution q(z|x) over latent vectors z. A sample z ∼ q is drawn from this distribution and used by the decoder to determine a code-conditional reconstruction distribution r(x|z) over the original data,"; Graves teaches taking a sample from a distribution based on an input, which is used to determine a response (i.e. obtaining a training response from among training responses to the training input))
obtaining a control input corresponding to the training response from among control inputs corresponding to the training responses, respectively ((Section 3): "Given c, we randomly pick ^c, the code for ^x, from KNN(x), the set of K nearest Euclidean neighbours to c among all the codes for the training data,"; Graves teaches selecting a code (i.e. obtaining a control input…from among control inputs) associated with the training data (i.e. corresponding to the training response))
Stojevic and Graves are analogous art because they are from the same field of endeavor in neural networks. Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teaching of Stojevic and Graves before him or her to modify the encoder and decoder networks of Stojevic to include the organization of inputs and responses as in 
Regarding claim 15, Stojevic and Graves teach the method according to claim 14. Stojevic further teaches:
the training latent variable vector is a multidimensional vector comprising information variables latent to generate a response to the training input ([0149]: "...the latent space might be a tensorial object, or a simple vector (which is the usual setup in an autoencoder), or some other mathematical construct such as a graph. The output determined by a given element of the latent space (and in particular the optimal element of the latent space) will in general not be a part of the original dataset,"; Stojevic teaches a tensorial object (i.e. vector) that gives an output (i.e. response to an input) which is comprised of several elements (i.e. variables))
the control input is information to induce generation of a latent variable vector in a region of the latent variable region space ([0139]: "The tensor network used to represent interesting regions of the exponentially large space needs to be determined using an intelligent prior based on available data,"; Stojevic teaches an intelligent prior (i.e. control input) used to determine a tensor network (i.e. induce generation of a latent variable vector) that represents an interesting region of the space (i.e. in a region of the latent variable region space)).
Regarding claim 16, Stojevic and Graves teach the method according to claim 14. Stojevic further teaches the latent variable region space is partitioned into regions corresponding to the control inputs ([0149]: "...the generative tensorial approach described here will explore regions of the huge space of possible compounds not accessible to other methods. The output data may alternatively or additionally be a filtered version of the input data, corresponding to a smaller number of data points,"; Stojevic teaches a space that has regions (i.e. is partitioned into regions), and the data that comes from it can be filtered corresponding to an input, in this case a number of data points (i.e. corresponding to control inputs)).
Regarding claim 17, Stojevic and Graves teach the method according to claim 14. Stojevic further teaches sampling vectors based on a probability distribution representing the latent variable region space ([0163]: "…samples of real molecules are fed to Discriminator D; molecules are represented as tensor networks T..." ; [0217]: "...a generative model G that captures the training dataset distribution and (b) a discriminative model D that estimates the probability that a sample came from the training dataset rather than G,"; Stojevic teaches sampling a plurality of molecules represented as tensor networks (i.e. vectors), using a distribution of the training dataset (i.e. based on a probability distribution representing the latent variable region space)).
Stojevic does not teach:
generating an embedded control input by randomizing the control input
applying the embedded control input to each of the sampled vectors
generating a training latent variable vector using a weighted sum of the sampled vectors to which the embedded control input is applied.
Graves teaches:
generating an embedded control input by randomizing the control input ((Section 3): "Associative compression networks (ACNs) are similar to VAEs, except the prior for each x is now conditioned on the distribution q(z|x̂) used to encode some neighbouring datum x̂. We used a unit variance, diagonal Gaussian for all encoding distributions, meaning that q(z|x) is entirely described by its mean vector E_{z∼q(z|x)}[z], which we refer to as the code c for x. Given c, we randomly pick ĉ, the code for x̂, from KNN(x), the set of K nearest Euclidean neighbours to c among all the codes for the training data,"; Graves teaches randomly selecting a code (i.e. randomizing the control input), and using that code as an input (i.e. generating an embedded control input))
applying the embedded control input to each of the sampled vectors ((Section 2): "The encoder receives observable data x as input and emits as output a data-conditional distribution q(z|x) over latent vectors z. A sample z ∼ q is drawn from this distribution..."; (Section 3): "We then pass ĉ to the prior network to obtain the conditional prior distribution p(z|ĉ) and hence determine the KL cost,"; Graves teaches applying a value (i.e. embedded control input) to a distribution, from which sample vectors are pulled (i.e. applying…to each of the sampled vectors))
generating a training latent variable vector using a weighted sum of the sampled vectors to which the embedded control input is applied ((Section 4): "The encoding distribution q(z|x) was always a unit variance Gaussian with mean specified by the output of the encoder network. The dimensionality of z was 16 for binarized MNIST..."; (Section 4.1): "For the binarized MNIST experiments the ACN encoder had five convolutional layers..."; Graves teaches a Gaussian (i.e. weighted sum of the sampled vectors) used in the encoding process (i.e. generating a training latent variable vector). Graves also teaches applying a value to the distribution (i.e. to which the embedded control input is applied)).
Stojevic and Graves are analogous art because they are from the same field of endeavor in neural networks. Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teaching of Stojevic and Graves before him or her to modify the sampling of Stojevic to include the use of an embedded control input as in Graves, obtaining the advantage of conserving computing resources (Graves; (Abstract): “Since the prior need only account for 
Regarding claim 18, Stojevic and Graves teach the method according to claim 14. Stojevic further teaches a value of a loss function comprising a difference between the training response and the output response is minimized ([0025]: "The term ‘cost function’ preferably connotes a mathematical function representing a measure of performance of an artificial neural network, or a tensor network, in relation to a desired output. The weights in the network are optimised to minimise some desired cost function,"; Stojevic teaches a function (i.e. a loss function) that measures the relation between the performance of a network and the desired output of that network (i.e. the difference between the training response and the output response), and that the network is updated to minimize the value of this function).
Regarding claim 19, Stojevic and Graves teach the method according to claim 14. Stojevic further teaches a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the training method of claim 14 ([0274]: "...a computer readable medium having stored thereon a program for carrying out any of the methods described herein and/or for embodying any of the apparatus features described herein,"; "As used herein, means plus function features may be expressed alternatively in terms of their corresponding structure, such as a suitably programmed processor and associated memory,"; Stojevic teaches a computer readable storage medium storing a program for running methods (i.e. instructions), and a processor that runs the program).
Regarding claim 24, Stojevic teaches the apparatus according to claim 23. Stojevic further teaches sample vectors based on a probability distribution representing the latent variable region space ([0163]: "…samples of real molecules are fed to Discriminator D; molecules are represented as tensor networks T..."; [0217]: "...a generative model G that captures the training dataset distribution and (b) a discriminative model D that estimates the probability that a sample came from the training dataset rather than G,"; Stojevic teaches sampling a plurality of molecules represented as tensor networks (i.e. vectors), using a distribution of the training dataset (i.e. based on a probability distribution representing the latent variable region space)).
Stojevic does not teach:
generate an embedded control input by randomizing a control input comprising information to generate the latent variable vector in a region of the latent variable region space
apply the embedded control input to each of the sampling vectors
generate the latent variable vector using a weighted sum of the sampled vectors to which the embedded control input is applied.
Graves teaches:
generate an embedded control input by randomizing a control input comprising information to generate the latent variable vector in a region of the latent variable region space ((Section 3): "Associative compression networks (ACNs) are similar to VAEs, except the prior for each x is now conditioned on the distribution q(z|x̂) used to encode some neighbouring datum x̂. We used a unit variance, diagonal Gaussian for all encoding distributions, meaning that q(z|x) is entirely described by its mean vector E_{z∼q(z|x)}[z], which we refer to as the code c for x. Given c, we randomly pick ĉ, the code for x̂, from KNN(x), the set of K nearest Euclidean neighbours to c among all the codes for the training data. We then pass ĉ to the prior network to obtain the conditional prior distribution p(z|ĉ) and hence determine the KL cost,"; Graves teaches randomly picking an input to a prior network (i.e. generate an embedded control input by randomizing a control input) to obtain a distribution (i.e. comprising information to generate the latent variable vector in a region of the latent variable region space))
apply the embedded control input to each of the sampling vectors ((Section 2): "The encoder receives observable data x as input and emits as output a data-conditional distribution q(z|x) over latent vectors z. A sample z ∼ q is drawn from this distribution..."; (Section 3): "We then pass ĉ to the prior network to obtain the conditional prior distribution p(z|ĉ) and hence determine the KL cost,"; Graves teaches applying a value (i.e. embedded control input) to a distribution, from which sample vectors are pulled (i.e. apply…to each of the sampling vectors))
generate the latent variable vector using a weighted sum of the sampled vectors to which the embedded control input is applied ((Section 4): "The encoding distribution q(z|x) was always a unit variance Gaussian with mean specified by the output of the encoder network. The dimensionality of z was 16 for binarized MNIST..."; (Section 4.1): "For the binarized MNIST experiments the ACN encoder had five convolutional layers..."; (Section 2): "The encoder receives observable data x as input and emits as output a data-conditional distribution q(z|x) over latent vectors z. A sample z ∼ q is drawn from this distribution..."; (Section 3): "We then pass ĉ to the prior network to obtain the conditional prior distribution p(z|ĉ) and hence determine the KL cost,"; Graves teaches a Gaussian (i.e. weighted sum of the sampled vectors) used in the encoding process (i.e. generate the latent variable vector). Graves also teaches applying a value to the distribution (i.e. to which the embedded control input is applied)).
Stojevic and Graves are analogous art because they are from the same field of endeavor in neural networks. Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teaching of Stojevic and Graves before him or her to modify the sampling of Stojevic to include the use of an embedded control input as in Graves, obtaining the .
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Vigen (U.S. Patent Application Publication No. 2009/0198488 A1), in view of Stojevic.
Regarding claim 10, Stojevic teaches the method according to claim 3. Stojevic does not teach:
the input is an utterance of a user not intended to get a specific response in a conversation
the plurality of responses are different responses to the utterance.
Vigen teaches:
the input is an utterance of a user not intended to get a specific response in a conversation ([0127]: "...microphones...voice recognizers...devices for input and/or output. The CPU 202 may acquire communications, instructions and/or data for implementing communications analysis through the input/output bus 210,"; [0008]: "These profiles may then be utilized by the system to generate responsive communications that are selected based upon the communicator's preferences as interpreted from the attributes,"; Vigen teaches an input that can be collected as spoken audio (i.e. utterance of a user) and can have multiple responses generated for it (i.e. not intended to get a specific response in conversation))
the plurality of responses are different responses to the utterance ([0008]: "These profiles may then be utilized by the system to generate responsive communications that are selected based upon the communicator's preferences as interpreted from the attributes,"; Vigen teaches multiple different responses to the input (i.e. utterance) can be generated).
Stojevic and Vigen are analogous art because they are from the same field of endeavor in machine learning. Before the effective filing date of the invention, it would have been obvious to a .
Claims 25-27 and 29 are rejected under 35 U.S.C. 103 as being unpatentable over Zadeh (U.S. Patent Application Publication No. 2018/0204111 A1), in view of Stojevic.
Regarding claim 25, Stojevic teaches:
a memory configured to store a latent variable region space partitioned into regions corresponding to responses ([0139]: "The tensor network used to represent interesting regions of the exponentially large space needs to be determined using an intelligent prior based on available data,"; [0274]: "As used herein, means plus function features may be expressed alternatively in terms of their corresponding structure, such as a suitably programmed processor and associated memory,"; Stojevic teaches using a memory in its structure. Stojevic also teaches determining regions of a space (i.e. space partitioned into regions) using intelligent priors (i.e. corresponding to responses))
a processor configured to: ([0128]: "...a processor for processing the chemical compound dataset to determine a tensorial space for said chemical compound dataset…"; Stojevic teaches a processor)
encode the input to generate a latent variable vector in the latent variable region space ([0026]: "...trained to encode the input to a small dimensional vector in the latent space…"; [0139]: "The tensor network used to represent interesting regions of the exponentially large space needs to be determined using an intelligent prior based on available data,"; Stojevic teaches encoding an input to a vector (i.e. encode the input to generate a latent variable vector) in a region of the latent space)
decode the latent variable vector to generate a response corresponding to a region from among the regions ([0026]: "...to decode this vector to reproduce the input as accurately as possible,"; [0145]: "...the data is decoded using the neural network (or tensor data),"; Stojevic teaches decoding the vector from a region of the latent space to produce (i.e. generate) an output that reproduces the input. The output is thus linked with the vector (i.e. corresponding to a region))
Stojevic does not teach:
a sensor configured to receive an input from a user
output the response through a user interface.
Zadeh teaches:
a sensor configured to receive an input from a user ([0729]: "The Z-mouse is for example provided through a user interface on a computing device or other controls such as sliding/knob type controls, to control the position and size of an f-mark,"; Zadeh teaches providing an input through a user interface via computing device or other controls (i.e. a sensor configured to receive an input from a user))
output the response through a user interface ([1810]: "...the results from above, which is connected to output module, e.g. printout or computer monitor or display or any graphic or table or list generator, for the user to use or see…"; Zadeh teaches an output module that allows the user to see results (i.e. output the response)).
Stojevic and Zadeh are analogous art because they are from the same field of endeavor in neural networks. Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teaching of Stojevic and Zadeh before him or her to modify the encoding and decoding processes of Stojevic to include the user interface as in Zadeh, obtaining the 
Regarding claim 26, Stojevic and Zadeh teach the apparatus according to claim 25. Stojevic further teaches:
encode the input to generate a latent variable ([0026]: "...trained to encode the input to a small dimensional vector in the latent space…"; Stojevic teaches encoding the input and generating a latent vector (i.e. latent variable))
partition the latent variable region space into the regions corresponding to control inputs ([0149]: "...the generative tensorial approach described here will explore regions of the huge space of possible compounds not accessible to other methods. The output data may alternatively or additionally be a filtered version of the input data, corresponding to a smaller number of data points,"; Stojevic teaches that the space has regions (i.e. the space is partitioned) and that the data that comes from it can be filtered based on data points (i.e. control inputs that correspond to responses))
select a control input, from the control inputs, corresponding to the latent variable ([0134]: “Tensor networks enable intelligent priors to be picked that, in turn, restrict the search to the space of physically relevant elements…"; [0171]: “The standard VAE first encodes an input x into a set of latent variables μ(x), σ(x). The decoder network samples the latent space from a prior distribution p(z), usually a Gaussian, and decodes to an output x'. The network is optimised to reproduce the inputs,”; Stojevic teaches a prior distribution (i.e. control input) associated with (i.e. corresponding to) samples from a latent space (i.e. latent variable[s]), used in the encoding and decoding operations. It also teaches selecting such priors)
generate the latent variable vector from the region of the latent variable region space corresponding to the control input ([0149]: “…the latent space might be a tensorial object, or a simple vector (which is the usual setup in an autoencoder), or some other mathematical construct such as a graph. The output determined by a given element of the latent space…”; [0171]: “The standard VAE first encodes an input x into a set of latent variables μ(x), σ(x). The decoder network samples the latent space from a prior distribution p(z), usually a Gaussian, and decodes to an output x'. The network is optimised to reproduce the inputs,”; Stojevic teaches an autoencoder that encodes a set of latent variables (i.e. the latent variable vector), then samples the latent space from a prior Gaussian distribution (i.e. the region corresponding to the selected control input based on a probability distribution) and decodes an output (i.e. generating the latent variable vector belonging to the region)).
Regarding claim 27, Stojevic and Zadeh teach the apparatus according to claim 26. Stojevic further teaches the control input is configured to randomly correspond to any one of the regions ([0461]: "Starting from a random n−1×n−1 dimensional orthogonal matrix, a random n×n dimensional orthogonal matrix can be constructed by taking a randomly distributed n-dimensional vector, constructing its Householder transformation, and then applying the n−1 dimensional matrix to this vector,"; ).
Regarding claim 29, Stojevic and Zadeh teach the apparatus according to claim 26. Stojevic further teaches:
an encoder implementing a first neural network to receive the input at an input layer of the first neural network, and an output layer of the first neural network corresponding to a mean and a variance of a probability distribution modeling the latent variable ([0026]: "The term 'autoencoder' preferably connotes an artificial neural network having an output in the same form as the input, trained to encode the input to a small dimensional vector in the latent space, and to decode this vector to reproduce the input as accurately as possible,"; Stojevic teaches a VAE neural network that accepts an input, and produces an output that is a reproduction of the input (i.e. modeling the variable). The output is produced with the use of a Gaussian (i.e. mean and variance of a probability distribution))
a decoder implementing a second neural network to receive the latent variable vector at an input layer of the second neural network, and an output layer of the second neural network corresponding to the response ([0026]: "The term 'autoencoder' preferably connotes an artificial neural network having an output in the same form as the input, trained to encode the input to a small dimensional vector in the latent space, and to decode this vector to reproduce the input as accurately as possible,”; [0171]: “The standard VAE first encodes an input x into a set of latent variables μ(x), σ(x). The decoder network samples the latent space from a prior distribution p(z), usually a Gaussian, and decodes to an output x'. The network is optimised to reproduce the inputs,”; Stojevic teaches a VAE neural network decoder that takes an input from the latent space (i.e. latent variable vector) and produces an output corresponding to the input to the encoder (i.e. output response)).
Claim 28 is rejected under 35 U.S.C. 103 as being unpatentable over Vigen, in view of Stojevic and Zadeh.
Regarding claim 28, Stojevic and Zadeh teach the apparatus according to claim 26. Vigen further teaches the control input corresponds to any one or any combination of keywords, sentiment of the user, attitude of the user, directive of the user, and guidance of the user ([0050]: "After the received Vigen teaches the system accepting words, phrases or groups (i.e. keywords), as well as goals, tone, and motivation (i.e. sentiment of the user, attitude of the user, directive of the user, and guidance of the user), as data the system could recognize and operate with (i.e. a control input)).
Stojevic, Zadeh and Vigen are analogous art because they are from the same field of endeavor in computing. Before the effective filing date of the invention, it would have been obvious to a person of ordinary skill in the art, having the teaching of Stojevic, Zadeh and Vigen before him or her to modify the control input of Stojevic and Zadeh to accept keywords and user sentiment, attitude, directive, and guidance as in Vigen, obtaining the advantage of being able to operate on such data (Vigen; [0127]: “The CPU 202 may acquire communications, instructions and/or data for implementing communications analysis through the input/output bus 210,”).
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MAXWELL EDWARD MIKA whose telephone number is (571)272-2654. The examiner can normally be reached 7:30 AM - 5:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MAXWELL EDWARD MIKA/               Examiner, Art Unit 2129                                                                                                                                                                                         
/MICHAEL J HUNTLEY/Supervisory Patent Examiner, Art Unit 2129