DETAILED ACTION

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Amendment
This action is in response to the submission filed 23 May 2022 for application 16/562,192. Currently claims 1-14 and 27 are pending and have been examined.
The objection to claim 1 has been withdrawn in view of the amendments made.
The §112(b) rejection of claims 1-14 and 27 has been withdrawn in view of the amendments made.

Response to Arguments

Applicant's arguments, filed 23 May 2022, regarding the rejections under Double Patenting (see pages 7-9) have been fully considered but are not persuasive. Specifically, applicant disagrees, on page 7 (paragraph 2), that a skilled person would arrive at this conclusion, at least due to Salimans not disclosing or suggesting the missing limitation: "truncating a corresponding encoding base distribution based on input data from the input space". Examiner respectfully disagrees because Salimans teaches that limitation on Pages 1 and 2. A detailed explanation of Salimans' teachings is provided below in this section, where the arguments under 35 U.S.C. § 102 and 103 are addressed.
Applicant continues to argue, on page 7 (paragraph 3), that a skilled practitioner would not be motivated to combine any one of U.S. Patent Application Serial No. 16/720,273 and U.S. Patents No. 11,042,811 and 11,157,817 with Salimans, as each of these documents claims the use of different techniques for training an unsupervised learning model that one would not readily combine with the "truncating a corresponding encoding base distribution based on input data from the input space" for enhancing the mechanisms of their respective technical solutions. Examiner respectfully disagrees because all the conflicting patents/applications use rectified linear units, and it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method for machine learning over an input space of the conflicting patents/applications to incorporate the teachings of Salimans to truncate a corresponding encoding base distribution based on input data from the input space. One would have been motivated to make this modification because doing so would provide the benefit of performing efficient stochastic gradient variational inference (in a network with ReLU), as taught by Salimans [Page 1, Section 1, Paragraph 2].
Applicant's arguments, filed 23 May 2022, regarding the rejections under 35 U.S.C. 102 and 103 (see pages 9-14) have been fully considered but are not persuasive.
Specifically, applicant argues, see page 11 (paragraph 1), that the Office Action's characterization of the Gaussian component as the encoding base distribution does not correspond to the limitation as claimed. Examiner respectfully disagrees because applicant does not explain why the Gaussian component cannot correspond to the encoding base distribution. Page 2 of Salimans clearly shows the Gaussian component being used as the base distribution in the encoder; hence, it corresponds to the encoding base distribution.
Applicant continues to argue, see page 11 (paragraph 2), that the claimed truncating of a corresponding encoding base distribution would involve not only the truncating of the Gaussian distribution, but truncating the entire rectified Gaussian distribution, and that this is not taught by the above-noted language of Salimans, which implies that the point mass at zero and the already-truncated Gaussian distribution are combined in some manner, and that no truncation is performed subsequent to the combining. Applicant continues to argue, see page 11 (paragraph 3), that Salimans' Gaussian distribution does not correspond to the claimed encoding base distribution, and that truncating this Gaussian distribution to form part of the rectified Gaussian distribution does not teach the claimed truncating a corresponding encoding base distribution. Examiner respectfully disagrees because, as explained above, the Gaussian component corresponds to the encoding base distribution and hence a truncated Gaussian distribution corresponds to truncating a corresponding encoding base distribution.
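
For illustration only, the relationship discussed above between a Gaussian base distribution, its truncation at zero, and the resulting rectified Gaussian can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Salimans or from the record: the function names, the linear `encoder_params` mapping, and its coefficients are hypothetical.

```python
import math
import random


def encoder_params(x, w=0.8, b=-0.2, sigma=1.0):
    """Hypothetical encoder: the Gaussian mean is a linear function of the input x."""
    return w * x + b, sigma


def rectified_gaussian_sample(mu, sigma, rng):
    """Draw z = max(0, eps) with eps ~ N(mu, sigma^2) (a ReLU applied to a Gaussian draw)."""
    eps = rng.gauss(mu, sigma)
    return max(0.0, eps)


def point_mass_at_zero(mu, sigma):
    """P(z == 0) = Phi(-mu / sigma): the Gaussian mass lying at or below zero."""
    return 0.5 * (1.0 + math.erf((-mu / sigma) / math.sqrt(2.0)))


rng = random.Random(0)
x = 1.5                                # example input value (assumed)
mu, sigma = encoder_params(x)          # base Gaussian parameters depend on x
samples = [rectified_gaussian_sample(mu, sigma, rng) for _ in range(20000)]

# Fraction of samples sitting exactly at zero versus the analytic point mass.
empirical_zero = sum(1 for z in samples if z == 0.0) / len(samples)
analytic_zero = point_mass_at_zero(mu, sigma)
print(empirical_zero, analytic_zero)
```

In this sketch the positive samples follow the Gaussian restricted to z > 0 (the truncated component), while the remaining probability collapses onto a point mass at zero; both parts move when the input `x` changes.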

Additionally, applicant continues to argue, see page 11 (last paragraph), that the claimed limitation recites that the approximating posterior distribution is formed by truncating the encoding base distribution based on input data from the input space, which is not taught by Salimans. Similarly, on page 12, applicant continues to argue that Salimans does not define the rectified Gaussian as being based on input data. Examiner respectfully disagrees because Salimans teaches "based on input data from the input space" on Page 2. Paragraph 4 of Page 2 refers to a "function of the data" and to parameters "that we allow to depend on the data", which, under the broadest reasonable interpretation, the examiner interprets as based on input data from the input space.
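
As an illustrative sketch only, a rectified Gaussian whose parameters depend on the input can be written as below. The notation is assumed for illustration and is not quoted from Salimans: \mu(x) and \sigma(x) denote hypothetical encoder outputs for input x, \Phi is the standard normal cumulative distribution function, \delta is the Dirac delta, and \mathbf{1}[\cdot] is the indicator function.

```latex
q(z \mid x) \;=\; \Phi\!\left(-\frac{\mu(x)}{\sigma(x)}\right)\delta(z)
\;+\; \mathcal{N}\!\left(z;\ \mu(x),\ \sigma(x)^{2}\right)\mathbf{1}[z > 0]
```

The first term is the point mass at zero and the second is the Gaussian density restricted to positive values; both depend on x only through \mu(x) and \sigma(x).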
Furthermore, applicant argues, on page 13, that as Salimans does not disclose at least the claimed limitation: "forming an approximating posterior distribution over the latent space, conditioned on the input space, and formed by, for each of the continuous random latent variables, truncating a corresponding encoding base distribution based on input data from the input space", claim 1 is not anticipated by this reference and each one of claims 2-5, and 12 is also not anticipated at least by virtue of its dependency on currently amended claim 1, as well as for the specific limitations recited by those dependent claims. Examiner respectfully disagrees because as explained above and detailed in the rejection below, the cited references teach each and every element of independent claim 1 and its dependent claims. 
Applicant continues to argue, see page 13, that the independent claim 27 is not anticipated by Salimans because it includes comparable subject-matter to the above-noted limitation of claim 1, and is similarly not disclosed by Salimans. Examiner respectfully disagrees because as explained above and shown in detail below Salimans teaches each and every element of claim 1 and therefore claim 27.
Lastly, applicant argues, on page 14, that each one of claims 6 to 9, 11, 13, and 14 is patentable over Salimans and Rolfe, and claim 10 is patentable over Salimans, Rolfe, and Stritzke, at least by virtue of their dependency on claim 1, as well as for the specific limitations recited by these dependent claims. Examiner respectfully disagrees because, as explained above, Salimans teaches each and every element of claim 1, and therefore the cited references teach each and every element of independent claim 1 and its dependent claims, as shown in detail below.


Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b).
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.
Claim 1 is provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of copending Application No. 16/270,273 (reference application) in view of Salimans (A Structured Variational Auto-encoder for Learning Deep Hierarchies of Sparse Features, 2016). Although the claims at issue are not identical, they are not patentably distinct from each other because the independent claims are anticipated by the corresponding conflicting independent claims and art. In particular, instant claim 1 is obvious over conflicting claim 1 and reference Salimans, and instant claim 27 is obvious over conflicting claim 10 and reference Salimans.
This is a provisional nonstatutory double patenting rejection because the patentably indistinct claims have not in fact been patented.
Instant Application 16/562,192:

A method for unsupervised learning over an input space comprising a plurality of input variables, and at least a subset of a training dataset of samples of the respective variables, to attempt to identify the value of at least one parameter that increases the log-likelihood of the at least a subset of a training dataset with respect to a model, the model expressible as a function of the at least one parameter, the method executed by circuitry including at least one processor and comprising:

forming a latent space comprising a plurality of continuous random latent variables;

forming an approximating posterior distribution over the latent space, conditioned on the input space, and formed by, for each of the continuous random latent variables,

forming a prior distribution over the latent space;

forming a decoding distribution over the input space;

and training the model based on the encoding, prior, and decoding distributions.

Co-pending application 16/270,273:

A method for machine learning over an input space comprising a plurality of input variables relating to a plurality of organisms, and at least a subset of a training dataset of samples of the respective variables, to attempt to identify the value of at least one parameter that increases the log-likelihood of the at least a subset of a training dataset with respect to a model, the model expressible as a function of the at least one parameter, the method executed by circuitry including at least one processor and comprising:

forming a latent space comprising a genetic latent subspace and an environmental subspace, each subspace comprising one or more continuous random latent variables;

forming an approximating posterior distribution over the latent space, conditioned on the input space;

forming a prior distribution over the latent space;

forming a decoding distribution over the input space, conditioned on the latent space, the decoding distribution conditioned on the random latent variables of the genetic latent subspace based on genetic covariance induced by familial relationships between organisms;

and training the model based on the encoding, prior, and decoding distributions.


However, co-pending application 16/270,273 does not teach: truncating a corresponding encoding base distribution based on input data from the input space.

Reference Salimans (A Structured Variational Auto-encoder for Learning Deep Hierarchies of Sparse Features, 2016), in an analogous system, teaches: truncating a corresponding encoding base distribution based on input data from the input space ([Page 1, Section 1, Paragraph 2] "the truncated Gaussian component"; [Page 2, Paragraph 4] "That is we choose q(z|x)". Note: the Gaussian component corresponds to the encoding base distribution, and q(z|x) corresponds to the truncated encoding base distribution).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method for machine learning over an input space of co-pending application 16/270,273 to incorporate the teachings of Salimans to truncate a corresponding encoding base distribution based on input data from the input space. One would have been motivated to make this modification because doing so would provide the benefit of performing efficient stochastic gradient variational inference (in a network with ReLU), as taught by Salimans [Page 1, Section 1, Paragraph 2].

Instant Application 16/562,192:

A computational system, comprising: at least one processor;

and at least one nontransitory processor-readable storage medium that stores at least one of processor-executable instructions or data which, when executed by the at least one processor cause the at least one processor to:

form a latent space comprising a plurality of continuous random latent variables;

form an approximating posterior distribution over the latent space, conditioned on the input space, and formed by, for each of the continuous random latent variables,

form a prior distribution over the latent space;

form a decoding distribution over the input space;

and train the model based on the encoding, prior, and decoding distributions.

Co-pending application 16/270,273:

A machine-learning system, comprising: at least one processor;

at least one nontransitory processor-readable medium communicatively coupled to the at least one processor, the at least one nontransitory processor-readable medium which stores at least one of processor-executable instructions or data which, when executed by the at least one processor, cause the at least one processor to

attempt to identify the value of at least one parameter that increases the log-likelihood of the at least a subset of a training dataset with respect to a model, and particularly cause the processor to:

form a latent space comprising a genetic latent subspace and an environmental subspace, each subspace comprising one or more continuous random latent variables;

form an approximating posterior distribution over the latent space, conditioned on the input space;

form a prior distribution over the latent space;

form a decoding distribution over the input space, conditioned on the latent space, the decoding distribution conditioned on the random latent variables of the genetic latent subspace based on genetic covariance induced by familial relationships between organisms;

and train the model based on the encoding, prior, and decoding distributions.


However, co-pending application 16/270,273 does not teach: truncating a corresponding encoding base distribution based on input data from the input space.

Reference Salimans (A Structured Variational Auto-encoder for Learning Deep Hierarchies of Sparse Features, 2016), in an analogous system, teaches: truncating a corresponding encoding base distribution based on input data from the input space ([Page 1, Section 1, Paragraph 2] "the truncated Gaussian component"; [Page 2, Paragraph 4] "That is we choose q(z|x)". Note: the Gaussian component corresponds to the encoding base distribution, and q(z|x) corresponds to the truncated encoding base distribution).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the machine learning system of co-pending application 16/270,273 to incorporate the teachings of Salimans to truncate a corresponding encoding base distribution based on input data from the input space. One would have been motivated to make this modification because doing so would provide the benefit of performing efficient stochastic gradient variational inference (in a network with ReLU), as taught by Salimans [Page 1, Section 1, Paragraph 2].




Claim 1 is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of U.S. Patent No. 11042811 in view of Salimans (A Structured Variational Auto-encoder for Learning Deep Hierarchies of Sparse Features, 2016). Although the claims at issue are not identical, they are not patentably distinct from each other because the independent claims are anticipated by the corresponding conflicting independent claims and art. In particular, instant claim 1 is obvious over conflicting claim 1 and reference Salimans, and instant claim 27 is obvious over conflicting claim 25 and reference Salimans.

Instant Application 16/562,192:

A method for unsupervised learning over an input space comprising a plurality of input variables, and at least a subset of a training dataset of samples of the respective variables, to attempt to identify the value of at least one parameter that increases the log-likelihood of the at least a subset of a training dataset with respect to a model, the model expressible as a function of the at least one parameter, the method executed by circuitry including at least one processor and comprising:

forming a latent space comprising a plurality of continuous random latent variables;

forming an approximating posterior distribution over the latent space, conditioned on the input space, and formed by, for each of the continuous random latent variables,

forming a prior distribution over the latent space;

forming a decoding distribution over the input space;

US Patent No. 11042811:

A method for unsupervised learning over an input space comprising discrete or continuous variables, and at least a subset of a training dataset of samples of the respective variables, to attempt to identify a value of at least one parameter that increases a log-likelihood of at least the subset of the training dataset with respect to a model, the model expressible as a function of the at least one parameter, the method executed by circuitry including at least one processor, the method comprising:

forming a first latent space comprising a plurality of random variables, the plurality of random variables comprising one or more discrete random variables;

forming a second latent space comprising the first latent space and a set of supplementary continuous random variables;

forming a first transforming distribution comprising a conditional distribution over the set of supplementary continuous random variables, conditioned on the one or more discrete random variables of the first latent space;

forming an encoding distribution comprising an approximating posterior distribution over the first latent space, conditioned on the input space;

forming a prior distribution over the first latent space;

forming a decoding distribution comprising a conditional distribution over the input space conditioned on the set of supplementary continuous random variables;

determining an ordered set of conditional cumulative distribution functions of the supplementary continuous random variables, each cumulative distribution function comprising functions of a full distribution of at least one of the one or more discrete random variables of the first latent space;

determining an inversion of the ordered set of conditional cumulative distribution functions of the supplementary continuous random variables;

constructing a first stochastic approximation to a lower bound on the log-likelihood of the at least a subset of a training dataset;

constructing a second stochastic approximation to a gradient of the lower bound on the log-likelihood of at least the subset of the training dataset;

and increasing the lower bound on the log-likelihood of at least the subset of the training dataset based at least in part on the gradient of the lower bound on the log-likelihood of at least the subset of the training dataset, wherein constructing a second stochastic approximation to a gradient of the lower bound includes approximating a gradient of at least a first part of the first stochastic approximation with respect to one or more parameters of the prior distribution over the first latent space using samples from the prior distribution, wherein approximating the gradient of at least a first part of the first stochastic approximation with respect to one or more parameters of the prior distribution over the first latent space using samples from the prior distribution includes at least one of generating a plurality of samples or causing a plurality of samples to be generated by a quantum processor comprising a plurality of qubits and a plurality of coupling devices providing communicative coupling between respective pairs of qubits, wherein at least one of generating a plurality of samples or causing a plurality of samples to be generated by a quantum processor includes:

forming one or more chains, each chain comprising a respective subset of the plurality of qubits;

and representing at least one of the one or more discrete random variables of the first latent space by a respective chain.
However, US Patent No. 11042811 does not teach: truncating a corresponding encoding base distribution based on input data from the input space; and training the model based on the encoding, prior, and decoding distributions.

Reference Salimans (A Structured Variational Auto-encoder for Learning Deep Hierarchies of Sparse Features, 2016), in an analogous system, teaches: truncating a corresponding encoding base distribution based on input data from the input space ([Page 1, Section 1, Paragraph 2] "the truncated Gaussian component"; [Page 2, Paragraph 4] "That is we choose q(z|x)". Note: the Gaussian component corresponds to the encoding base distribution, and q(z|x) corresponds to the truncated encoding base distribution); and training the model based on the encoding, prior, and decoding distributions ([Abstract] "To learn the parameters of the new model, we approximate the posterior of the latent variables with a variational auto-encoder. Using this structured posterior approximation, we are able to perform joint training of deep models with many layers of latent random variables").

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method for unsupervised learning over an input space of US Patent No. 11042811 to incorporate the teachings of Salimans to truncate a corresponding encoding base distribution based on input data from the input space and train the model based on the encoding, prior, and decoding distributions. One would have been motivated to make this modification because doing so would provide the benefit of performing efficient stochastic gradient variational inference (in a network with ReLU), as taught by Salimans [Page 1, Section 1, Paragraph 2].


Instant Application 16/562,192:

A computational system, comprising: at least one processor;

and at least one nontransitory processor-readable storage medium that stores at least one of processor-executable instructions or data which, when executed by the at least one processor cause the at least one processor to:

form a latent space comprising a plurality of continuous random latent variables;

form an approximating posterior distribution over the latent space, conditioned on the input space, and formed by, for each of the continuous random latent variables,

form a prior distribution over the latent space;

form a decoding distribution over the input space;

US Patent No. 11042811:

A computational system, comprising: at least one processor;

and at least one nontransitory processor-readable storage medium that stores at least one of processor-executable instructions or data which, when executed by the at least one processor cause the at least one processor to:

form a first latent space comprising a plurality of random variables, the plurality of random variables comprising one or more discrete random variables;

form a second latent space comprising the first latent space and a set of supplementary continuous random variables;

form a first transforming distribution comprising a conditional distribution over the set of supplementary continuous random variables, conditioned on the one or more discrete random variables of the first latent space;

form an encoding distribution comprising an approximating posterior distribution over the first latent space, conditioned on the input space;

form a prior distribution over the first latent space;

form a decoding distribution comprising a conditional distribution over the input space conditioned on the set of supplementary continuous random variables;

determine an ordered set of conditional cumulative distribution functions of the supplementary continuous random variables, each cumulative distribution function comprising functions of a full distribution of at least one of the one or more discrete random variables of the first latent space;

determine an inversion of the ordered set of conditional cumulative distribution functions of the supplementary continuous random variables;

construct a first stochastic approximation to a lower bound on the log-likelihood of the at least a subset of a training dataset;

construct a second stochastic approximation to a gradient of the lower bound on the log-likelihood of at least the subset of the training dataset;

and increase the lower bound on the log-likelihood of at least the subset of the training dataset based at least in part on the gradient of the lower bound on the log-likelihood of at least the subset of the training dataset, wherein causing the at least one processor to construct a second stochastic approximation to a gradient of the lower bound includes causing the at least one processor to approximate a gradient of at least a first part of the first stochastic approximation with respect to one or more parameters of the prior distribution over the first latent space using samples from the prior distribution, wherein causing the at least one processor to approximate the gradient of at least a first part of the first stochastic approximation with respect to one or more parameters of the prior distribution over the first latent space using samples from the prior distribution includes at least one of causing the at least one processor to generate a plurality of samples or causing a quantum processor to generate a plurality of samples, the quantum processor comprising a plurality of qubits and a plurality of coupling devices providing communicative coupling between respective pairs of qubits, wherein at least one of causing the at least one processor to generate a plurality of samples or causing a quantum processor to generate a plurality of samples includes:

form one or more chains, each chain comprising a respective subset of the plurality of qubits;

and represent at least one of the one or more discrete random variables of the first latent space by a respective chain.

However, US Patent No. 11042811 does not teach: truncating a corresponding encoding base distribution based on input data from the input space; and train the model based on the encoding, prior, and decoding distributions.

Reference Salimans (A Structured Variational Auto-encoder for Learning Deep Hierarchies of Sparse Features, 2016), in an analogous system, teaches: truncating a corresponding encoding base distribution based on input data from the input space ([Page 1, Section 1, Paragraph 2] "the truncated Gaussian component"; [Page 2, Paragraph 4] "That is we choose q(z|x)". Note: the Gaussian component corresponds to the encoding base distribution, and q(z|x) corresponds to the truncated encoding base distribution); and train the model based on the encoding, prior, and decoding distributions ([Abstract] "To learn the parameters of the new model, we approximate the posterior of the latent variables with a variational auto-encoder. Using this structured posterior approximation, we are able to perform joint training of deep models with many layers of latent random variables").

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the computational system of US Patent No. 11042811 to incorporate the teachings of Salimans to truncate a corresponding encoding base distribution based on input data from the input space and train the model based on the encoding, prior, and decoding distributions. One would have been motivated to make this modification because doing so would provide the benefit of performing efficient stochastic gradient variational inference (in a network with ReLU), as taught by Salimans [Page 1, Section 1, Paragraph 2].




Claim 1 is rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of U.S. Patent No. 11157817 in view of Salimans (A Structured Variational Auto-encoder for Learning Deep Hierarchies of Sparse Features, 2016). Although the claims at issue are not identical, they are not patentably distinct from each other because the independent claims are anticipated by the corresponding conflicting independent claims and art. In particular, instant claim 1 is obvious over conflicting claim 1 and reference Salimans, and instant claim 27 is obvious over conflicting claim 25 and reference Salimans.
Instant Application 16/562,192:

A method for unsupervised learning over an input space comprising a plurality of input variables, and at least a subset of a training dataset of samples of the respective variables, to attempt to identify the value of at least one parameter that increases the log-likelihood of the at least a subset of a training dataset with respect to a model, the model expressible as a function of the at least one parameter, the method executed by circuitry including at least one processor and comprising:

forming a latent space comprising a plurality of continuous random latent variables;

forming an approximating posterior distribution over the latent space, conditioned on the input space, and formed by, for each of the continuous random latent variables,

forming a prior distribution over the latent space;

forming a decoding distribution over the input space;

US Patent No. 11157817:

A method for unsupervised learning over an input space comprising discrete or continuous variables, and at least a subset of a training dataset of samples of the respective variables, to attempt to identify a value of at least one parameter that increases a log-likelihood of at least the subset of the training dataset with respect to a model, the model expressible as a function of the at least one parameter, the method executed by circuitry including at least one processor, the method comprising:

forming a first latent space comprising a plurality of random variables, the plurality of random variables comprising one or more discrete random variables;

forming a second latent space comprising the first latent space and a set of supplementary continuous random variables;

forming a first transforming distribution comprising a conditional distribution over the set of supplementary continuous random variables, conditioned on the one or more discrete random variables of the first latent space;

forming an encoding distribution comprising an approximating posterior distribution over the first latent space, conditioned on the input space;

forming a prior distribution over the first latent space;

forming a decoding distribution comprising a conditional distribution over the input space conditioned on the set of supplementary continuous random variables;

determining an ordered set of conditional cumulative distribution functions of the supplementary continuous random variables, each cumulative distribution function comprising functions of a full distribution of at least one of the one or more discrete random variables of the first latent space;

determining an inversion of the ordered set of conditional cumulative distribution functions of the supplementary continuous random variables;

constructing a first stochastic approximation to a lower bound on the log-likelihood of the at least a subset of a training dataset;

constructing a second stochastic approximation to a gradient of the lower bound on the log-likelihood of the at least a subset of a training dataset;

and increasing the lower bound on the log-likelihood of the at least a subset of a training dataset based at least in part on the gradient of the lower bound on the log-likelihood of the at least a subset of a training dataset.

However, US Patent No: 11157817 does not teach: truncating a corresponding encoding base distribution based on input data from the input space; and training the model based on the encoding, prior, and decoding distributions.

Reference Salimans (A Structured Variational Auto-encoder for Learning Deep Hierarchies of Sparse Features, 2016), in an analogous system, teaches: truncating a corresponding encoding base distribution based on input data from the input space ([Page 1, Section 1, Paragraph 2]  the truncated Gaussian component. [Page 2, Paragraph 4] That is we choose q (z|x). Note: Gaussian component corresponds to the encoding base distribution and q(z|x) corresponds to the truncated encoding base distribution); and training the model based on the encoding, prior, and decoding distributions ([Abstract] To learn the parameters of the new model, we approximate the posterior of the latent variables with a variational auto-encoder. Using this structured posterior approximation, we are able to perform joint training of deep models with many layers of latent random variables).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the method for unsupervised learning over an input space of US Patent No: 11157817 to incorporate the teachings of Salimans to truncate a corresponding encoding base distribution based on input data from the input space and train the model based on the encoding, prior, and decoding distributions. One would have been motivated to do this modification because doing so would give the benefit of performing efficient stochastic gradient variational inference (in a network with ReLU) as taught by Salimans paragraph [Page 1, Section 1, Paragraph 2].

Instant Application 16/562,192

A computational system, comprising: at least one processor;

and at least one nontransitory processor-readable storage medium that stores at least one of processor-executable instructions or data which, when executed by the at least one processor cause the at least one processor to:

form a latent space comprising a plurality of continuous random latent variables;

form an approximating posterior distribution over the latent space, conditioned on the input space, and formed by, for each of the continuous random latent variables,

form a prior distribution over the latent space;

form a decoding distribution over the input space;

US Patent No: 11157817

A computational system, comprising: at least one processor;

cause the at least one processor to:

form a first latent space comprising a plurality of random variables, the plurality of random variables comprising one or more discrete random variables;

form a second latent space comprising the first latent space and a set of supplementary continuous random variables;

form a first transforming distribution comprising a conditional distribution over the set of supplementary continuous random variables, conditioned on the one or more discrete random variables of the first latent space;

form an encoding distribution comprising an approximating posterior distribution over the first latent space, conditioned on the input space;

form a prior distribution over the first latent space;

form a decoding distribution comprising a conditional distribution over the input space conditioned on the set of supplementary continuous random variables;

determine an ordered set of conditional cumulative distribution functions of the supplementary continuous random variables, each cumulative distribution function comprising functions of a full distribution of at least one of the one or more discrete random variables of the first latent space;

determine an inversion of the ordered set of conditional cumulative distribution functions of the supplementary continuous random variables;

construct a first stochastic approximation to a lower bound on the log-likelihood of the at least a subset of a training dataset;

construct a second stochastic approximation to a gradient of the lower bound on the log-likelihood of the at least a subset of a training dataset;

and increase the lower bound on the log-likelihood of the at least a subset of a training dataset based at least in part on the gradient of the lower bound on the log-likelihood of the at least a subset of a training dataset.

However, US Patent No: 11157817 does not teach: truncating a corresponding encoding base distribution based on input data from the input space; and train the model based on the encoding, prior, and decoding distributions.

Reference Salimans (A Structured Variational Auto-encoder for Learning Deep Hierarchies of Sparse Features, 2016), in an analogous system, teaches: truncating a corresponding encoding base distribution based on input data from the input space ([Page 1, Section 1, Paragraph 2]  the truncated Gaussian component. [Page 2, Paragraph 4] That is we choose q (z|x). Note: Gaussian component corresponds to the encoding base distribution and q(z|x) corresponds to the truncated encoding base distribution); and train the model based on the encoding, prior, and decoding distributions ([Abstract] To learn the parameters of the new model, we approximate the posterior of the latent variables with a variational auto-encoder. Using this structured posterior approximation, we are able to perform joint training of deep models with many layers of latent random variables).


It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the computational system of US Patent No: 11157817 to incorporate the teachings of Salimans to truncate a corresponding encoding base distribution based on input data from the input space and train the model based on the encoding, prior, and decoding distributions. One would have been motivated to do this modification because doing so would give the benefit of performing efficient stochastic gradient variational inference (in a network with ReLU) as taught by Salimans paragraph [Page 1, Section 1, Paragraph 2].




Claim Rejections - 35 USC § 102
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-5, 12, and 27 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Salimans (A Structured Variational Auto-encoder for Learning Deep Hierarchies of Sparse Features, 2016).

Regarding claim 1
Salimans teaches: A method for unsupervised learning over an input space comprising a plurality of input variables, and at least a subset of a training dataset of samples of the respective variables, to attempt to identify a value of at least one parameter that increases a log-likelihood of the at least a subset of a training dataset with respect to a model, the model expressible as a function of the at least one parameter, the method executed by circuitry including at least one processor and comprising ([Abstract] To learn the parameters of the new model, we approximate the posterior of the latent variables with a variational auto-encoder. [Page 3, Paragraph 1] During training we also use Batch-Normalization at each level of both the upward and downward passes to help regularize the training objective and to speed up convergence. [Page 2, Paragraph 2] In order to learn the parameters θ of our generative model p(x), we optimize a variational lower bound on the log marginal likelihood. Note: Batch corresponds to subset of a training dataset):
forming a latent space comprising a plurality of continuous random latent variables ([Abstract]  In this note we present a generative model of natural images consisting of a deep hierarchy of layers of latent random variables [Page 1, Last Line] For continuous data. Note: Layers of latent random variables corresponds to a plurality of continuous random latent variables);
forming an approximating posterior distribution over the latent space, conditioned on the input space, and formed by, for each of the continuous random latent variables ([Abstract] we approximate the posterior of the latent variables [Page 1, Last Line] For continuous data [Page 2, Paragraph 3] we can fit q () to the posterior distribution of the latents p(zk|xk). Note: q() corresponds to the approximating posterior distribution), truncating a corresponding encoding base distribution based on input data from the input space ([Page 1, Section 1, Paragraph 2] the truncated Gaussian component. [Page 2, Paragraph 4] That is we choose q (z|x). [Page 2, Paragraph 4] function of the data. that we allow to depend on the data. Note: Gaussian component corresponds to the encoding base distribution and q(z|x) corresponds to the truncated encoding base distribution. Data corresponds to input data from the input space);
forming a prior distribution over the latent space ([Page 2, Paragraph 4] the prior p(z). Note: z corresponds to the latent space);
forming a decoding distribution over the input space ([Page 1, Last Paragraph] After generating the last layer of latent features zL, we generate the observed data x from an appropriate conditional distribution p(x|zL). Note: p(x|zL) corresponds to decoding distribution);
and training the model based on the encoding, prior, and decoding distributions ([Abstract] To learn the parameters of the new model, we approximate the posterior of the latent variables with a variational auto-encoder. Using this structured posterior approximation, we are able to perform joint training of deep models with many layers of latent random variables).
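
For illustration only, the relationship among the three distributions mapped above can be written as the standard variational lower bound on the log marginal likelihood. This is the textbook form and is not quoted from Salimans; the subscripts φ (encoder parameters) and θ (generative parameters) and the single latent variable z are notational assumptions.

```latex
\log p_\theta(x) \;\ge\;
\mathbb{E}_{q_\phi(z \mid x)}\!\left[\log p_\theta(x \mid z)\right]
\;-\; \mathrm{KL}\!\left(q_\phi(z \mid x)\,\|\,p_\theta(z)\right)
```

Here q_φ(z|x) plays the role of the encoding (approximating posterior) distribution, p_θ(z) the prior distribution, and p_θ(x|z) the decoding distribution, so increasing the bound involves all three.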

Regarding claim 2
Salimans teaches: The method of claim 1 wherein forming the prior distribution comprises, for each of the continuous random latent variables, truncating a corresponding prior base distribution by rectifying the corresponding prior base distribution based on the continuous random latent variable ([Page 1, Last Line] For continuous data. [Page 2, Paragraph 4] That is we choose q (z|x) = q (z0|x)q (z1|x, z0) . . . q (zL|x, zL−1) to have exactly the same structure as the prior p(z). Each of the conditionals q (zl|x, zl−1) are once again Rectified Gaussian. [Page 1, Paragraph 4] The rectified Gaussian distribution is thus a mixture of a point mass at zero, and a truncated Gaussian distribution with support on the positive real line. Note: z corresponds to the latent space).

Regarding claim 3
Salimans teaches: The method of claim 2 wherein, for each continuous random latent variable, the corresponding encoding base distribution and the corresponding prior base distribution are parametrizations of a shared distribution, forming the prior distribution comprises truncating the shared distribution, and forming the approximating posterior distribution comprises truncating the shared distribution ([Abstract] we are able to perform joint training of deep models with many layers of latent random variables. [Page 1, Last Line] For continuous data. [Page 2, Paragraph 4] we define the variational posterior approximation q () to be a parameterized function of the data. Rather than using a factorized mean-field posterior approximation however, we use the parameterized encoder to define a structured posterior approximation. That is we choose q (z|x) = q (z0|x)q (z1|x, z0) . . . q (zL|x, zL−1) to have exactly the same structure as the prior p(z). Each of the conditionals q (zl|x, zl−1) are once again Rectified Gaussian. [Page 1, Paragraph 4] The rectified Gaussian distribution is thus a mixture of a point mass at zero, and a truncated Gaussian distribution with support on the positive real line. Note: Joint training with many layers corresponds to shared distribution. p(z) corresponds to prior base distribution).
Regarding claim 4
Salimans teaches: The method of claim 3 wherein the shared distribution comprises a Gaussian distribution and truncating the shared distribution comprises truncating the Gaussian distribution ([Page 1, Section 1, Paragraph 2] The rectified Gaussian distribution is thus a mixture of a point mass at zero, and a truncated Gaussian distribution with support on the positive real line. Both the mass at zero and the shape of the truncated Gaussian component are determined by the same parameters).
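
For illustration only, the quoted description of the rectified Gaussian (a point mass at zero plus a truncated Gaussian on the positive real line, both governed by the same parameters) can be sketched as follows. This is a minimal sketch, not code from Salimans; the function names and the sampling form max(mu + sigma * eps, 0) with eps drawn from N(0, 1) are assumptions chosen to match the equation cited at [Page 1, Section 1].

```python
import math
import random


def sample_rectified_gaussian(mu, sigma, rng):
    """Draw z = max(mu + sigma * eps, 0) with eps ~ N(0, 1) (hypothetical helper)."""
    eps = rng.gauss(0.0, 1.0)
    return max(mu + sigma * eps, 0.0)


def mass_at_zero(mu, sigma):
    """Probability of the point mass at zero: Phi(-mu / sigma), the standard normal CDF."""
    return 0.5 * (1.0 + math.erf((-mu / sigma) / math.sqrt(2.0)))


def empirical_mass_at_zero(mu, sigma, n, seed=0):
    """Fraction of n samples that were rectified to exactly zero."""
    rng = random.Random(seed)
    zeros = sum(
        1 for _ in range(n) if sample_rectified_gaussian(mu, sigma, rng) == 0.0
    )
    return zeros / n
```

The same pair (mu, sigma) fixes both the probability of landing at zero and the shape of the positive part, which is the property the cited passage relies on.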

Regarding claim 5
Salimans teaches: The method of claim 1 wherein, when forming the approximating posterior, truncating the corresponding encoding base distribution ([Page 2, Paragraph 4] we use the parameterized encoder to define a structured posterior approximation) comprises rectifying at least one of the continuous random latent variables ([Page 1, Section 1] Rectified Gaussian distribution equation).

Regarding claim 12
Salimans teaches: The method of claim 1 wherein each of a first subset of the plurality of continuous random latent variables share a first common base distribution and forming the approximating posterior distribution comprises, for each of the first subset, truncating a corresponding approximating posterior base distribution comprises truncating the first common base distribution ([Page 1, section 1] See Equation. Note: N(0, 1) corresponds to a first common base distribution. Taking the maximum corresponds to truncating a corresponding approximating posterior base distribution).
Regarding claim 27
Salimans teaches: A computational system, comprising: at least one processor; and at least one nontransitory processor-readable storage medium that stores at least one of processor-executable instructions or data which, when executed by the at least one processor cause the at least one processor to ([Page 3, Section 2] Experiments [Page 2, Paragraphs 5, 7, 9] computed) to perform unsupervised learning over an input space comprising a plurality of input variables, and at least a subset of a training dataset of samples of the respective variables, to attempt to identify a value of at least one parameter that increases a log-likelihood of the at least a subset of a training dataset with respect to a model, the model expressible as a function of the at least one parameter, such that the at least one processor is to ([Abstract] To learn the parameters of the new model, we approximate the posterior of the latent variables with a variational auto-encoder. [Page 3, Paragraph 1] During training we also use Batch-Normalization at each level of both the upward and downward passes to help regularize the training objective and to speed up convergence. [Page 2, Paragraph 2] In order to learn the parameters θ of our generative model p(x), we optimize a variational lower bound on the log marginal likelihood. Note: Batch corresponds to subset of a training dataset):
form a latent space comprising a plurality of continuous random latent variables ([Abstract]  In this note we present a generative model of natural images consisting of a deep hierarchy of layers of latent random variables [Page 1, Last Line] For continuous data);
form an approximating posterior distribution over the latent space, conditioned on the input space, and formed by, for each of the continuous random latent variables ([Abstract] we approximate the posterior of the latent variables [Page 1, Last Line] For continuous data), truncating a corresponding encoding base distribution based on input data from the input space ([Page 1, Section 1, Paragraph 2] the truncated Gaussian component. [Page 2, Paragraph 4] That is we choose q (z|x). [Page 2, Paragraph 4] function of the data. that we allow to depend on the data. Note: Gaussian component corresponds to the encoding base distribution and q(z|x) corresponds to the truncated encoding base distribution. Data corresponds to input data from the input space);
form a prior distribution over the latent space ([Page 2, Paragraph 4] the prior p(z). Note: z corresponds to the latent space);
form a decoding distribution over the input space ([Page 1, Last Paragraph] After generating the last layer of latent features zL, we generate the observed data x from an appropriate conditional distribution p(x|zL). Note: p(x|zL) corresponds to decoding distribution);
and train the model based on the encoding, prior, and decoding distributions ([Abstract] To learn the parameters of the new model, we approximate the posterior of the latent variables with a variational auto-encoder. Using this structured posterior approximation, we are able to perform joint training of deep models with many layers of latent random variables).

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 6-9, 11, 13, and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Salimans (A Structured Variational Auto-encoder for Learning Deep Hierarchies of Sparse Features, 2016) in view of Rolfe (Discrete Variational Autoencoders, 2016).
Regarding claim 6 
Salimans teaches: The method of claim 5 wherein training the model comprises (as shown above).
However, Salimans does not explicitly disclose: determining a gradient over the approximating posterior based on a reparametrization of the at least one of the continuous random latent variables.
Rolfe teaches, in an analogous system: determining a gradient over the approximating posterior based on a reparametrization of the at least one of the continuous random latent variables ([Page 2, Paragraph 6] Moreover, a low-variance stochastic approximation to the gradient of the autoencoding term can be obtained using backpropagation and the reparameterization trick, so long as samples from the approximating posterior q(z|x) can be drawn using a differentiable, deterministic function. [Page 23, Section F.2] The approximating posterior q is continuous, with nonzero derivative, so the reparameterization trick can be applied to backpropagate gradients).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified training the model of Salimans to incorporate the teachings of Rolfe to determine a gradient over the approximating posterior based on a reparametrization of the at least one of the continuous random latent variables. One would have been motivated to do this modification because doing so would give the benefit of drawing samples from a Gaussian distribution with mean and variance determined by the input, as taught by Rolfe paragraph [Page 2, Paragraph 6].
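
For illustration only, the reparameterization trick referenced in the Rolfe citation can be sketched as below. This is a minimal sketch under stated assumptions, not Rolfe's implementation: the sample is written as z = mu + sigma * eps with eps drawn from N(0, 1), the objective f(z) = z**2 is a hypothetical stand-in chosen because its exact gradients are known (2 * mu and 2 * sigma), and the function name is invented for this example.

```python
import random


def reparam_gradients(mu, sigma, n, seed=0):
    """Monte Carlo estimate of d/dmu and d/dsigma of E[f(z)] for f(z) = z**2.

    The sample is reparameterized as z = mu + sigma * eps, eps ~ N(0, 1),
    so the gradient passes through the deterministic map by the chain rule.
    """
    rng = random.Random(seed)
    grad_mu = 0.0
    grad_sigma = 0.0
    for _ in range(n):
        eps = rng.gauss(0.0, 1.0)   # parameter-independent noise
        z = mu + sigma * eps        # differentiable, deterministic function of (mu, sigma, eps)
        df_dz = 2.0 * z             # derivative of the toy objective f(z) = z**2
        grad_mu += df_dz * 1.0      # dz/dmu = 1
        grad_sigma += df_dz * eps   # dz/dsigma = eps
    return grad_mu / n, grad_sigma / n
```

Because E[z**2] = mu**2 + sigma**2, the two estimates should approach 2 * mu and 2 * sigma as n grows, which gives a simple check on the sketch.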

Regarding claim 7
Salimans teaches: The method of claim 5 (as shown above).
However, Salimans does not explicitly disclose: wherein rectifying at least one of the continuous random latent variables comprises applying a rectified linear unit to an initial value of the at least one of the continuous random latent variables generated by the approximating posterior distribution.
Rolfe teaches, in an analogous system: wherein rectifying at least one of the continuous random latent variables comprises applying a rectified linear unit to an initial value of the at least one of the continuous random latent variables generated by the approximating posterior distribution ([Page 5, Section 2.1]; the cited equation and passage were embedded as images, media_image1.png and media_image2.png, and are not reproduced here).

It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified forming the approximating posterior of the Salimans system to incorporate the teachings of Rolfe wherein rectifying at least one of the continuous random latent variables comprises applying a rectified linear unit to an initial value of the at least one of the continuous random latent variables generated by the approximating posterior distribution. One would have been motivated to do this modification because doing so would give the benefit of being quasi-sigmoidal, as taught by Rolfe paragraph [Page 5, Section 2.1].

Regarding claim 8
Salimans teaches: The method of claim 1 (as shown above) for each of the plurality of continuous random latent variables, truncating the corresponding prior base distribution comprises truncating the corresponding prior base distribution ([Page 1, Section 1, Paragraph 3] The rectified Gaussian distribution is thus a mixture of a point mass at zero, and a truncated Gaussian distribution with support on the positive real line. Both the mass at zero and the shape of the truncated Gaussian component are determined by the same parameters. [Abstract]  In this note we present a generative model of natural images consisting of a deep hierarchy of layers of latent random variables [Page 1, Last Line] For continuous data. Note: Layers of latent random variables corresponds to a plurality of continuous random latent variables).
However, Salimans does not explicitly disclose: wherein forming the latent space further comprises forming a plurality of discrete random latent variables and, based on a state of a corresponding one of the discrete random latent variables.
Rolfe teaches, in an analogous system: wherein forming the latent space further comprises forming a plurality of discrete random latent variables and, based on a state of a corresponding one of the discrete random latent variables ([Page 17, Paragraph 4] To use discrete latent representations in the variational autoencoder framework, we must first transform to a continuous latent space, within which probability packets move smoothly. That is, we must compute Equation 18 over a different distribution than the original posterior distribution. Surprisingly, we need not sacrifice the original discrete latent space, with its associated approximating posterior).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the truncating system of Salimans to incorporate the teachings of Rolfe wherein forming the latent space further comprises forming a plurality of discrete random latent variables and, based on a state of a corresponding one of the discrete random latent variables. One would have been motivated to do this modification because doing so would give the benefit of extending both the encoder and the prior in the same way, avoiding affecting the remaining KL divergence in Equation 2 as taught by Rolfe paragraph [Page 5, Section 2.1].

Regarding claim 9
Salimans and Rolfe teach: The method of claim 8 (as shown above).
Salimans further teaches: wherein, for each of the plurality of continuous random latent variables, truncating the corresponding prior base distribution based on the state of the corresponding one of the discrete random latent variables comprises selecting at least one of: an activation regime and an inactivation regime and: if the activation regime is selected, causing samples to be drawn for the continuous random variable from the corresponding prior base distribution; and if the inactivation regime is selected, causing samples to be drawn for the continuous random variable from a singularity distribution ([Page 1, Section 1] Rectified Gaussian distribution equation. maximum(μji + σji ε, 0) Note: Taking the maximum corresponds to selection. Selecting zero corresponds to the inactivation regime. Selecting if greater than zero corresponds to the activation regime. [Abstract] In this note we present a generative model of natural images consisting of a deep hierarchy of layers of latent random variables [Page 1, Last Line] For continuous data. Note: Layers of latent random variables corresponds to a plurality of continuous random latent variables).
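
For illustration only, the two regimes in the mapping above can be written as a single density. This is a standard consequence of z = max(μ + σε, 0) with ε drawn from N(0, 1), not an equation quoted from Salimans; the symbols Φ (standard normal CDF), δ (Dirac delta), and the indicator function are notational assumptions.

```latex
p(z) \;=\; \Phi\!\left(-\tfrac{\mu}{\sigma}\right)\delta(z)
\;+\; \mathcal{N}\!\left(z;\,\mu,\,\sigma^{2}\right)\mathbf{1}[z > 0]
```

The first term is the point mass at zero (the inactivation regime) and the second is the positive part of the Gaussian (the activation regime).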

Regarding claim 11
Salimans and Rolfe teach: The method of claim 9 (as shown above).
Salimans further teaches: wherein training the model comprises regularizing one or more continuous random latent variables based on the one or more continuous random latent variables being in the activation regime ([Page 3, Paragraph 1] During training we also use Batch-Normalization at each level of both the upward and downward passes to help regularize the training objective and to speed up convergence).

Regarding claim 13
Salimans teaches: The method of claim 12 (as shown above).
However, Salimans does not explicitly disclose: wherein training the model comprises determining a gradient of an objective function based on a reparametrization of the first subset of continuous random latent variables.
Rolfe teaches, in an analogous system: wherein training the model comprises determining a gradient of an objective function based on a reparametrization of the first subset of continuous random latent variables ([Page 2, Paragraph 6] Moreover, a low-variance stochastic approximation to the gradient of the autoencoding term can be obtained using backpropagation and the reparameterization trick, so long as samples from the approximating posterior q(z|x) can be drawn using a differentiable, deterministic function f(x, φ, ρ) of the combination of the inputs, the parameters, and a set of input and parameter-independent random variables ρ ∼ D).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified training the model of Salimans to incorporate the teachings of Rolfe to determine a gradient over the approximating posterior based on a reparametrization of the at least one of the continuous random latent variables. One would have been motivated to do this modification because doing so would give the benefit of drawing samples from a Gaussian distribution with mean and variance determined by the input, as taught by Rolfe paragraph [Page 2, Paragraph 6].
Regarding claim 14
Salimans and Rolfe teach: The method of claim 13 (as shown above).
Salimans further teaches: wherein: each of a second subset of the plurality of continuous random latent variables share a second common base distribution, the second common base distribution having at least one trainable parameter separate from the one or more trainable parameters of the first common base distribution ([Page 1, section 1] See Equation. Note: RG(μji , σji ) corresponds to a second common base distribution and has 2 parameters μ and σ corresponding to at least one trainable parameter separate from the one or more trainable parameters);
and forming the approximating posterior distribution comprises, for each continuous random latent variable of the second subset, truncating a corresponding approximating posterior base distribution comprises truncating the first common base distribution ([Page 1, Section 1, Paragraph 2]  the truncated Gaussian component).

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Salimans (A Structured Variational Auto-encoder for Learning Deep Hierarchies of Sparse Features, 2016) in view of Rolfe (Discrete Variational Autoencoders, 2016) and further in view of Stritzke (US 5249122 A).
Regarding claim 10
Salimans and Rolfe teach: The method of claim 9 (as shown above).
However, the system of Salimans and Rolfe does not explicitly disclose: wherein the singularity distribution comprises a Dirac delta distribution.
Stritzke teaches, in an analogous system: wherein the singularity distribution comprises a Dirac delta distribution ([Column 6, Lines 3,4] The Dirac delta function in equ.2 represents a singularity).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified the combined teachings of Salimans and Rolfe to incorporate the teachings of Stritzke wherein the singularity distribution comprises a Dirac delta distribution. One would have been motivated to do this modification because doing so would give the benefit of the linear response function reaching an infinitely high function value at time zero, as taught by Stritzke paragraph [Column 6, Lines 5,6].

Conclusion

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHAITANYA RAMESH JAYAKUMAR whose telephone number is (571)272-3369. The examiner can normally be reached Mon-Fri 7am-1pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Omar Fernandez Rivas can be reached on (571)272-2589. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/C.R.J./Examiner, Art Unit 2128
/BRIAN M SMITH/Primary Examiner, Art Unit 2122