DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Oath/Declaration
Examiner notes that Applicant has not submitted an Oath/Declaration.
Response to Amendment
The amendment filed 2022-04-21 has been entered.  Applicant’s amendments to claims 14-20 have overcome the previous rejections under 35 USC 101.  The status of the claims is as follows:
Claims 1-20 remain pending in the application.
Claims 1, 7-8, 13-14, and 20 are amended.
Response to Arguments
Applicant's arguments in response to rejections under 35 USC 103 have been fully considered but they are not persuasive. 
Applicant argues on Remarks Pages 12-13 that the claimed method imposes a temporal ordering, while Menick chooses one of the K nearest neighbors to the code generated from the latent distribution.  Examiner respectfully points out that the term “temporal ordering” does not appear in the claim language.  The claims recite only the terms “current” and “previous”, which are features of Menick:  Menick describes an iterative process that constructs a probability distribution based on past observations, and it recites the “current observation” in several places.  Examiner points out that the existence of Menick’s step of K nearest neighbors does not preclude Menick from reading on the instant claims, as shown in the response to the next argument, and in the rejections below.  Nevertheless, Examiner points out that Menick [0049] explicitly recites “ordering” as being a critical element:  “Training the prior neural network naturally induces an ordering of the observations in the set of training data based on the similarity of the respective codes representing the observations. A compression system can use the ordering of the observations in the training data to effectively compress the training data. In particular, the compression system can compress the training data more effectively (i.e., at a higher compression rate) than if the compression system used encoder and decoder neural networks trained using a conventional variational autoencoder system (e.g., without the prior neural network).”  Finally, Examiner also notes that the primary reference Bowman discloses a temporal ordering, with a sequence of observations in a “sequence autoencoder”.
Examiner also acknowledges Applicant’s statement in this argument that “Menick is only available as a reference to the extent it can rely on its priority claim to the provisional patent application. More specifically, subject matter in Menick that was not actually described in the provisional patent application cannot be relied on to reject the claims.” Examiner had examined the provisional application before the first office action, and determined that it does support the subject matter in the subsequent application.  Examiner notes that Applicant has not pointed out any specific features that do not have support in the provisional patent application. In fact, upon further examination, Examiner points out that the provisional application is even stronger, as it leaves out the intermediate step of retrieving the “Neighboring Code 128” via K nearest neighbors, which is the source of both of Applicant’s arguments.
Applicant argues on Remarks Pages 13-14 that Menick does not teach “inputting a value generated from a previous latent distribution directly to an input layer of a transition network” because “Menick performs the intermediate steps of generating the updated code 126 and identifying and retrieving a neighboring code 128 from the codes store 110 before the neighboring code 128 is processed by the prior neural network 106. Thus, any value generated from the encoding distribution 116 at some previous step is not input directly to an input layer of the prior neural network 106.”  Examiner respectfully disagrees, as the term “transition network” is not explicitly defined as consisting solely of a neural network.  Applicant points out that instant Spec [0032] recites that "the transition network may be arranged as one or more layers of nodes that are associated with a set of parameters” (Underlining added by examiner).  Examiner respectfully points out that an “input layer” of a “transition network” may be broadly interpreted, and Menick may be interpreted in the following way.  The “Updated Code 126” is the value sampled from the latent distribution.  Subsequently, the “Neighboring Code 128” and “Prior Neural Network” jointly comprise a “transition network” wherein the determination of “Neighboring Code 128” is the first “layer”.  See Figure 1 below:

[Image: Menick, Fig. 1]


Remarks on Advancing Prosecution
Examiner points out that even if Applicant were to be able to claim that the “transition network” consists solely of a neural network, there are other pieces of art of concern, which may suggest inputting a value from a latent state directly into a neural network that produces a prior distribution.  Examiner recommends reviewing the prior art not relied upon at the bottom of this action.
Claim Rejections - 35 USC § 103
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Bowman et al. (“Generating Sentences from a Continuous Space”; hereinafter “Bowman”) in view of Menick et al. (US 2021/0004677 A1; hereinafter “Menick”).
As per Claim 1, Bowman teaches a method of training a recurrent machine-learned model having an encoder network and a decoder network the method comprising (Bowman, Abstract, discloses a recurrent model:  “The standard recurrent neural network language model (rnnlm) generates sentences one word at a time and does not work from an explicit global sentence representation. In this work, we introduce and study an rnn-based variational autoencoder generative model that incorporates distributed latent representations of entire sentences.”  Bowman, Introduction Para 3 on Page 1, discloses a method of training:  “Our contributions are as follows: We propose a variational autoencoder architecture for text and discuss some of the obstacles to training it as well as our proposed solutions.”  Bowman, Section 3 on Page 3, discloses the encoder and decoder:  “We adapt the variational autoencoder to text by using single-layer lstm rnns (Hochreiter and Schmidhuber, 1997) for both the encoder and the decoder, essentially forming a sequence autoencoder with the Gaussian prior acting as a regularizer on the hidden code.”)
obtaining a sequence of observations (Bowman, Section 3 on Page 3, discloses:  “We adapt the variational autoencoder to text by using single-layer lstm rnns (Hochreiter and Schmidhuber, 1997) for both the encoder and the decoder, essentially forming a sequence autoencoder with the Gaussian prior acting as a regularizer on the hidden code.”  Here, Bowman discloses a “sequence autoencoder”, in which next values of a sequence are predicted based on previous values of a sequence.  This is explicitly illustrated in Bowman Table 3:

[Image: Bowman, Table 3]
)

for each observation in the sequence, repeatedly performing the steps of: 
generating a current latent distribution for a current observation by applying the encoder network to the current observation and values of the encoder network for one or more previous observations, the current latent distribution representing a distribution for a latent state of the current observation given a value of the current observation and a latent state for the one or more previous observations (Bowman, Section 3 on Page 3 as shown above, state that they “adapt the variational autoencoder”.  Bowman provides some background on this in Page 2 Section 2.1 and Section 2.2:

[Image: Bowman, Page 2, Sections 2.1 and 2.2]

As shown in Section 2.1, “Phienc” is the encoder function, “x” is an observation, and “z” is a “learned code” (or, a “latent state”) of “x”.  In Section 2.2, it is disclosed that in the variational autoencoder, Phienc produces a probability distribution for “z” called q(z|x).  This may be called a “current latent distribution”, and it is generated by applying the encoder to the current observation.  It represents a distribution of the latent state “z”.  Bowman, Section 3, also discloses that they use “LSTM” for “both the encoder and the decoder”.  LSTM is a recurrent model, and therefore latent states from one or more previous observations are input to the encoder.  This is explicitly illustrated in Figure 1:

[Image: Bowman, Figure 1]
)

generating a prior distribution [by inputting a value generated] from a previous latent distribution for one or more previous observations [directly to an input layer of a transition network], the prior distribution representing a distribution for the latent state of the current observation given the latent state for the one or more previous observations independent of the value of the current observation (Bowman, Page 2 Section 2.2, discloses:  “This model imposes a prior distribution on the hidden codes z which enforces a regular geometry over codes and makes it possible to draw proper samples from the model using ancestral sampling.”  Here Bowman discloses a prior distribution, and it is for an estimated latent state z.  As shown above, previous observations are taken into account when generating z, and therefore the generated prior distribution is for an estimated latent state for the one or more previous observations from a previous latent distribution.  The prior distribution represents a distribution of the latent state of the current observation (“z” is the latent state). As discussed above, the latent state of the current observation is partly based on the latent state of one or more previous observations.  Since the latent state(s) of one or more previous observations were calculated prior to the input of the current observation, the latent state(s) of one or more previous observations are independent of the value of the current observation.)
generating an estimated latent state for the current observation from the current latent distribution (Bowman, Page 2 Section 2.2 Last Paragraph, discloses:  “We train our models with stochastic gradient descent, and at each gradient step we estimate the reconstruction cost using a single sample from q(z|x)”, where “q (z|x)” is a probability of “z given x”, wherein z is the latent state. A single sample from this is an estimated latent state from the current latent distribution “q (z|x)”).
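For illustration only, the "single sample from q(z|x)" quoted above can be sketched as follows. This is a minimal sketch that assumes q(z|x) is a diagonal Gaussian described by a mean and a log-variance per dimension; the function name `sample_latent` and the example numbers are hypothetical and are not taken from Bowman, Menick, or the instant application.

```python
import math
import random


def sample_latent(mean, log_var, rng):
    """Draw one sample z from a diagonal Gaussian q(z|x).

    Assumption: q(z|x) = N(mean, diag(exp(log_var))). Each coordinate is
    computed as mean + std * eps, where eps is standard normal noise.
    """
    return [
        m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
        for m, lv in zip(mean, log_var)
    ]


# Hypothetical two-dimensional latent distribution for one observation.
rng = random.Random(0)
z = sample_latent([0.5, -1.0], [0.0, -2.0], rng)
```

The returned list `z` plays the role of the estimated latent state that is subsequently passed to a decoder.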
generating a predicted likelihood for observing a subsequent observation given the latent state for the current observation by applying the decoder network to the estimated latent state for the current observation (Bowman, Page 2 Section 2.1 Para 2, discloses:  “a probabilistic decoder model p(x|z =Phienc(x)), and maximizes the likelihood of an example x conditioned on z, the learned code for x.”  Here, the decoder generates a predicted likelihood of the next observation, based on the estimated latent state z.)
and determining a loss for the current observation including a combination of a prediction loss and a divergence loss, the prediction loss increasing as the predicted likelihood for the subsequent observation decreases, and the divergence loss indicating a measure of difference between the current latent distribution and the prior distribution (Bowman, Page 2 Section 2.2 and Para 3, discloses:

[Image: Bowman, Page 2, Section 2.2, objective function]

Here, Bowman discloses an “objective function”, which is just the opposite of a “loss function”.  The signs of the terms are switched when this is considered a “loss function”, and in such a case, the second term would be negative, and therefore the loss would increase as the Expectation (predicted likelihood) decreases.  The KL divergence is between the latent distribution q(z | x) and the prior distribution p(z)).
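For illustration only, the sign relationship described above can be sketched numerically. The sketch assumes the standard variational autoencoder objective, -KL(q(z|x) || p(z)) + E[log p(x|z)], with a diagonal Gaussian q(z|x) and a standard normal prior p(z); the function names and inputs are hypothetical and are not drawn from either reference.

```python
import math


def kl_to_standard_normal(mean, log_var):
    """Closed-form KL( N(mean, diag(exp(log_var))) || N(0, I) )."""
    return 0.5 * sum(
        math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mean, log_var)
    )


def negative_elbo(log_likelihood, mean, log_var):
    """Loss form of the objective: divergence term minus the likelihood term.

    log_likelihood stands in for a one-sample estimate of E[log p(x|z)].
    """
    return kl_to_standard_normal(mean, log_var) - log_likelihood


# The same latent distribution paired with a worse and a better likelihood.
loss_low_likelihood = negative_elbo(-5.0, [1.0], [0.0])
loss_high_likelihood = negative_elbo(-2.0, [1.0], [0.0])
```

Under these assumptions the loss is larger when the predicted log-likelihood is lower, which is the behavior the claim language attributes to the prediction loss.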
	backpropagating one or more error terms from the loss function to update parameters of the encoder network and the decoder network (Bowman, Page 2 Section 2.2, concludes:  “We train our models with stochastic gradient descent, and at each gradient step we estimate the reconstruction cost using a single sample from q(z|x), but compute the KL divergence term of the cost function in closed form, again following Kingma and Welling (2015).” Here, Bowman discloses “stochastic gradient descent”, which is a form of backpropagation.  Bowman as shown above disclosed a loss function, and has disclosed that their network comprises an encoder and decoder network.)
	However, Bowman does not explicitly teach the machine-learned model having a transition network; that the generating a prior distribution is by inputting a value generated from a previous latent distribution for one or more previous observations directly to an input layer of a transition network; and backpropagating one or more error terms from the loss function to the transition network.
	Menick teaches the machine-learned model having a transition network;  generating a prior distribution by inputting a value generated from a previous latent distribution for one or more previous observations directly to an input layer of a transition network (Menick, [0049], discloses:  “The system described in this specification trains a prior neural network in tandem with an encoder neural network and a decoder neural network in a variational autoencoder framework.”  Here, Menick discloses that the model comprises, in addition to the encoder and decoder networks, also a “prior neural network”.   Menick, Para [0072], discloses:  “For each of the observations 114 in the current batch, the system 100 processes the neighboring code 128 for the observation using the prior neural network 106. The prior neural network 106 is configured to process the neighboring code to generate an output that includes the parameters of a prior probability distribution 124 that models the code for the observation. Like the encoding probability distribution 116, the prior probability distribution 124 is a probability distribution over the latent space.” Here Menick discloses that the encoding probability distribution is a previous latent distribution (“is a probability distribution over the latent space”), and a probability distribution requires multiple observations, and thus previous observations.  Menick also discloses that the “prior neural network” is a network that produces a “prior probability distribution”, which is analogous to the function of the “transition network” of the instant claim.  Also note that the two steps (Neighboring Code 128, and Prior Neural Network 106) may be considered a “transition network” as claimed, with each step being a “layer”.  Thus, a value (“Updated Code” 126) from the previous latent distribution (“Encoding Distribution” 116), is directly input to a layer of the transition network. 
The updated code is a value generated from the previous latent distribution as disclosed by Menick [0065]:  “For each of the observations 114, the system 100 uses the encoding distribution for the observation to: (i) determine an updated code 126 for the observation based on the encoding distribution.” This is also shown well in Fig. 1:

[Image: Menick, Fig. 1]
)
determining a loss function of the sequence of observations as a combination of the losses for each observation in the sequence (Bowman did not explicitly recite this, but Menick does in [0088]:

[Image: Menick, [0088], loss function]

Here, the loss function is over a sequence of observations as a combination of losses for each observation in the sequence, as evidenced by the capital Sigma summation sign.  Also note that this loss function is much like Bowman’s objective function.  It is a divergence loss and prediction loss, wherein the loss increases as the predicted likelihood decreases.) 
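For illustration only, a loss formed as a sum of per-observation losses can be sketched as follows. The sketch assumes each per-observation loss is a KL divergence between two diagonal Gaussians (a latent distribution q and a prior p) minus a log-likelihood term; the data layout and function names are hypothetical and do not reproduce the equation in Menick [0088].

```python
import math


def kl_diag_gaussians(mean_q, log_var_q, mean_p, log_var_p):
    """Closed-form KL(q || p) for diagonal Gaussians q and p."""
    total = 0.0
    for mq, lq, mp, lp in zip(mean_q, log_var_q, mean_p, log_var_p):
        total += 0.5 * (
            lp - lq + (math.exp(lq) + (mq - mp) ** 2) / math.exp(lp) - 1.0
        )
    return total


def sequence_loss(steps):
    """Sum the per-observation losses (divergence minus log-likelihood).

    Each step is a dict with keys "q" and "p", each a (mean, log_var)
    pair, and "log_likelihood", a float.
    """
    return sum(
        kl_diag_gaussians(*step["q"], *step["p"]) - step["log_likelihood"]
        for step in steps
    )


# Two hypothetical observations whose latent distribution equals the prior.
steps = [
    {"q": ([0.0], [0.0]), "p": ([0.0], [0.0]), "log_likelihood": -1.0},
    {"q": ([0.0], [0.0]), "p": ([0.0], [0.0]), "log_likelihood": -2.0},
]
total = sequence_loss(steps)
```

Because the divergence is zero in both steps of this example, the total reduces to the sum of the two negative log-likelihoods.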
backpropagating one or more error terms from the loss function to the transition network (Menick, [0089], discloses backpropagating for all 3 of the encoder, decoder, and prior (transition) networks:  “The system may determine the gradients of the loss function with respect to the parameters of the encoder neural network, the decoder neural network, and the prior neural network in any appropriate manner, for example, using backpropagation.”)
Bowman and Menick are analogous art because they are both in the field of endeavor of machine learning.
It would have been obvious before the effective filing date of the invention to combine the recurrent variational autoencoder with KL divergence from a prior distribution of Bowman with the variational autoencoder with KL divergence from a prior distribution generated via a “prior network” of Menick.  One would be motivated to do so in order to gain increased efficiency by reducing the cost of computations needed to generate the prior distribution, and to achieve better results by having the prior distribution better reflect the surrounding observations.   The benefits of Menick’s approach are explained well in the accompanying paper that was included in the provisional application by Graves, Menick, and van den Oord (“Associative Compression Networks for Representation Learning”), Abstract:  (“This paper introduces Associative Compression Networks (ACNs), a new framework for variational autoencoding with neural networks. The system differs from existing variational autoencoders (VAEs) in that the prior distribution used to model each code is conditioned on a similar code from the dataset. In compression terms this equates to sequentially transmitting the dataset using an ordering determined by proximity in latent space. Since the prior need only account for local, rather than global variations in the latent space, the coding cost is greatly reduced, leading to rich, informative codes. Crucially, the codes remain informative when powerful, autoregressive decoders are used, which we argue is fundamentally difficult with normal VAEs.”)

	As per Claim 2, the combination of Bowman and Menick teaches the method of Claim 1.  Bowman teaches wherein the estimated latent state for the current observation is generated by sampling one or more values from the latent distribution for the current observation (Bowman, Page 2 Section 2.2 Last Paragraph, discloses:  “We train our models with stochastic gradient descent, and at each gradient step we estimate the reconstruction cost using a single sample from q(z|x)”, where “q (z|x)” is a probability of “z given x”, wherein z is the latent state. A “single sample” (sampling one or more values) from this is an estimated latent state from the current latent distribution “q (z|x)”).

As per Claim 3, the combination of Bowman and Menick teaches the method of Claim 2.  Bowman teaches wherein generating the predicted likelihood comprises generating one or more predicted likelihoods of observing the subsequent observation by applying the decoder network to the one or more sampled values from the latent distribution for the current observation (Bowman, Page 2 Section 2.1 Para 2, discloses:  “a probabilistic decoder model p(x|z =Phienc(x)), and maximizes the likelihood of an example x conditioned on z, the learned code for x.”  Here, the decoder generates (maximizes) a predicted likelihood of x, based on the latent state z, which is the “learned code” (“sampled value”) from the latent distribution for the current observation x.)

As per Claim 4, the combination of Bowman and Menick teaches the method of Claim 3.  Bowman teaches wherein the prediction loss is an expected value of the one or more predicted likelihoods (Bowman, Page 2 Section 2.2 Para 3, discloses the “E” term:

[Image: Bowman, Page 2, Section 2.2, Para 3, Equation 1]
)

As per Claim 5, the combination of Bowman and Menick teaches the method of Claim 1.  Bowman teaches wherein the divergence loss is a Kullback-Leibler divergence between the prior distribution and the current latent distribution (Bowman, Page 2 Section 2.2 Para 3, in the equation 1 shown in the screenshot above, discloses KL divergence between the latent distribution q(z | x) and the prior distribution p(z)).

As per Claim 6, the combination of Bowman and Menick teaches the method of Claim 1.  Bowman teaches wherein the current latent distribution is defined by a set of statistical parameters of a probability distribution, and wherein the encoder network is configured to output the set of statistical parameters. (Bowman provides some background on this in Page 2 in Section 2.1 and Section 2.2:

[Image: Bowman, Page 2, Sections 2.1 and 2.2]

As shown in Section 2.1, “Phienc” is the encoder function, “x” is an observation, and “z” is a “learned code” (or, a “latent state”) of “x”.  In Section 2.2, it is disclosed that in the variational autoencoder, Phienc produces a probability distribution for “z” called q(z|x).  This may be called a “current latent distribution”, and it is generated by applying the encoder to the current observation.  It represents a distribution of the latent state “z” and is thus a latent distribution.  A probability distribution is defined by parameters (for example, mean and variance).  As Bowman stated above, this distribution (and thus its parameters) has been produced by the encoder network, and therefore the encoder network is configured to output the set of statistical parameters.) 

As per Claim 7, the combination of Bowman and Menick teaches the method of Claim 1.  Menick teaches wherein the prior distribution is defined by a set of statistical parameters of a probability distribution, wherein the value is sampled from the previous latent distribution, and wherein generating the prior distribution comprises: applying the transition network to the value sampled from the previous latent distribution to generate one or more corresponding output values; estimating the set of statistical parameters for the prior distribution from the one or more output values.  (Menick, Para [0083], discloses:  “The system assigns an updated code to the given observation based on the parameters of the encoding probability distribution over the latent space (208). For example, the system may assign an updated code to the given observation that is given by a vector representing the mean of the encoding probability distribution.”  Here, Menick discloses determining the value (“updated code”) from the previous latent distribution (“encoding probability distribution”).  Menick explicitly recites taking the “mean” as one example, but also broadly suggests “based on the parameters of the encoding probability distribution”, which is suggestive of sampling, as sampling is a process of randomly selecting items from a distribution, and a distribution is based on the parameters of the distribution. Instant Spec [0022] also describes the “mean” as Menick does, as well as “or is determined based on one or more samples”:  “In one instance, the value v_t is the mean of the latent distribution q(z_t | x_t, z_{t-1}), or is determined based on one or more samples from the latent distribution q(z_t | x_t, z_{t-1})”  However, as Menick does not explicitly recite sampling, Bowman will be used for the explicit teaching below.  This latent distribution is generated “previous” to the generation of the prior distribution.  
The transition network (Menick’s Neighboring Code 128 and Prior Neural Network 106 as shown in Claim 1) is then applied to the value to generate output values, as described in Menick [0084]: “The system selects a “neighboring” code that is assigned to an additional observation (i.e., that is different than the given observation) based on a similarity of the neighboring code to the updated code assigned to the given observation (210)” and [0085]:  “The system provides the neighboring code as input to the prior neural network, which is configured to process the neighboring code in accordance with current parameter values of the prior neural network to generate as output parameters of a prior probability distribution over the latent space”.  Here Menick discloses that the prior distribution is defined by a set of statistical parameters of a probability distribution, and that these parameters are calculated by generating output values from the “neighboring code” and “prior neural network” (transition network).  This output from the transition network is used to estimate the statistical parameters representing the prior distribution.)
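For illustration only, the two recited steps (applying a network to a value to obtain output values, then reading statistical parameters from those outputs) can be sketched as follows. The sketch assumes a single hypothetical linear layer whose outputs are split into a mean and a log-variance; it is not the architecture of Menick’s prior neural network or of the transition network of the instant application, and all names and numbers are hypothetical.

```python
def transition_network(value, weights, bias):
    """Hypothetical single linear layer standing in for a transition network.

    The layer maps an input value of length n to 2*d output values.
    The first d outputs are read as the prior mean and the last d as the
    prior log-variance (an assumed parameterization).
    """
    outputs = [
        sum(w * v for w, v in zip(row, value)) + b
        for row, b in zip(weights, bias)
    ]
    d = len(outputs) // 2
    return outputs[:d], outputs[d:]


# Hypothetical value taken from a previous latent distribution.
previous_value = [1.0, 2.0]
weights = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]
bias = [0.0, 0.0, -1.0, -1.0]
prior_mean, prior_log_var = transition_network(previous_value, weights, bias)
```

In this sketch the value is passed straight to the layer, and the prior distribution is fully described by the returned mean and log-variance.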
However, Menick does not explicitly teach wherein the value is sampled from the previous latent distribution.
Bowman teaches wherein the value is sampled from the previous latent distribution (Bowman, Page 2 Section 2.2 Last Paragraph, discloses:  “We train our models with stochastic gradient descent, and at each gradient step we estimate the reconstruction cost using a single sample from q(z|x)”, where “q (z|x)” is a probability of “z given x”, wherein z is the latent state. A “single sample” (sampling one or more values) from this is an estimated latent state from the current latent distribution “q (z|x)”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Bowman and Menick for at least the reasons recited in Claim 1.

As per Claim 8, Claim 8 is a non-transitory computer-readable medium claim corresponding to method claim 1.  The difference is that it recites a non-transitory computer-readable medium and a processor.  Menick, [0117], discloses:  “Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory storage medium for execution by, or to control the operation of, data processing apparatus”.  Claim 8 is rejected for the same reasons as claim 1.

As per Claim 9, Claim 9 is a non-transitory computer-readable medium claim corresponding to method claim 2.  The difference is that it recites a non-transitory computer-readable medium and a processor.  Claim 9 is rejected for the same reasons as claim 2.

As per Claim 10, Claim 10 is a non-transitory computer-readable medium claim corresponding to method claim 4.  The difference is that it recites a non-transitory computer-readable medium and a processor.  Claim 10 is rejected for the same reasons as claim 4.

As per Claim 11, Claim 11 is a non-transitory computer-readable medium claim corresponding to method claim 5.  The difference is that it recites a non-transitory computer-readable medium and a processor.  Claim 11 is rejected for the same reasons as claim 5.

As per Claim 12, Claim 12 is a non-transitory computer-readable medium claim corresponding to method claim 6.  The difference is that it recites a non-transitory computer-readable medium and a processor.  Claim 12 is rejected for the same reasons as claim 6.

As per Claim 13, Claim 13 is a non-transitory computer-readable medium claim corresponding to method claim 7.  The difference is that it recites a non-transitory computer-readable medium and a processor.  Claim 13 is rejected for the same reasons as claim 7.

As per Claim 14, Claim 14 is a model claim corresponding to method claim 1.  The difference is that it recites a computer readable storage medium.  Menick, [0117], discloses:  “Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible non-transitory storage medium for execution by, or to control the operation of, data processing apparatus”.  Claim 14 is rejected for the same reasons as claim 1.

As per Claim 15, Claim 15 is a model claim corresponding to method claim 2.  The difference is that it recites a computer readable storage medium.  Claim 15 is rejected for the same reasons as claim 2.

As per Claim 16, Claim 16 is a model claim corresponding to method claim 3.  The difference is that it recites a computer readable storage medium.  Claim 16 is rejected for the same reasons as claim 3.

As per Claim 17, Claim 17 is a model claim corresponding to method claim 4.  The difference is that it recites a computer readable storage medium.  Claim 17 is rejected for the same reasons as claim 4.

As per Claim 18, Claim 18 is a model claim corresponding to method claim 5.  The difference is that it recites a computer readable storage medium.  Claim 18 is rejected for the same reasons as claim 5.

As per Claim 19, Claim 19 is a model claim corresponding to method claim 6.  The difference is that it recites a computer readable storage medium.  Claim 19 is rejected for the same reasons as claim 6.

As per Claim 20, Claim 20 is a model claim corresponding to method claim 7.  The difference is that it recites a computer readable storage medium.  Claim 20 is rejected for the same reasons as claim 7.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 
Wierstra et al. (US 2017/0230675 A1), in Fig. 1 and [0021-0023] discloses an Encoder neural network that produces a latent distribution from which latent variables are sampled.  Though not shown in Fig. 1, Wierstra subsequently discloses in [0052-0053] a process of passing the latent variables into a prior distribution, wherein the prior distribution is generated by a neural network, particularly in [0053]:  “The system maintains an instance of the generative neural network and processes the discrete values of the latent variables using the generative neural network to generate intermediate outputs that parameterize or define parameters of the generative prior distribution”
Wayne et al. (US 10,872,299 B2), in Col 3 Lines 4-15, discloses:   “Determining a set of latent variables for a time step may comprise mapping, using a prior map, for example a prior generation neural network, a prior map input comprising the set of latent variables to, for each of the latent variables, parameters of a prior distribution over possible latent variable values for the latent variable. The latent variables may then be sampled from the prior distributions. The prior map input may be a combination of the set of latent variables and the updated hidden state of the controller neural network. The prior generation neural network may comprise one or more linear neural network layers to map the prior map input to the distribution parameters.”  In Fig. 1, Wayne’s latent state comes from “Controller recurrent neural network 106” and a value is selected from this based on Attention Network 130, which is ultimately fed into Prior generation network 114.  More details are given in Col 7 Lines 12-32.
Chung et al. ("Recurrent Latent Variable Model for Sequential Data"), Page 3 Last Paragraph of Section 2.1, discloses:  “These approaches are closely related to the approach proposed in this paper. However, there is a major difference in how the prior distribution over the latent random variable is modelled. Unlike the aforementioned approaches, our approach makes the prior distribution of the latent random variable at timestep t dependent on all the preceding inputs via the RNN hidden state h_{t-1} (see Eq. (5)). The introduction of temporal structure into the prior distribution is expected to improve the representational power of the model, which we empirically observe in the experiments (See Table 1).”  Chung also discloses that the prior distribution parameters are from a neural network, as shown on Page 4 Section 3 below Eq. 6: 

[Image: Chung, Page 4, Section 3, text below Eq. 6]

Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to LEONARD A SIEGER whose telephone number is (571)272-9710. The examiner can normally be reached M-F 8:00 am - 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo can be reached on (571) 272-9767. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/L.A.S./Examiner, Art Unit 2126      
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126