DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Priority
Receipt is acknowledged of certified copies of papers required by 37 CFR 1.55.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 06/23/2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Drawings
The drawings submitted on 06/23/2020 are deemed acceptable for examination.
Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 3-5 and 10-12 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 3-5 and 10-12 recite the following:

    [Image of recited claim language: media_image1.png]

    [Image of recited claim language: media_image2.png]

    [Image of recited claim language: media_image3.png]

Each element of the claims must be defined, such as P, d, r, bp, bu, and bo.

Claims 4, 5, 11, and 12 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 4 and 11 recite the limitations “the set of parameters” in line 1 and “the learned parameters” in line 6. There is insufficient antecedent basis for these limitations in the claims. To further prosecution, these will be understood as “the plurality of parameters” and “learned parameters”, respectively.
Claims 5 and 12 recite the limitation “the conditional probability distribution” in line 1. There is insufficient antecedent basis for this limitation in the claims. To further prosecution, claim 5 will be understood as “The processor implemented method of claim 3 [[4]]” and claim 12 will be understood as “The processor implemented method of claim 10 [[11]]”.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 6-8, and 13-15 are rejected under 35 U.S.C. 103 as being unpatentable over Mukherjee et al. ("Armdn: Associative and recurrent mixture density networks for eretail demand forecasting." arXiv preprint arXiv:1803.03800 (2018)), hereinafter Mukherjee, in view of Rose (Rose S. A Machine Learning Framework for Plan Payment Risk Adjustment. Health Serv Res. 2016 Dec;51(6):2358-2374. doi: 10.1111/1475-6773.12464. Epub 2016 Feb 19. PMID: 26891974; PMCID: PMC5134202), hereinafter Rose.

Regarding claim 1, Mukherjee teaches a processor implemented method for time-series prediction using a sparse recurrent mixture density networks (RMDN) model, the method comprising:
	iteratively predicting, via one or more hardware processors (p. 7, col. 2 “4-core Intel Xeon CPU machine”), time series in a plurality of iterations (p. 2, Fig. 1) using a data set comprising a plurality of high-dimensional time series (p. 4, col. 2 “The features under the “Price” and “Time” buckets are all continuous and are simply normalized to zero mean and unit variance. The rest of the feature buckets are composed of categorical and 1-Hot features which are embedded into a suitable feature space of size 30”), the plurality of high-dimensional time series comprising a first set of high-dimensional time series associated with a training data (p. 6, col. 1 “We first trained a joint model, combining all SKUs across all verticals”) and a second set of the high-dimensional time series associated with a validation data, each iteration of the plurality of iterations comprising:
	passing, through a feedforward layer of the sparse RMDN model, a high-dimensional time series from amongst the plurality of high-dimensional time series (p. 4, Fig. 2), the sparse RMDN model comprising the feedforward layer (p. 4, col. 1 “The Associative layer The primary motivation behind this layer was to treat the associative variables that drive demand”), a recurrent neural network (RNN) (p. 4, col. 2, The Recurrent Layer “the output of the Associative layer is fed into a recurrent neural network (RNN)”) and a mixture density network (MDN) (p. 5, col. 2 “Since Gaussian mixtures is the natural representation for this kind of problems, we use a mixture density network (MDN) as an output layer”), the feedforward layer comprising a plurality of units associated with a plurality of distinct weights learnt by training the sparse RMDN model (p. 4, col. 2 “The final embedding was executed using a single FC layer with an Exponential Linear Unit (ELU) activation, which was shown to have better numerical stability than the popular ReLU”);
	performing, by the feedforward layer, dimensionality reduction of the high-dimensional time series to obtain a reduced dimensional time series, the feedforward layer comprising a number of the plurality of units equal to a fraction of the number of features in the set of features to perform the dimensionality reduction (Fig. 2 and p. 4, col. 2 “the entire set of features are concatenated and embedded into a lower-dimensional space of 50”);
	feeding, through the RNN, the reduced dimensional time series to obtain latent representation of the high-dimensional time-series, the latent representation captures temporal patterns from the reduced dimensional time series (p. 4, Fig. 2 LSTM and p. 4, col. 2 “The model needs to capture both the short-term and long-term signals that are present in a time-series data. To achieve this, the output of the Associative layer is fed into a recurrent neural network (RNN)”);
	feeding the latent representation of the high-dimensional time series to a mixture of Gaussians comprising a plurality of Gaussian components to predict a plurality of parameters associated with the plurality of Gaussian components in the mixture, the plurality of parameters comprising a plurality of probability values, a plurality of mean values and a plurality of standard deviation values associated with the plurality of the Gaussian components (p. 4, Fig. 3 and p.5, col. 2 “Since Gaussian mixtures is the natural representation for this kind of problems, we use a mixture density network (MDN) as an output layer [21, 39]. The MDN hypothesis is intuitively robust considering the many external factors that are not accounted for in the model. Therefore, considering K Gaussian mixtures, the conditional distribution can be equated as

    [Equation image: media_image4.png]

where pk, µk and σk are the probability, mean and standard deviation of the kth Gaussian component respectively”);
	selecting a Gaussian component from amongst the plurality of Gaussian components that is associated with a highest value of probability from amongst the plurality of probability values (p. 5 [image: media_image5.png], and p. 6, col. 1 “The conditional distribution of the network is therefore trying to model is a function of the AR-MDN output hi,t, given in Equation 3, and in likelihood terms can be expressed as [equation image: media_image6.png]”), wherein the mean of the selected Gaussian component is selected as prediction of the time-series and the standard deviation of the Gaussian component is selected for confidence estimation of the prediction for the iteration (p. 5, col. 2 “[image: media_image7.png]”, and p. 6, col. 1 “The conditional distribution of the network is therefore trying to model is a function of the AR-MDN output hi,t, given in Equation 3, and in likelihood terms can be expressed as [equation image: media_image6.png]”);
	computing a value of a loss function indicative of error in the prediction of the time-series using the plurality of parameters, the loss function being one of a training loss function (p. 6, col. 2 “Loss function. Since the last layer outputs a probability distribution over the demand values, we can resort to the maximum likelihood principle for training the model parameters. Given the parameter definition in the output layer, the loss function of the model is then given as [equation image: media_image8.png]”) and a validation loss function; and
	updating, via the one or more hardware processors, a plurality of weights of the sparse RMDN model using the value of the training loss function after each iteration of the plurality of iterations associated with the training data for prediction of the time-series (p. 6, col. 2 “We evaluate this loss for each product’s national level sales at each point in the training week. Training is done using standard stochastic gradient steps”).

	Mukherjee does not teach the sparse RMDN model being trained by imposing Lasso penalty on the plurality of weights of the feedforward layer to determine a set of features associated with the time series in an unsupervised manner.

	Rose teaches the sparse RMDN model being trained by imposing Lasso penalty on the plurality of weights of the feedforward layer to determine a set of features associated with the time series in an unsupervised manner (p. 2362, ¶3 “Therefore, penalized regressions offer an alternative bias‐variance tradeoff. The lasso penalty, which stands for least absolute shrinkage and selection operator, offers, as the name suggests, simultaneous shrinkage of the coefficients toward zero as well as variable selection. This is because the penalty shrinks many coefficients to zero, thus eliminating those variables as contributing to the predicted values of Y”).

	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have used the lasso penalty for weighting as taught by Rose in the feedforward/associative layer of Mukherjee. One of ordinary skill in the art would have been motivated to do so “to develop the best predictor (i.e., the prediction function with the optimal bias-variance tradeoff)” (Rose p. 2363, ¶1).
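
The selecting and loss-computing limitations addressed for claim 1 can be illustrated with a minimal sketch. It assumes a univariate Gaussian mixture whose probabilities, means, and standard deviations have already been produced by an MDN output layer. The function names and numeric values are hypothetical and are taken from neither the claims nor Mukherjee.

```python
import math


def gaussian_pdf(y, mean, std):
    """Density of a univariate Gaussian N(mean, std^2) evaluated at y."""
    z = (y - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def mixture_nll(y, probs, means, stds):
    """Negative log-likelihood of y under a K-component Gaussian mixture."""
    density = sum(p * gaussian_pdf(y, m, s) for p, m, s in zip(probs, means, stds))
    return -math.log(density)


def select_component(probs, means, stds):
    """Pick the component with the highest mixture probability.

    Returns (prediction, confidence_std): the mean of that component serves as
    the point prediction and its standard deviation as the confidence estimate.
    """
    k = max(range(len(probs)), key=lambda i: probs[i])
    return means[k], stds[k]


# Hypothetical MDN outputs for one time step (illustrative values only).
probs = [0.2, 0.7, 0.1]
means = [10.0, 12.0, 30.0]
stds = [1.0, 0.5, 4.0]

prediction, confidence_std = select_component(probs, means, stds)
loss = mixture_nll(12.0, probs, means, stds)  # loss for an observed value of 12.0
```

In this sketch the second component carries the highest probability, so its mean is returned as the prediction and its standard deviation as the confidence estimate, while the negative log-likelihood plays the role of the training loss.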

Regarding claim 6, Mukherjee in view of Rose teaches the processor implemented method of claim 1.
	Rose further teaches wherein the Lasso penalty comprises imposing sparsity on the plurality of weights of the feedforward layer by restricting a fraction of the weights to be close to zero to result in unsupervised feature selection (p. 2362, ¶3 “The lasso penalty, which stands for least absolute shrinkage and selection operator, offers, as the name suggests, simultaneous shrinkage of the coefficients toward zero as well as variable selection. This is because the penalty shrinks many coefficients to zero, thus eliminating those variables as contributing to the predicted values of Y”).
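
The shrinkage and variable-selection effect described in the Rose passage can be sketched as follows. The sketch assumes a small weight matrix in which `weights[j][i]` is the weight from input feature `i` to unit `j`, and it uses soft-thresholding as one common way of applying an L1 penalty. Neither the application nor the cited references is stated to use this particular procedure, and all names and values are hypothetical.

```python
def lasso_penalty(weights, lam):
    """L1 (Lasso) penalty: lam times the sum of absolute weight values."""
    return lam * sum(abs(w) for row in weights for w in row)


def soft_threshold(w, t):
    """Shrink w toward zero by t; weights smaller than t in magnitude become 0."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def selected_features(weights, tol=1e-8):
    """Indices of input features that keep a nonzero weight to at least one unit."""
    n_features = len(weights[0])
    return [i for i in range(n_features)
            if any(abs(row[i]) > tol for row in weights)]


# Hypothetical 2-unit feedforward layer over 4 input features.
W = [[0.90, 0.02, -0.40, 0.01],
     [-0.70, -0.03, 0.00, 0.02]]
lam = 0.05

penalty = lasso_penalty(W, lam)                      # term added to the loss
W_shrunk = [[soft_threshold(w, lam) for w in row] for row in W]
kept = selected_features(W_shrunk)                   # features surviving shrinkage
```

Here the small weights attached to the second and fourth features are driven to exactly zero, so only the first and third features remain, without any labels indicating which features are relevant.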

Regarding claim 7, Mukherjee in view of Rose teaches the processor implemented method of claim 1.
	Mukherjee further teaches wherein the sparse RMDN model comprises one of a sparse LSTM model and a sparse ED model, wherein the sparse LSTM model comprises the feedforward layer with [[Lasso]] sparsity constraints on the plurality of distinct weights and a LSTM as the RNN (p. 4, Fig. 2 “Figure 2: The architecture of the Multi-Layer Perceptron network that models the Associative Layer. The causal or associative features are grouped into five buckets as described in Table 1. Embeddings are learned to represent the categorical and 1-Hot features. A fully-connected layer is used to compress the concatenated embeddings into a dense 50-dimensional space”. As shown in claim 1, Rose teaches using the lasso penalty for the sparsity constraints on the weights of the model of Mukherjee), and
	wherein the sparse ED comprises the feedforward layer with LASSO sparsity constraints on the plurality of distinct weights and an encoder decoder as the RNN.

Claims 8, 13 and 14 are directed to a system including one or more memories; and one or more first hardware processors, the one or more first memories coupled to the one or more first hardware processors (taught by Mukherjee p. 7, col. 2 “Titan X GPU with 3072 cores and 12 GB RAM”), for performing the limitations of claims 1, 6 and 7, respectively. The limitations are substantially the same and are therefore rejected for the same reasons.

Claim 15 is directed to one or more non-transitory machine readable information storage mediums for performing the limitations of claim 1. The limitations are substantially the same and are therefore rejected for the same reasons.

Claims 2 and 9 are rejected under 35 U.S.C. 103 as being unpatentable over Mukherjee, in view of Rose and Kashiparekh et al. (Kashiparekh, Kathan, et al. "Convtimenet: A pre-trained deep convolutional neural network for time series classification." 2019 International Joint Conference on Neural Networks (IJCNN). IEEE, 2019.), hereinafter Kashiparekh.

Regarding claim 2, Mukherjee in view of Rose teaches the processor implemented method of claim 1.
	Mukherjee in view of Rose does not teach the specifics of further comprising validating the prediction by the RMDN model, wherein validating comprises:
	iteratively predicting the time series in a second plurality of iterations from amongst the plurality of iterations using the validation data set;
	computing, based on the predicted time series, the validation loss function indicative of error in validation; and
	selecting an iteration from amongst the second plurality of iterations for time series prediction based on the validation loss function value.

	Kashiparekh teaches validating the prediction by the RMDN model, wherein validating comprises:
	iteratively predicting the time series in a second plurality of iterations from amongst the plurality of iterations using the validation data set (p. 4, col. 2 “More specifically, for obtaining the best parameters WCTN during the iterative training process (refer Algorithm 1), we use a (relatively smaller) validation set S∗ containing V UTSC datasets such that S∗ ∩ S = ∅.”);
	computing, based on the predicted time series, the validation loss function indicative of error in validation (p. 4, col. 2 “Using updated WCTN,i and Wc k at the epoch with minimum validation loss… the validation loss for CTN at the end of i-th training epoch is defined as the average of these test losses across all datasets in S∗”); and
	selecting an iteration from amongst the second plurality of iterations for time series prediction based on the validation loss function value (p. 4, col. 2 “The optimal parameters WCTN are chosen at the epoch where the validation loss L v i is minimum, and represent the final parameters of the CTN”).

	It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have used validation and a validation loss function as taught by Kashiparekh in the model of Mukherjee in view of Rose. One of ordinary skill in the art would have been motivated to do so as it “yields a CTN model that is likely to generalize to unseen tasks” (Kashiparekh p. 4, ¶2).
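
Selection of an iteration by minimum validation loss, as described in the quoted Kashiparekh passages, can be sketched in general terms as follows. The training step, the validation loss, and the single-parameter model are toy stand-ins chosen only so that the example runs; they are hypothetical and do not reproduce Algorithm 1 of Kashiparekh or the claimed method.

```python
import copy


def train_with_validation(params, train_step, validation_loss, num_epochs):
    """Run num_epochs training iterations and keep the parameters from the
    iteration with the lowest validation loss."""
    best_loss = float("inf")
    best_params = copy.deepcopy(params)
    best_epoch = -1
    for epoch in range(num_epochs):
        params = train_step(params)          # update on training data
        loss = validation_loss(params)       # evaluate on held-out validation data
        if loss < best_loss:
            best_loss = loss
            best_params = copy.deepcopy(params)
            best_epoch = epoch
    return best_epoch, best_params, best_loss


# Toy stand-ins (hypothetical): one parameter that steps past its optimum at 3.0.
def toy_train_step(params):
    return {"w": params["w"] + 1.0}


def toy_validation_loss(params):
    return (params["w"] - 3.0) ** 2


best_epoch, best_params, best_loss = train_with_validation(
    {"w": 0.0}, toy_train_step, toy_validation_loss, num_epochs=6
)
```

In the toy run the validation loss falls until the third iteration (index 2) and rises afterwards, so the parameters from that iteration are the ones retained.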

Claim 9 is directed to a system including one or more memories; and one or more first hardware processors, the one or more first memories coupled to the one or more first hardware processors (taught by Mukherjee p. 7, col. 2 “Titan X GPU with 3072 cores and 12 GB RAM”), for performing the limitations of claim 2. The limitations are substantially the same and are therefore rejected for the same reasons.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to SHON G FOLEY whose telephone number is (469)295-9092. The examiner can normally be reached 10AM-6PM CT M-F.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, James Lee, can be reached at (571) 270-5965. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/SHON G FOLEY/Examiner, Art Unit 3668