DETAILED ACTION
This action is responsive to the Application filed on 09/25/2018. Claims 1-22 are pending in the case.  Claims 1 and 12 are independent claims. Claims 1-2, 4-8, 12-13, 15-19 are amended. 
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 02/01/2022 has been entered.

Response to Arguments
Applicant’s arguments and amendments filed 06/03/2022 pertaining to the 35 U.S.C. 112(b) rejection have been fully considered. The rejection has been withdrawn accordingly.
Applicant's arguments filed 06/03/2022 with respect to the 35 U.S.C. 103 rejection have been fully considered but they are not persuasive.
A) Applicant states Claim 1 recites “a search algorithm for selecting said set of statistical features”. New reference Bergstra does not automate feature selection. Bergstra (Title) has “model selection and hyperparameter optimization” that is not feature selection. The Office action (pages 8-9) says “Bergstra... The first four preprocessing algorithms were for dense features.” Here, “were for” does not mean feature selection. Bergstra is mischaracterized.
Examiner disagrees. As pointed out in the rejection, hyperopt is a system that optimizes the hyperparameters of a machine learning system, including which statistical features to use. Further, as part of hyperopt’s configuration, a search algorithm is selected to traverse the solution space of the optimization problem (Bergstra pg 5 “Assigning the algo keyword argument to hyperopt.fmin is recommended way to choose a search algorithm”). It is believed that the quote referenced by the applicant refers to the types of preprocessing algorithms (hyperparameters) that “were” selected from in the exemplary embodiment.
B) Applicant states “The Office action (pages 8-9) says “Bergstra... The StandardScaler, MinMaxScaler, and Normalizer did various feature-wise affine transforms to map numeric input features onto values near 0 and with roughly unit variance.” Normalizing/scaling is not feature selection. Bergstra is mischaracterized.”
As stated previously, Bergstra suggests feature selection is performed by the optimization system (hyperopt). In the rejection, the examiner explicitly states “hyperopt…select[s] an optimal preprocessing module”. “Normalizing/scaling” is not mapped to feature selection; it is a statistical feature itself.
C) Applicant states “The Office action (pages 8-9) says “Bergstra...PCA performed whitening or non-whitening PCO.” Whitening and non-whitening is normalizing/scaling, which is not feature selection. Bergstra is mischaracterized.”
Again, “Whitening” is not characterized as feature selection. These are statistical features. Hyperopt performs feature selection.
D) Applicant states “The Office action (pages 8-9) says “Bergstra... the choice of preprocessing module”. Module selection is not feature selection. Bergstra is mischaracterized.”
Examiner disagrees. With reference to Bergstra, the “preprocessing module” is a module which when used by the system after being selected generates features. These features are different dependent on the module selected. Therefore selecting a preprocessing module amounts to feature selection. This again is noted in the rejection by the examiner.
Finally with respect to claim 9 applicant states “The Office action (page 14) alleges “Bergstra teaches, receiving an interactive selection of a type of said machine learning model. ...pg 5 Assigning the algo keyword argument to hyperopt.fmin is recommended way to choose a search algorithm”. A search algorithm is not a machine learning model. New reference Bergstra is mischaracterized.” 
Examiner encourages applicant to consider all evidence presented by the examiner for the rejection of the claim. While the user selects the “search algorithm” when setting up the hyperopt search space, the rejection points out that the user not only selects a “search algorithm” but also specifies the “classification modules” to be searched over. Classification modules in the context of Bergstra are machine learning models.


Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1, 2, 5-7, 9-11, 12, 13, 16-18, 20-22 are rejected under 35 U.S.C. § 103 as being unpatentable over Ryan et al. US Document ID US 20210089927 A9 hereinafter Ryan, further in view of Chujai et al. “Time Series Analysis of Household Electric Consumption with ARIMA and ARMA Models” hereinafter Chujai, further in view of Bergstra et al. “Hyperopt: a Python library for model selection and hyperparameter optimization” hereinafter Bergstra.

Regarding Claim 1 and 12
	
	Ryan teaches, A method, comprising: receiving original time-series data for training a machine learning model; (¶0013 “The method includes obtaining data in a time-series…The method also includes training a Deep Neural Network (DNN)”) generating augmented sets of time-series data… from the original time-series data (¶0013 “creating one-dimensional or multi-dimensional windows [augmented sets of time-series data] from the time-series data [original time series]”) each augmented set of time-series data of said augmented sets time-series including a respective set of statistical features that are each calculated using a window based statistical function, (¶0090-0096 “Historical data is windowed and windows are associated with labels…For each window in the sequence of windows (T−3, T−2, T−1, T, T+1, T+2), a figure of merit is found (i.e., the probability [statistical function] that an anomaly or other significant pattern[feature] is present in that window)” Examiner notes that for each windowed time set the probability for the existence of a feature is calculated. Examiner further notes that the step of windowing data amounts to a calculation on data that indicates a statistical feature present in the data. The set of all possible patterns of anomalies corresponds to a respective set of features. Furthermore, a pattern/feature/anomaly are considered synonymous in the context of Ryan. ) each statistical feature of said respective set of statistical features having a window size, (¶0104 “search for the pattern using a number of window sizes W for each of the time slots T. The window size W with the highest conditional probability at time T is the best window size for the anomaly” For each feature or anomaly there is a corresponding window size that gives the best probability in a given time slot. ¶0133 “In order to capture the seasonality correlations, sizes of slides are chosen equal to human behavior activities. 
For instance, the window sizes could include one day worth of samples, one week worth of samples, one month worth of samples, or samples over any other suitable time period corresponding to the cycles of the signal” Further different windows sizes describe different features, including days, months, and years.) generating a respective trained machine learning model by at least providing said each augmented set of time-series data as input to a machine learning model for training the machine learning model based on said each augmented set of time-series data (¶0090 “Machine learning algorithms are trained with windows [ each set of time series data] as exemplars and labels as what the output could be.”  generating a trained model via training corresponds to trained algorithms. Training with the windows corresponds to providing the augmented set of time-series data as input to the model. Further this also includes using the output of the model as input during back propagation ) for each augmented set of time-series data of augmented said sets of time-series data: generating a respective prediction accuracy score for said respective trained machine learning model; (¶0126 “The best model is selected based on a key performance indicator (KPI) relevant to how the model is going to be used for prediction/classification … highest true positive rate for a given maximum false positive rate [prediction accuracy score]” in order to select a best model, a prediction accuracy score must have been determined to inform the KPI.) and selecting a set of statistical features associated with an augmented set of time-series data of said augmented sets of time series data based on the respective prediction accuracy score generated for each augmented set of time series data of said augmented sets of time series data. (¶0126 “The selection may be performed during the validation stage of the training. 
Finally, anomalies are detected (block 198) using the best model.” selecting a set of features corresponds to detecting anomalies. Further, because the anomalies are detected based on the model that takes sets of time-series data as input, the features detected are associated with that data and the model’s prediction accuracy generated during training.) One or more non-transitory computer-readable storage media storing sequences of instructions which, when executed by one or more processors, cause: (¶0014 “a non-transitory computer-readable medium configured to store a program executable by a processing system is provided. The program includes instructions to cause the processing system to obtain time-series data and create one-dimensional windows from the time-series data”)
	Ryan does not explicitly teach, the window size of each statistical feature of said respective set of statistical features being different than the window size of each other statistical feature of said respective set of statistical features; receiving one or more selected from the group consisting of: a selection of a search algorithm for selecting said set of statistical features, a selection of said window based statistical function to evaluate, a selection of a type of said machine learning model, receiving a specification for ranges defining a search space of window sizes, and receiving a maximum number of window sizes to evaluate; generating augmented sets of time-series data based on the one or more search selection
Chujai, when addressing issues related to features each having different sized windows for use in time series analysis, teaches, the window size of each statistical feature of said respective set of statistical features being different than the window size of each other statistical feature of said respective set of statistical features; (Pg 3 ¶02 “We used the ts() function in R library for construction of a time series. This function must be specifying a frequency of time series. This paper used a frequency of 365, 53, 12, and 4 to indicate that a time series is composed of daily series, weekly series, monthly series, and quarterly series, respectively” the time series is decomposed into a set of features depicting differently sampled windows (daily, weekly, monthly and quarterly). Each feature is a different length, therefore having a different size window than each other feature (for example, 52 samples for the weekly series vs 4 samples for the quarterly series).)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate features each of different window size to be used by forecasting models as taught by Chujai to the disclosed invention of Ryan.
One of ordinary skill in the art would have been motivated to make this modification in order to determine “The suitable forecasting methods and the most suitable forecasting period [or feature]” for producing the most accurate forecast for time series data. (Chujai Abstract)
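For illustration only, the notion of statistical features that each use a different window size can be sketched as follows. This is a minimal sketch and is not taken from Ryan, Chujai, or the application; the function names `rolling_mean` and `augment`, the choice of a trailing mean as the window based statistical function, and the sample values are all assumptions made for the example.

```python
from statistics import mean


def rolling_mean(series, window):
    """Trailing mean over `window` samples; None until the window is full."""
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)  # not enough history yet for this window size
        else:
            out.append(mean(series[i + 1 - window : i + 1]))
    return out


def augment(series, window_sizes):
    """Pair each original value with one feature per window size (hypothetical helper)."""
    features = [rolling_mean(series, w) for w in window_sizes]
    return [tuple([x] + [f[i] for f in features]) for i, x in enumerate(series)]


series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
# Two statistical features, each with a different window size (2 and 3 samples).
augmented = augment(series, window_sizes=[2, 3])
```

Each row of `augmented` holds the original sample followed by one feature per window size, so the two features in a row never share a window size.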
Ryan/Chujai does not explicitly teach, receiving one or more selected from the group consisting of: a selection of a search algorithm for selecting said set of statistical features, a selection of said window based statistical function to evaluate, a selection of a type of said machine learning model, receiving a specification for ranges defining a search space of window sizes, and receiving a maximum number of window sizes to evaluate; generating augmented sets of time-series data based on the one or more search selection
Bergstra, however, when addressing a system for selecting the optimal hyperparameters for a machine learning problem, teaches, receiving one or more selected from the group consisting of: a selection of a search algorithm for selecting said set of statistical features, a selection of said window based statistical function to evaluate, a selection of a type of said machine learning model, receiving a specification for ranges defining a search space of window sizes, and receiving a maximum number of window sizes to evaluate; generating augmented sets of time-series data based on the one or more search selection (pg 1 “we take the view that the choice of classifier and even the choice of preprocessing module can be taken together to represent a single large hyperparameter optimization problem” pg 4 “Hyperopt shoulders the responsibility of finding the best value of a scalar-valued, possibly stochastic function over a set of possible arguments to that function…Hyperopt encourages you, the user, to describe your configuration space in more detail” pg 5 “Assigning the algo keyword argument to hyperopt.fmin is recommended way to choose a search algorithm…Currently supported search algorithms are random search (hyperopt.rand.suggest), annealing (hyperopt.anneal.suggest), and TPE (hyperopt.tpe.suggest)” pg 17 “Hyperopt-Sklearn provides a parameterization of a search space over pipelines, that is, of sequences of preprocessing steps and classifiers…. The first four preprocessing algorithms were for dense features. PCA performed whitening or non-whitening PCO. The StandardScaler, MinMaxScaler, and Normalizer did various feature-wise affine transforms to map numeric input features onto values near 0 and with roughly unit variance.” Hyperopt allows a user to select a search algorithm from several options in order to select an optimal preprocessing module. The preprocessing module generates statistical features.)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a method for selecting the best preprocessing algorithm for data input into a machine learning model using an optimization routine as taught by Bergstra to the disclosed invention of Ryan/Chujai.
One of ordinary skill in the art would have been motivated to make this modification because Ryan/Chujai present a method for selecting an optimal machine learning model and Bergstra improves upon model selection by implementing optimal preprocessing steps in tandem. Bergstra states “Following Auto-Weka, we take the view that the choice of classifier and even the choice of preprocessing module can be taken together to represent a single large hyperparameter optimization problem” (Bergstra Abstract)
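For illustration only, the general idea of a search algorithm traversing a user-specified range of window sizes, capped at a maximum number of evaluations, can be sketched in plain Python. This sketch does not use or reproduce the hyperopt API; the names `score` and `random_search`, the stand-in accuracy score, and the sample series are assumptions made for the example.

```python
import random
from statistics import mean


def score(series, window):
    """Stand-in accuracy score (hypothetical): negative mean absolute error
    when each value is predicted by the mean of the previous `window` values."""
    errors = [
        abs(series[i] - mean(series[i - window : i]))
        for i in range(window, len(series))
    ]
    return -mean(errors)


def random_search(series, low, high, max_evals, seed=0):
    """Randomly sample window sizes in [low, high] without replacement,
    evaluating at most `max_evals` of them, and keep the best-scoring one."""
    rng = random.Random(seed)
    candidates = list(range(low, high + 1))
    tried = rng.sample(candidates, k=min(max_evals, len(candidates)))
    best_window, best_score = None, float("-inf")
    for window in tried:
        s = score(series, window)
        if s > best_score:
            best_window, best_score = window, s
    return best_window, best_score


# A series with period 4; the search range and evaluation cap are user inputs.
series = [0, 1, 0, -1] * 6
best_window, best_score = random_search(series, low=1, high=6, max_evals=6)
```

In this toy setting the window size matching the period of the series scores best. A smaller `max_evals` would evaluate only a random subset of the range, which is the trade-off a capped search makes.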

Regarding Claim 2 and 13
	Ryan/Chujai/Bergstra teaches Claim 1 and 12
	Further Ryan teaches, performing gradient descent computation on the respective prediction accuracy score generated for each augmented set of time-series data of said augmented sets of time-series data. (¶0141 “FIG. 16 is a diagram showing generic set-ups of meta-learning systems, which may include an automatic model selection system 230 and a gradient-based [gradient descent computation] hyper-parameter optimization system 232” the hyper parameter optimization includes choosing the optimal window size for detection of features, based on the prediction accuracy. ¶0104 “so in fact the classifier is trained with multiple window sizes W on the training data set and the windowing procedure T is used on the testing set to select the best W by picking the combined classifier and window size” selecting the set of features is dependent on the probability score computed by the window function. Thus, the process of selecting a set of features depends on the gradient descent during training.)

Regarding Claim 5 and 16
Ryan/Chujai/Bergstra teaches Claim 1 and 12
	Further Ryan teaches, statistical selecting a respective trained machine learning model that yields a best prediction accuracy score as a selected trained machine learning model for making predictions and identifying anomalies given original time-series data; (¶0126 “The best model is selected based on a key performance indicator (KPI) relevant to how the model is going to be used for prediction/classification…selecting the model in this way is in fact searched over a hyper-parameter space of models and results in the “optimal” model for the machine learning task at hand. The selection may be performed during the validation stage of the training. Finally, anomalies are detected (block 198) using the best model.”) and selecting a particular window size value of a window size based on the selected trained machine learning model. (¶0104 “A procedure [selecting a particular window size] can be devised on top of this procedure [pattern detection training procedure] to search for the optimum window size as well. That procedure will repeat the search for the pattern using a number of window sizes W for each of the time slots T. The window size W with the highest conditional probability at time T is the best window size for the anomaly. This procedure is used during the training of the classifier”)

Regarding Claim 6 and 17
	Ryan/Chujai/Bergstra teaches Claim 1 and 12
	Further Ryan teaches, selecting a respective trained machine learning model that yields a best prediction accuracy score as a selected trained machine learning model for making predictions and identifying anomalies given original time-series data; (¶0126 “selecting the model in this way is in fact searched over a hyper-parameter space of models and results in the “optimal” model for the machine learning task at hand. The selection may be performed during the validation stage of the training. Finally, anomalies are detected (block 198) using the best model.”) and selecting a number of statistical features that are each calculated using a window based statistical function. (¶0133 “sizes of slides are chosen equal to human behavior activities. For instance, the window sizes could include one day worth of samples, one week worth of samples, one month worth of samples, or samples over any other suitable time period corresponding to the cycles of the signal.” the sampling scheme truncates the data to a particular length. Truncation is a statistical function that censors trends of undesirable length.)


Regarding Claim 9 and 20
Ryan/Chujai/Bergstra teaches Claim 1 and 12
Further Bergstra teaches, receiving an interactive selection of a type of said machine learning model. (pg 4 “Hyperopt encourages you, the user, to describe your configuration space in more detail” pg 5 “Assigning the algo keyword argument to hyperopt.fmin is recommended way to choose a search algorithm” the user selects the search algorithm pg 16 “we introduce Hyperopt-Sklearn: a project that brings the benefits of automatic algorithm configuration to users of Python and Scikit-learn. Hyperopt-Sklearn uses Hyperopt to describe a search space over possible configurations of Scikit-learn components, including preprocessing and classification modules” users define the possible classification machine learning model to be selected from.)

Regarding Claim 10 and 21 
Ryan/Chujai/Bergstra teaches Claim 1 and 12
Bergstra teaches, wherein said search algorithm comprises one selected from the group consisting of grid search algorithm; random search algorithm; and gradient descent algorithm (pg 5 “Assigning the algo keyword argument to hyperopt.fmin is recommended way to choose a search algorithm. Currently supported search algorithms are random search” the user selects a search algorithm from a group including random search)

Regarding Claim 11 and 22 
Ryan/Chujai/Bergstra teaches Claim 1 and 12
Further, Ryan teaches, receiving a selection of a type of said machine learning model wherein said type of said machine learning model comprises one or more selected from the group consisting of: random forest model;  autoencoder model; multilayer perceptron model; and recurrent neural networks; long short-term memory model (¶0174 “FIG. 31 is a table 450 showing the test results of utilizing various algorithms described in the present disclosure… Multi-Layer Percetron (MLP), Long Short-Term Memory (LSTM)” ¶0014 “the program causes the processing system to determine an algorithm among the one or more machine learning algorithms with the best performance”)


Claims 3 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Ryan/Chujai/Bergstra, further in view of Achin et al. US 20180046926 A1 hereinafter Achin.

Regarding Claim 3 and 14
Ryan/Chujai/Bergstra teaches Claim 2 and 13
However Ryan/Chujai/Bergstra does not explicitly teach, receiving a specification for a range to search within a search space, thereby defining a specified search space; generating time series data within the specified search space for training machine learning models
However Achin, when addressing issues related to defining a search range for a time-series forecasting problem, teaches, receiving a specification for a range to search within a search space, thereby defining a specified search space; generating time series data within the specified search space for training machine learning models. (¶0015 and ¶0341 “generating training data from the time-series data…wherein the skip range separates an end of the training-input time range from a beginning of the training-output time range… The user may indicate a “skip range” in the data, which is a gap between the end of a training window (e.g., a time range of data used for training) and the start of a validation window” the specification is received from the user. The skip range further specifies the range for the machine learning model to be trained.)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate a method for refining the search space by defining a skip range to skip non-important data for an optimization problem for a machine learning model as taught by Achin to the disclosed invention of Ryan/Chujai/Bergstra.
One of ordinary skill in the art would have been motivated to make this modification in order to implement a skip range because “Techniques are needed for rigorously and efficiently exploring the modeling search space for time-series models…. rigorous and efficient exploration of the time-series modeling search space (including efficient training, testing, and comparison of time-series models) can be facilitated by explicitly parametrizing certain aspects of time-series modeling procedures [including defining a skip range]” (Achin ¶0014)
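For illustration only, a skip range as quoted above (a gap between the end of a training window and the start of a validation window) can be sketched as follows. This is a minimal sketch and not Achin's implementation; the function name `split_with_skip` and its parameters are assumptions made for the example.

```python
def split_with_skip(series, train_len, skip_len, valid_len):
    """Split a series into a training window and a validation window,
    leaving `skip_len` samples unused between them (hypothetical helper)."""
    train = series[:train_len]
    start = train_len + skip_len  # validation begins after the skipped gap
    valid = series[start : start + valid_len]
    if len(train) < train_len or len(valid) < valid_len:
        raise ValueError("series too short for the requested windows")
    return train, valid


samples = list(range(10))
train, valid = split_with_skip(samples, train_len=5, skip_len=2, valid_len=3)
```

Here samples 5 and 6 fall in the skip range and appear in neither window.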

Claims 4 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Ryan/Chujai/Bergstra, further in view of Abbaszadeh et al. US 20180191758 A1 hereinafter Abbaszadeh.

Regarding Claim 4 and 15 
Ryan/Chujai/Bergstra teaches Claim 1 and 12
However Ryan/Chujai/Bergstra does not explicitly teach, wherein using a window based statistical function includes moving gradient.
However Abbaszadeh, when addressing feature engineering for processing data for a neural network, teaches, wherein using a window based statistical function includes moving gradient. (¶0036 “Note that many different types of features may be utilized in accordance with any of the embodiments described herein, including… Embodiments may also be associated with time series analysis features, such as derivatives and integrals of signals” One example of a statistical feature which may be used for preprocessing of neural network inputs is a “derivative” corresponding to a gradient. ¶0029 “At S210, a plurality of real-time monitoring node signal inputs may receive streams of monitoring node signal values over time that represent a current operation” the signals continuously received in real time are changing in real time. The derivative extracted from a changing signal is a moving derivative. The function is window based because it is based on a signal limited by the signal values that represent the “current operation of an industrial asset”, thus a windowed signal.)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate certain features of a raw machine learning input signal that are predictive of a specific output such as the derivative of the signal as taught by Abbaszadeh to the disclosed invention of Ryan/Chujai/Bergstra.
One of ordinary skill in the art would have been motivated to make this modification because both Ryan/Chujai/Bergstra and Abbaszadeh discuss using certain features of an input signal extracted using preprocessing steps for use in a machine learning model. Bergstra notes the desirability of other preprocessing steps to include in the parameter optimization space (“Hyperopt-Sklearn provides many opportunities for future work: more classifiers and preprocessing modules could be included in the search space… Other types of data require different preprocessing, and other prediction problems exist” Bergstra pg 21).
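For illustration only, one possible reading of a window based "moving gradient" is the average rate of change of the signal across a trailing window. The Office action and the cited references do not define the term this way; the function name `moving_gradient` and the finite-difference formula are assumptions made for the example.

```python
def moving_gradient(series, window):
    """Average slope across each trailing window of `window` samples:
    (last value - first value) / (window - 1). None until the window is full."""
    if window < 2:
        raise ValueError("window must span at least two samples")
    out = []
    for i in range(len(series)):
        if i + 1 < window:
            out.append(None)  # not enough history yet
        else:
            first = series[i - window + 1]
            out.append((series[i] - first) / (window - 1))
    return out


signal = [0.0, 2.0, 4.0, 7.0]
gradients = moving_gradient(signal, window=3)
```

As new samples arrive, the window slides forward and the gradient is recomputed, which is the sense in which the derivative "moves" with the signal.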

Claims 7-8 and 18-19 are rejected under 35 U.S.C. 103 as being unpatentable over Ryan/Chujai/Bergstra, further in view of Limonad et al. US Document ID US 20170193395 A1, hereinafter Limonad, further in view of Talagala et al. “Meta-learning how to forecast time series” hereinafter Talagala.

Regarding Claim 7 and 18
	Ryan/Chujai/Bergstra teaches Claim 6 and 17
	Further Ryan teaches, receiving input time-series data for performing predictions and identifying anomalies; (¶0013 “The method includes obtaining data in a time-series…The method also includes training a Deep Neural Network (DNN)”) receiving one or more values of recommended parameters for the selected trained machine learning model, (¶0126 and Figure 11 and Figure 12A-D“The best model determines the best data transformation, or best combination of data transformations… The selection may be performed during the validation stage of the training.” a single best data transformation or a combination of data transformations may be the best or recommended final parameters for a model, this configuration is selected or received by the system. ¶0125 “Data transformation includes converting obtained time-series data into a time-series more appropriate for a machine learning algorithm” these transformations are window based transformations because they operate on a discrete range of data as shown in figure 12A-D) based on the one or more values of the recommended parameters, automatically generating a particular augmented time- series data from the input time-series data; providing the particular augmented time-series data to the selected trained machine learning model; ((¶0126 and Figure 11 and Figure 12A-D“The best model determines the best data transformation, or best combination of data transformations” ¶0013 “creating one-dimensional or multi-dimensional windows from the time-series data” the “best transformations” are the recommended parameters for generating an augmented time-series from the input time series data. In Figure 9, the windowed measurements correspond to the transformation or augmentations represented in the 136 blocks. They are used on the selected model in the validation stage 140.) and receiving, from the selected trained machine learning model, predictions for future time-series data as well as identified anomalies in the input time-series data. 
(¶0126 “The best model determines the best data transformation, or best combination of data transformations. The best model is selected based on a key performance indicator (KPI) relevant to how the model is going to be used for prediction/classification (e.g. smallest false positive rate, smallest prediction latency, highest true positive rate for a given maximum false positive rate, etc.)” the best model is selected based on the prediction accuracy. Outputting a prediction accuracy from a model is equivalent to receiving a prediction accuracy from a selected best model.) 
	Ryan does not explicitly teach, wherein the one or more values [of the recommended parameters] specify a number of windows.
	Limonad however when addressing optimizing configuration parameters of a machine learning model teaches, wherein the one or more values [of the recommended parameters] specify a number of windows. (¶0023 “Yet another technical solution is to perform optimization over one or more parameters of a multi-stage event detection procedure to bring percent-based true positive rate (TPR) to a maximum… for example, segmentation parameters such as number of consecutive sliding windows” the optimization procedure optimizes for the number of windows for segmentation. In the art segmentation refers to the preprocessing of input data before being passed to the machine learning model as demonstrated in Figure 1.)
It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate an optimization procedure for selecting the optimal configuration parameters that produce the maximum true positive rate, the configuration parameters including the number of windows as taught by Limonad to the disclosed invention of Ryan/Chujai/Bergstra.
One of ordinary skill in the art would have been motivated to make this modification because both Ryan/Chujai/Bergstra and Limonad discuss producing a configuration for a machine learning model that is optimal. Bergstra notes the desirability of other preprocessing steps to include in the parameter optimization space (“Hyperopt-Sklearn provides many opportunities for future work: more classifiers and preprocessing modules could be included in the search space… Other types of data require different preprocessing, and other prediction problems exist” Bergstra pg 21).

Regarding Claim 8 and 19
Ryan/Chujai/Bergstra/Limonad teaches Claim 7 and 18
However Ryan/Chujai/Bergstra/Limonad does not explicitly teach, wherein automatically generating the particular augmented time-series data from the input time-series data comprises: automatically generating a particular set of one or more statistical features according to the one or more values of the recommended parameters; and automatically concatenating the particular set of one or more statistical features to the input time-series data to generate the particular augmented time-series data.
However, Talagala, when addressing augmenting time series data to provide additional information for time series forecasting, teaches wherein automatically generating the particular augmented time-series data from the input time-series data comprises: automatically generating a particular set of one or more statistical features according to the one or more values of the recommended parameters; and automatically concatenating the particular set of one or more statistical features to the input time-series data to generate the particular augmented time-series data. (Section 3.1 ¶01 “we may wish to augment the set of observed time series by simulating new time series similar to those in the assumed population…In order to produce simulated series that are similar to those in the population, we consider two classes of data generating processes: exponential smoothing models and ARIMA models” Algorithm 1 
    [Image: media_image1.png, Algorithm 1 of Talagala]
 The windowed time series O, the result of windowing based on recommended parameters with length n, generates a particular set of simulated time series that is a deterministic representation of statistical features inherent in the data. In step 3 of Algorithm 1, augmented time-series data, the reference set R, is created by concatenating O and the simulated data.) 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to incorporate an augmented time series for forecasting, generated from the windowed data and a statistical representation of the windowed data, to be used in the forecasting model, as taught by Talagala, into the disclosed invention of Ryan/Chujai/Bergstra/Limonad.
One of ordinary skill in the art would have been motivated to make this modification in order to remedy “The random forest (RF) algorithm [that] is highly sensitive to class imbalance” because “some classes contain significantly more cases than other classes. The degree of class imbalance is reduced to some extent by augmenting the observed sample with the simulated time series.” (Section 4.3 ¶01 Talagala)
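For illustration only, the general operation of generating statistical features from windowed data and concatenating them to the input time series can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the application or from Talagala: the function name `augment`, the `window_size` parameter, and the choice of trailing-window mean and population standard deviation as the statistical features are all hypothetical.

```python
from statistics import fmean, pstdev
from typing import List, Sequence, Tuple


def augment(
    series: Sequence[float], window_size: int
) -> List[Tuple[float, float, float]]:
    """Concatenate per-step window statistics to each observation.

    Each output row is (value, window mean, window population std), where the
    window covers up to window_size observations ending at the current step.
    Assumption: early steps use a shorter window rather than being dropped.
    """
    if window_size < 1:
        raise ValueError("window_size must be at least 1")
    rows = []
    for i, value in enumerate(series):
        window = series[max(0, i - window_size + 1): i + 1]
        rows.append((value, fmean(window), pstdev(window)))
    return rows
```

In this sketch the window size plays the role of a recommended parameter value: changing it changes the generated features while the original observations remain the first element of each row.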

Conclusion
Prior art
US 20030200134 A1 discloses an automated system that selects an optimal forecasting model from a pool of models.
Andrey Ignatov, “Real-time human activity recognition from accelerometer data using Convolutional Neural Networks,” discusses appending statistical features to extracted convolutional features so that both are input together into a fully connected neural network.

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to JOHNATHAN R GERMICK whose telephone number is (571)272-8363. The examiner can normally be reached M-F 7:30-4:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on 571-272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.
	
/J.R.G./Examiner, Art Unit 2122

/ERIC NILSSON/Primary Examiner, Art Unit 2122