Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Regarding the 35 USC 103 rejection, Examiner has fully considered Applicant’s arguments and amendments.
Regarding Applicant’s assertion of “"Within a single learning step, the machine learning apparatus 100 constructs a model by using training data and evaluates its prediction performance by using test data." Ura at ¶ [0089]. However, Ura is clear that the training data plus test data used for each of the different models "uses all of the plurality of data points of the pooled training dataset." Specifically, Ura discloses that "First, the machine learning apparatus 100 divides sampled data into M blocks, where M is an integer greater than one. M-1 blocks are used as training data, and the remaining one block is used as test data." Ura at ¶ [0090]. Therefore, for multiple models where M must be at least 3, a particular set of training data and test data cannot equal the entirety of the sampled data,” Examiner respectfully disagrees. Examiner cited paragraph [0091] of Ura, which states: “As another example, the machine learning apparatus 100 may perform a random-sampling validation method as follows.” The example of [0091] is not the same as that of [0090], which provides a different example of how the datasets are constructed. Therefore, Examiner is not relying on [0089-0090] of Ura, against which Applicant’s assertions are directed. As Applicant’s arguments address a different example/embodiment of the Ura reference, the arguments are moot. As can be seen in [0091] of Ura, the reference discloses that the data is sampled without replacement, such that each set of training data does not include duplicates of the same unit dataset, nor does the test data set, and each single sampling never enters the same unit dataset to both the training data and test data.
Regarding Applicant’s assertion of “Similarly, Ura discloses: "The machine learning apparatus 100 randomly samples training data and test data from a given population of data. Then the machine learning apparatus 100 learns a model by using training data and calculates prediction performance of the model by using test data." Ura at ¶ [0091]. "Here the above-noted sampling operation samples data 'without replacement.' That is, each sampled set of training data does not include duplicates of the same unit dataset, and the same is true for each sampled set of test data." Ura at ¶ [0091]. Because Ura samples "without replacement", each set of training data and test data by definition cannot include ALL of the data points of the sampled data because each set of training data has DIFFERENT data from each other,” Examiner respectfully disagrees. The present claim requires randomly sampling data to form a plurality of different training sets and different validation sets, wherein each combination of a training set and validation set forms all of the plurality of data points. The Ura reference discloses randomly sampling data without replacement in order to produce training data sets and test data sets. The Ura reference further discloses the random sampling as follows in [0091]: “That is, each sampled set of training data does not include duplicates of the same unit dataset, and the same is true for each sampled set of test data. Also, each single sampling never enters the same unit dataset to both the training data and test data.” Furthermore, regarding Applicant’s assertion of “each set of training data and test data by definition cannot include ALL of the data points of the sampled data because each set of training data has DIFFERENT data from each other,” Examiner respectfully disagrees.
Ura teaches in [0091] that the sampling to produce the datasets is “allowed to select the same unit dataset multiple times.” As Ura performs sampling without replacement, each training set has data that is exclusive to the training set and is not a part of the corresponding validation set. The Ura reference is capable of creating training and test data sets that never enter the same unit dataset to both the training and test data set, and that are allowed to select datasets multiple times. Therefore, as can be seen by the capabilities of the random sampling of Ura, the reference is capable of producing datasets where each combination of a training set and a validation set forms all of the plurality of data points.
Applicant is arguing in view of the species of random sampling disclosed in the specification. Examiner is not bound by said interpretation because the present claims merely recite “randomly sampling,” which is the genus of the asserted species. Applicant is reminded that it is impermissible to import limitations from the specification into the claims. (See MPEP 2111.01(II)).  If Applicant wishes for Examiner to interpret the claims as though they require sampling with replacement, then the claims need to positively recite this limitation. 
Accordingly, the 35 USC 103 rejection is maintained.
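For illustration only, the claim reading applied in the response above (each combination of a training set and a validation set forms all of the data points, with no data point in both sets) can be sketched as follows. This is a minimal sketch under stated assumptions, not a reproduction of Ura's method or of Applicant's disclosed embodiment; the function name `sample_partitions` and its parameters are hypothetical.

```python
import random


def sample_partitions(data_points, n_validation, n_splits, seed=0):
    """Return n_splits (training, validation) pairs.

    Hypothetical helper: each pair is drawn without replacement from the
    same pool, so the two sets are disjoint and together contain every
    data point of the pool.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_splits):
        shuffled = list(data_points)   # copy so the pool is left unchanged
        rng.shuffle(shuffled)          # random order for this split
        validation = shuffled[:n_validation]
        training = shuffled[n_validation:]  # everything not in validation
        pairs.append((training, validation))
    return pairs


if __name__ == "__main__":
    pool = list(range(10))
    for training, validation in sample_partitions(pool, n_validation=3, n_splits=4):
        print(sorted(training), sorted(validation))
```

In this sketch the different splits generally contain different data points, yet each training set together with its validation set always equals the full pool.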

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-4, 9-12, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Beddo et al. (US 20140108094 A1) in view of Ura et al. (US 20170372229 A1) in view of He et al. (US 20170236136 A1).

Regarding claim 1, Beddo teaches a method of forecasting sales of a retail item (Figs. 2 and 4), the method comprising:
receiving historical sales data of a class of a retail item that comprises a plurality of stock keeping units (paragraph [0064] teaches collecting historical product information associated with the sale of products of a specific category (i.e. class); see also: [0002, 0024, 0063, 0069]), 
the historical sales data comprising past sales and promotions of the retail item across a plurality of past time periods (paragraph [0065] teaches collecting product information including sales data measured in any time period, such as weeks, days, months, etc., wherein paragraph [0067] teaches the product information includes promotions associated with the product; see also: [0024, 0063, 0069]), 
the historical sales data corresponding to an amount of sales of each stock keeping unit at each store during each of the past time periods (paragraph [0064] teaches the product information relates to the sale of products of a category; paragraph [0063] teaches the sales product information includes the sales of the product by an individual store, retailer as a whole, or geographic area; paragraph [0065] teaches the product sales data is the number of units sold during a time period, such as hours, weeks, days, months, and years); 
aggregating the historical sales to a higher aggregation level than the historical sales data to form a pooled training dataset having a plurality of data points (paragraph [0074] teaches constructing a data matrix of product information used by the neural network, wherein paragraph [0114] the product information is sales data of weekly sales over a year at all of the retailer’s distribution outlets in a specific geographic retail area (i.e. a higher aggregation level)), 
the higher level comprising an amount of sales of each subclass that corresponds to the stock keeping units at each store during each of the past time periods (paragraph [0074] teaches constructing a data matrix of product information used by the neural network, wherein paragraph [0114] the product information is sales data of weekly sales over a year at all of the retailer’s distribution outlets in a specific geographic retail area of 24-packs of consumer product Z (i.e. subclass)); 
each data point representing a subclass/store combination ([0063-0064] teach the product information is associated with the sale of a product by a retailer, such as a grocery store or retailer, and the product category, wherein [0065] teaches the product information can be any amount or type of information associated with the sale of the product, such as number of units sold by a certain company; for example, paragraph [0114] teaches the product information is sales data of weekly sales over a year at all of the retailer’s distribution outlets in a specific geographic retail area of 24-packs of consumer product Z (i.e. subclass); see also: [0074]); 
training multiple models ([0040-0041] teach creating numerous neural network models, wherein [0042-0043] teach training the neural networks using a training data set, as well as [0080] teach the dynamic system can train and re-train the neural network connections; see also: [0031-0036]),
and using each corresponding different validation set to validate each trained model and calculate an error (paragraph [0045] teaches evaluating each neural network for a given validation set of data and producing a validation score, wherein paragraphs [0045-0053] teach the minimum description length is calculated for each validation set using the residual sum of squares; see also: [0046]; Examiner’s Note: Minimum description length is an error minimization technique used in machine learning applications that is calculated using the residual sum of squares, otherwise known as the sum of squared errors, which measures the error of the model compared to the validation data set.); 
calculating model weights for each trained model (paragraph [0045] teaches assigning a weight to each neural network model; see also: [0007]);  
outputting a forecasted demand function comprising a model combination of each trained model and corresponding model weight (paragraph [0053] teaches combining multiple neural network models using their weights in order to combine them into a single model for producing forecasts, wherein paragraphs [0090-0091] teach the forecast is a sales forecast (i.e. forecasting the demand); see also: [0007, 0036, 0045]; Examiner’s Note: Examiner is interpreting the weighted combination of neural network models as being a “function,” or an expression of multiple weighted neural networks used to forecast demand.); 
and generating a forecast of future sales based on the forecasted demand function (paragraphs [0090-0091] teach generating a sales forecast, wherein paragraph [0053] teaches combining multiple neural network models using their weights in order to combine them into a single model for producing forecasts; see also: [0007, 0036, 0045]).
However, Beddo does not explicitly teach randomly sampling the pooled training dataset to form a plurality of different training sets and a plurality of different validation sets that correspond to the training sets, wherein each combination of a training set and a validation set forms all of the plurality of data points and for each different sampled pooled training dataset the data points that are not part of the validation set are part of the training set, each different training set and corresponding validation set is formed from the same pooled training dataset; each model trained using a unique training set of the plurality of different training sets, wherein each of the training and validating of each of the multiple models is uses all of the plurality of data points of the pooled training dataset; wherein the error is a root-mean-square error (RMSE) and the calculated model weights are based in the RMSE.
From the same or similar field of endeavor, Ura teaches randomly sampling the pooled training dataset to form a plurality of different training sets and a plurality of different validation sets that correspond to the training sets ([0091] teaches randomly sampling sets of training data and test data (i.e. validation set data) from a given population of data, and [0179-0180] teach randomly extracting training data and test data with a sample size from the available dataset D, wherein [0085] teaches the machine learning apparatus is designed to narrow down the choices for such combinations of relevant hyper parameter values and sample sizes (i.e. pooled training dataset); see also: [0185]; Examiner’s Note: The pooled training dataset is the relevant combination of hyper parameter values and sample sizes of data.), 
wherein each combination of a training set and a validation set forms all of the plurality of data points and for each different sampled pooled training dataset the data points that are not part of the validation set are part of the training set ([0091] teaches the data is sampled without replacement, such that each set of training data does not include duplicates of the same unit dataset, nor does the test data set, and each single sampling never enters the same unit dataset to both the training data and test data, and [0179-0180] teach randomly extracting training data and test data with a sample size from the available dataset D, wherein [0085] teaches the machine learning apparatus is designed to narrow down the choices for such combinations of relevant hyper parameter values and sample sizes (i.e. pooled training dataset); see also: [0185]); 
each different training set and corresponding validation set is formed from the same pooled training dataset ([0177-0180] teach a dataset D that provides the data for the hyper parameter and specific sample size for both the training data and test data, wherein the random sampling of the data from the dataset D is a non-duplicative sampling process that provides test data that is exclusive of the training data, and [0005] teaches training the model using small-size training data, then evaluating the model using test data prepared separately from the training data, wherein the model can be iteratively retrained to learn more of the data until the performance reaches a sufficient level; see also: [0185]); 
wherein each of the training and validating of each of the multiple models is uses all of the plurality of data points of the pooled training dataset ([0179-0180] teach randomly extracting training data and test data with a sample size from the available dataset D, wherein [0085] teaches the machine learning apparatus is designed to narrow down the choices for such combinations of relevant hyper parameter values and sample sizes (i.e. pooled training dataset), wherein [0085] teaches the machine learning apparatus can dynamically select a new combination of a hyperparameter value and sample size based on the previous learning results, wherein the selection is not exhaustive, wherein [0157] teaches determining whether all relevant, executed hyper parameter values have been trained);
wherein the error is a root-mean-square error (RMSE) and the calculated model weights are based in the RMSE ([0181-0182] teach learning a model and then calculating a prediction performance using RMSE, wherein [0249] teaches a weighted sum of squared residuals from μ, which is derived from RMSE, which is an evaluation score for the performance, and wherein [0123] teaches a machine learning process that determines weights of explanatory variables within the model, and [0134] teaches that μ is the mean of the predicted performance; see also: [0159, 0197-0199]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Beddo to incorporate the teachings of Ura to include randomly sampling the pooled training dataset to form a plurality of different training sets and a plurality of different validation sets that correspond to the training sets, wherein each combination of a training set and a validation set forms all of the plurality of data points and for each different sampled pooled training dataset the data points that are not part of the validation set are part of the training set, each different training set and corresponding validation set is formed from the same pooled training dataset; wherein each of the training and validating of each of the multiple models is uses all of the plurality of data points of the pooled training dataset; wherein the error is a root-mean-square error (RMSE) and the calculated model weights are based in the RMSE. One would have been motivated to do so in order to narrow down choices for combinations of training data sets, which allows the learning steps to be dynamic and to not exhaust the computational system through training data (Ura, [0085]). By incorporating Ura into Beddo, one would have been able to seek optimal parameter values more efficiently, which is desirable in the case where the size of training data may vary during the course of progressively sampled machine learning (Ura, [0011-0012]).
However, the combination of Beddo and Ura does not explicitly teach each model trained using a unique training set of the plurality of different training sets.
From the same or similar field of endeavor, He teaches each model trained using a unique training set of the plurality of different training sets ([0054-0055] teach the model manager can generate and train models using price information and more, wherein the models can be trained with training data using different data for each model; see also: [0017], which discloses demand forecasting to forecast a price of a commodity). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Beddo and Ura to incorporate the teachings of He to include each model trained using a unique training set of the plurality of different training sets. One would have been motivated to do so in order to integrate smaller consumers into demand response strategies by using real-time forecasting and demand response aggregators (He, [0002]). By incorporating He into Beddo, one would have been able to develop each model with different features, leading to models that are similar but different enough to generate a probabilistic array of outputs with the same inputs (He, [0055]).
Regarding claims 9 and 17, the claims recite limitations already addressed by the rejection of claim 1. Regarding claim 9, Beddo teaches a computer-readable medium having instructions stored thereon that (paragraph [0012] teaches a non-transitory computer readable medium provided for forecasting sales), when executed by a processor, cause the processor to forecast sales of a retail item (paragraph [0162] teaches the non-transitory computer readable medium has computer executable code). Regarding claim 17, Beddo teaches a retail sales forecasting system comprising (Figs. 2-5): a processor coupled to a storage device that implements promotions effect module comprising (paragraph [0094] teaches a forecasting apparatus including a processor coupled to a memory with a software application (i.e. module)). Therefore, the rejection of claim 1 as being unpatentable over Beddo in view of Ura in view of He applies to claims 9 and 17. 

Regarding claims 2, 10, and 18, the combination of Beddo, Ura, and He teaches all the limitations of claims 1, 9, and 17 above.
	Beddo further teaches the training multiple models comprises using a machine learning algorithm for the training (paragraph [0040] teaches creating multiple neural network models (i.e. machine learning), wherein paragraph [0034] the model actively trains and re-trains).

	Regarding claims 3, 11, and 19, the combination of Beddo, Ura, and He teaches all the limitations of claims 2, 10, and 18 above.
	Beddo further teaches the machine learning algorithm comprises one of Artificial Neural Networks (paragraph [0024] teaches generating a neural network).

	Regarding claims 4, 12, and 20, the combination of Beddo, Ura, and He teaches all the limitations of claims 1, 9, and 17 above.
Beddo further teaches the historical data comprises data for multiple retail stores and multiple stock keeping units that belong to a subclass over multiple time periods (paragraph [0063] teaches the product information includes the sale of a product by an entity with multiple locations (i.e. multiple retail stores), wherein paragraph [0064] the product information relates to the sales of products (i.e. multiple stock keeping units), wherein paragraph [0065] the product information contains sales data for the number of units sold during any time period, such as days, weeks, or months (i.e. multiple time periods), wherein paragraph [0070] teaches the product information relates to the sales of two different products from the same competitive selection set);
wherein the aggregating comprises a subclass level (paragraph [0070] teaches the product information may include multiple products existing within the same competitive selection set that are associated with different brands (i.e. subclass level)).

Claims 5 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Beddo et al. (US 20140108094 A1) in view of Ura et al. (US 20170372229 A1) in view of He et al. (US 20170236136 A1) and further in view of McMahon et al. (US 20160048766 A1).

Regarding claims 5 and 13, the combination of Beddo, Ura, and He teaches all the limitations of claims 1 and 9 above.
	However, Beddo does not explicitly teach the randomly sampling comprises sampling with replacement.
	From the same or similar field of endeavor, McMahon teaches the randomly sampling comprises sampling with replacement (paragraph [0054] teaches creating sub-datasets using random sampling with replacement; see also: [0009, 0046]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Beddo, Ura, and He to incorporate the teachings of McMahon to include the randomly sampling comprises sampling with replacement. One would have been motivated to do so in order to average the validation results over the multiple rounds of testing (McMahon, [0047]).

Claims 6-8 and 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Beddo et al. (US 20140108094 A1) in view of Ura et al. (US 20170372229 A1) in view of He et al. (US 20170236136 A1) and further in view of Caldeira et al. (Predicting the yield curve using forecast combinations; Caldeira J.F., Moura G.V., Santos A.A.P. (2016); Computational Statistics and Data Analysis, 100, pp. 79-98.) and further in view of Kraftsow et al. (US 20130346385 A1).

Regarding claims 6 and 14, the combination of Beddo, Ura, and He teaches all the limitations of claims 1 and 9 above.
However, Beddo does not explicitly teach for each model of each training set i, the calculating model weights w(i) comprises: $w(i) = \frac{1}{1 + \mathrm{RMSE}(i)}$.
	From the same or similar field of endeavor, Caldeira teaches for each model of each training set i (page 86, section 5. “Thick modeling approach with RMSE-weights (FC-RMSE)” teaches computing the root mean square error of all selected models),
the calculating model weights w(i) comprises: $w(i) = \frac{1}{\mathrm{RMSE}(i)}$ (page 86, section 5. “Thick modeling approach with RMSE-weights (FC-RMSE)” discloses an equation, wherein the numerator teaches calculating the model weight for one model as: [equation image: media_image1.png]).
While Caldeira does not explicitly evaluate item demand forecasting, Caldeira presents a solution to a problem reasonably pertinent to the claimed invention. For example, as explained above, Beddo addresses calculating weights for each forecasting model; however, Beddo does not explicitly teach the claimed manner of calculating weights for each forecasting model. Caldeira describes an approach to improving the usefulness of weighted forecasting. In Beddo, one is inquiring about an optimal sales forecast. Analogously, in Caldeira, one is inquiring about an optimal interest rate forecast. It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Beddo, Ura, and He to incorporate the teachings of Caldeira to include for each model of each training set i, the calculating model weights w(i) comprises: $w(i) = \frac{1}{\mathrm{RMSE}(i)}$. This improvement is suggested since Caldeira analogously provides this weighting scheme in order to alleviate model uncertainty (Caldeira, Page 80, first paragraph).
Although the combination of Beddo, Ura, He, and Caldeira teaches the calculating model weights w(i) comprises: $w(i) = \frac{1}{\mathrm{RMSE}(i)}$, the combination does not explicitly teach the calculating model weights w(i) comprises: $w(i) = \frac{1}{1 + \mathrm{RMSE}(i)}$. (Emphasis added by the Examiner, specifically the “1+” in the denominator of the equation.)
From the same or similar field of endeavor, Kraftsow discloses adding a small constant to the denominator of a weighting equation (paragraphs [0039-0040] teach summing the reciprocal of an equation, wherein a constant factor of “cis” (such as 1) is added to the denominator). 
It would have been obvious to one of ordinary skill in the art to modify the equation of Caldeira to incorporate the teachings of Kraftsow to include the addition of a small constant to the denominator of the weighting equation. This known technique is being applied to a known art ready for improvement. The improvement is provided by Kraftsow because the technique of Kraftsow limits the maximum score of the weights (Kraftsow, [0039]). Additionally, the art of Kirshenbaum (US 20110119209 A1) suggests a similar motivation, wherein a small constant is added to the denominator of a weighting formula to ensure the uncertainty value can never be zero (Kirshenbaum, [0063-0064]). A person having ordinary skill in the art would recognize the benefit of adding this constant factor to the denominator, which would prevent any single model from being weighted as infinity as the error value approaches zero.
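For illustration only, the effect of the added constant discussed above can be sketched as follows. This is a minimal sketch under stated assumptions: the function names `rmse` and `model_weight` and the default constant of 1 are hypothetical, and nothing here is taken from Caldeira, Kraftsow, or Kirshenbaum beyond the form w(i) = 1/(1 + RMSE(i)) recited in the claim.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def model_weight(rmse_value, constant=1.0):
    """Weight of one model: 1 / (constant + RMSE).

    With constant = 1 the weight can never exceed 1, even when the
    RMSE is exactly zero; with constant = 0 a zero RMSE would divide
    by zero.
    """
    return 1.0 / (constant + rmse_value)


if __name__ == "__main__":
    # Hypothetical validation values for a single model.
    error = rmse([10.0, 12.0, 9.0], [11.0, 12.0, 7.0])
    print(error, model_weight(error))
```

A perfect model (RMSE of zero) receives a weight of 1 in this sketch, and larger errors yield progressively smaller weights.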

Regarding claims 7 and 15, the combination of Beddo, Ura, He, Caldeira, and Kraftsow teaches all the limitations of claims 6 and 14 above.
However, Beddo does not explicitly teach determining a sum S of the model weights w(i) comprising S=sum(w(i)); and normalizing a weight w'(i) for each w(i) comprising $w'(i) = \frac{w(i)}{S}$.
	From the same or similar field of endeavor, Caldeira further teaches: 
determining a sum S of the model weights w(i) comprising S=sum(w(i)) (page 86, section 5. “Thick modeling approach with RMSE-weights (FC-RMSE)” discloses calculating the sum of each model weight in the denominator of the equation: 
[Equation image: media_image2.png]
); 
and normalizing a weight w'(i) for each w(i) comprising $w'(i) = \frac{w(i)}{S}$ (page 86, section 5. “Thick modeling approach with RMSE-weights (FC-RMSE)” discloses calculating the normalized weight for each model using the following equation:
[Equation image: media_image3.png]
).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Beddo, Ura, He, Caldeira, and Kraftsow to incorporate the further teachings of Caldeira to include determining a sum S of the model weights w(i) comprising S=sum(w(i)); and normalizing a weight w'(i) for each w(i) comprising $w'(i) = \frac{w(i)}{S}$. One would be motivated to do so in order to alleviate model uncertainty (Caldeira, Page 80, first paragraph).
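For illustration only, the claimed normalization (a sum S of the weights w(i), then each weight divided by S) can be sketched as follows. The function name and the sample weights are hypothetical and are not taken from Caldeira or any other cited reference.

```python
from typing import List, Sequence


def normalize_weights(weights: Sequence[float]) -> List[float]:
    """Divide each weight w(i) by the sum S of all weights."""
    total = sum(weights)  # S = sum(w(i))
    if total <= 0:
        raise ValueError("sum of weights must be positive")
    return [w / total for w in weights]  # w'(i) = w(i) / S


# Hypothetical raw weights for three models.
normalized = normalize_weights([1.0, 1.0, 2.0])
```

After normalization the weights in this sketch sum to one, so each w'(i) can be read as that model's share of the combined result.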

	Regarding claims 8 and 16, the combination of Beddo, Ura, He, Caldeira, and Kraftsow teaches all the limitations of claims 7 and 15 above.
	However, Beddo does not explicitly teach the generating the forecast of future sales y using each model M(i) comprises: y = sum(f(M(i), x)*W(i)), wherein f comprises the function to forecast for each model, and x corresponds to each data point.
	From the same or similar field of endeavor, Caldeira further teaches: the generating the forecast of future sales y using each model M(i) comprises: y = sum(f(M(i), x)*W(i)) (page 85 section 2.8 “Combined forecasts” teaches: 
[Equation image: media_image4.png]
; Examiner’s Note: The “y” variable is the equivalent to the claimed y variable; The $\sum$ variable is the equivalent of the “sum” variable. The “w” variable is the equivalent of the claimed w variable. The second “y” is equivalent to f(M(i),x), wherein the “tau” is equivalent to the claimed x variable),
 wherein f comprises the function to forecast for each model (page 85 section 2.8 “Combined forecasts” teaches the “y” is the forecast of the mth model. Examiner’s Note: The second “y” is equivalent to f(M(i),x), wherein the “tau” is equivalent to the claimed x variable).
and x corresponds to each data point (page 85 section 2.8 “Combined forecasts” teaches the “tau” is each maturity value (i.e. a data point)). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Beddo, Ura, He, Caldeira, and Kraftsow to incorporate the further teachings of Caldeira to include the generating the forecast of future sales y using each model M(i) comprises: y = sum(f(M(i), x)*W(i)), wherein f comprises the function to forecast for each model, and x corresponds to each data point. One would be motivated to do so in order to alleviate model uncertainty (Caldeira, Page 80, first paragraph).
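For illustration only, the claimed combination y = sum(f(M(i), x)*W(i)) can be sketched as follows. The sketch assumes each model M(i) is represented as a callable, so that f(M(i), x) is simply the call model(x); the function name, the two toy models, and the weights are hypothetical and are not taken from any cited reference.

```python
from typing import Callable, Sequence


def combined_forecast(
    models: Sequence[Callable[[float], float]],
    weights: Sequence[float],
    x: float,
) -> float:
    """Return the weighted sum of each model's forecast at data point x."""
    if len(models) != len(weights):
        raise ValueError("need exactly one weight per model")
    # y = sum(f(M(i), x) * W(i)), with f(M(i), x) taken to be model(x)
    return sum(w * model(x) for model, w in zip(models, weights))


# Two hypothetical models with already-normalized weights.
toy_models = [lambda x: 2 * x, lambda x: x + 10]
y = combined_forecast(toy_models, [0.25, 0.75], 4.0)
```

In this sketch the individual forecasts at x = 4 are 8 and 14, and the combined forecast is the weighted sum of the two.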

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Ray et al. (US 20160328724 A1) discloses receiving and clustering a set of SKUs in order to create a forecast for each cluster of SKUs, wherein the forecasting includes weighting each dynamic linear model used to produce a sales forecast for inventory ordering.

THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sara G Brown whose telephone number is (469)295-9145. The examiner can normally be reached M-Th 8:00 am- 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Brian Epstein can be reached on (571) 270-5389. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/S.G.B./Examiner, Art Unit 3683
/BRIAN M EPSTEIN/Supervisory Patent Examiner, Art Unit 3683