Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 05/05/2021 has been entered.
 
Response to Arguments
Regarding the 35 USC 103 rejection, Examiner has fully considered Applicant’s arguments and amendments filed 04/01/2021. Regarding Applicant’s assertions in view of McMahon and Brzezicki, Examiner has provided an updated rejection, which was necessitated by amendment. Accordingly, Applicant’s remarks are moot. The claims are now rejected over the combination of Beddo, Ura, and He. See the detailed rejection below. 
Therefore, the present claims are rejected under 35 USC 103.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.


The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-4, 9-12, and 17-20 are rejected under 35 U.S.C. 103 as being unpatentable over Beddo et al. (US 20140108094 A1) in view of Ura et al. (US 20170372229 A1) in view of He et al. (US 20170236136 A1).

Regarding claim 1, Beddo teaches a method of forecasting sales of a retail item (Figs. 2 and 4), the method comprising:
receiving historical sales data of a class of a retail item that comprises a plurality of stock keeping units (paragraph [0064] teaches collecting historical product information associated with the sale of products of a specific category (i.e. class); see also: [0002, 0024, 0063, 0069]), 
the historical sales data comprising past sales and promotions of the retail item across a plurality of past time periods (paragraph [0065] teaches collecting product information including sales data measured in any time period, such as weeks, days, months, etc., wherein paragraph [0067] teaches the product information includes promotions associated with the product; see also: [0024, 0063, 0069]), 
the historical sales data corresponding to an amount of sales of each stock keeping unit at each store during each of the past time periods (paragraph [0064] teaches the product information relates to the sale of products of a category; paragraph [0063] teaches the sales product information includes the sales of the product by an individual store, retailer as a whole, or geographic area; paragraph [0065] teaches the product sales data is the number of units sold during a time period, such as hours, weeks, days, months, and years); 
aggregating the historical sales to a higher aggregation level than the historical sales data to form a pooled training dataset having a plurality of data points (paragraph [0074] teaches constructing a data matrix of product information used by the neural network, wherein paragraph [0114] teaches the product information is sales data of weekly sales over a year at all of the retailer’s distribution outlets in a specific geographic retail area (i.e. a higher aggregation level)), 
the higher level comprising an amount of sales of each subclass that corresponds to the stock keeping units at each store during each of the past time periods (paragraph [0074] teaches constructing a data matrix of product information used by the neural network, wherein paragraph [0114] teaches the product information is sales data of weekly sales over a year at all of the retailer’s distribution outlets in a specific geographic retail area of 24-packs of consumer product Z (i.e. subclass)); 
each data point representing a subclass/store combination ([0063-0064] teach the product information is associated with the sale of a product by a retailer, such as a grocery store or retailer, and the product category, wherein [0065] teaches the product information can be any amount or type of information associated with the sale of the product, such as number of units sold by a certain company; for example, paragraph [0114] teaches the product information is sales data of i.e. subclass); see also: [0074]); 
training multiple models ([0040-0041] teach creating numerous neural network models, wherein [0042-0043] teach training the neural networks using a training data set, and [0080] teaches the dynamic system can train and re-train the neural network connections; see also: [0031-0036]),
and using each corresponding different validation set to validate each trained model and calculate an error (paragraph [0045] teaches evaluating each neural network for a given validation set of data and producing a validation score, wherein paragraphs [0045-0053] teach the minimum description length is calculated for each validation set using the residual sum of squares; see also: [0046]; Examiner’s Note: Minimum description length is an error minimization technique used in machine learning applications that is calculated using the residual sum of squares, otherwise known as the sum of squared errors, which measures the error of the model compared to the validation data set.); 
calculating model weights for each trained model (paragraph [0045] teaches assigning a weight to each neural network model; see also: [0007]);  
outputting a forecasted demand function comprising a model combination of each trained model and corresponding model weight (paragraph [0053] teaches combining multiple neural network models using their weights in order to combine them into a single model for producing forecasts, wherein paragraphs [0090-0091] teach the forecast is a sales forecast (i.e. forecasting the demand); see also: [0007, 0036, 0045]; Examiner’s Note: Examiner is interpreting the weighted combination of neural network models as being a “function,” or an expression of multiple weighted neural networks used to forecast demand.); 
and generating a forecast of future sales based on the forecasted demand function (paragraphs [0090-0091] teach generating a sales forecast, wherein paragraph [0053] teaches combining multiple neural network models using their weights in order to combine them into a single model for producing forecasts; see also: [0007, 0036, 0045]).
However, Beddo does not explicitly teach randomly sampling the pooled training dataset to form a plurality of different training sets and a plurality of different validation sets that correspond to the training sets, wherein each combination of a training set and a validation set forms all of the plurality of data points and for each different sampled pooled training dataset the data points that are not part of the validation set are part of the training set, each different training set and corresponding validation set is formed from the same pooled training dataset; each model trained using a unique training set of the plurality of different training sets, wherein each of the training and validating of each of the multiple models is uses all of the plurality of data points of the pooled training dataset; wherein the error is a root- mean-square error (RMSE) and the calculated model weights are based in the RMSE.
From the same or similar field of endeavor, Ura teaches randomly sampling the pooled training dataset to form a plurality of different training sets and a plurality of different validation sets that correspond to the training sets ([0091] teaches randomly sampling sets of training data and test data (i.e. validation set data) from a given population of data, and [0179-0180] teach randomly extracting training data and test data with a sample size from the available dataset D, wherein [0085] teaches the machine learning apparatus is designed to narrow down the choices for such combinations of relevant hyper parameter values and sample sizes (i.e. pooled training dataset); see also: [0185]; Examiner’s Note: The pooled training dataset is the relevant combination of hyper parameter values and sample sizes of data.), 
wherein each combination of a training set and a validation set forms all of the plurality of data points and for each different sampled pooled training dataset the data points that are not part of the validation set are part of the training set ([0091] teaches the sampled data is sampled without replacement including each set of training data does not include duplicates of the same unit dataset, nor does the test data set, wherein each single sampling never enters the same unit dataset to both the training data and test data, as well as in [0179-0180] teach randomly extracting training data and test data with a sample size from the available dataset D, wherein [0085] teaches the machine learning apparatus is i.e. pooled training dataset); see also: [0185]); 
each different training set and corresponding validation set is formed from the same pooled training dataset ([0177-0180] teach a dataset D that provides the data for the hyper parameter and specific sample size for both the training data and test data, wherein the random sampling of the data from the dataset D is a non-duplicative sampling process that provides test data that is exclusive of the training data, and [0005] teaches training the model using small-size training data, then evaluating the model using test data prepared separately from the training data, wherein the model can be iteratively retrained to learn more of the data until the performance reaches a sufficient level; see also: [0185]); 
wherein each of the training and validating of each of the multiple models is uses all of the plurality of data points of the pooled training dataset ([0179-0180] teach randomly extracting training data and test data with a sample size from the available dataset D, wherein [0085] teaches the machine learning apparatus is designed to narrow down the choices for such combinations of relevant hyper parameter values and sample sizes (i.e. pooled training dataset), wherein [0085] teaches the machine learning apparatus can dynamically select a new combination of a hyperparameter value and sample size based on the previous learning results, wherein the selection is not exhaustive, wherein [0157] teaches determining whether all relevant, executed hyper parameter values have been trained);
wherein the error is a root- mean-square error (RMSE) and the calculated model weights are based in the RMSE ([0181-0182] teach learning a model and then calculating a prediction performance using RMSE, wherein [0249] teaches a weighted sum of squared residuals from μ, which is derived from RMSE, which is an evaluation score for the performance, and wherein [0123] teaches a machine learning process that determines weights of explanatory variables within the model, and [0134] teaches the μ is the mean of the predicted performance; see also: [0159, 0197-0199]). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Beddo to incorporate the teachings of Ura to include randomly sampling the pooled training dataset to form a plurality of different training sets and a plurality of different validation 
However, the combination of Beddo and Ura does not explicitly teach each model trained using a unique training set of the plurality of different training sets.
From the same or similar field of endeavor, He teaches each model trained using a unique training set of the plurality of different training sets ([0054-0055] teach the model manager can generate and train models using price information and more, wherein the models can be trained with training data using different data for each model; see also: [0017], which discloses demand forecasting to forecast a price of a commodity). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Beddo and Ura to incorporate the teachings of He to include each model trained using a unique training set of the plurality of different training sets. One would have been motivated to do so in order to integrate smaller consumers into demand response strategies by using real-time forecasting and demand response aggregators (He, [0002]). By incorporating He into Beddo, one would have been able to develop each model with different features, leading to 
Regarding claims 9 and 17, the claims recite limitations already addressed by the rejection of claim 1. Regarding claim 9, Beddo teaches a computer-readable medium having instructions stored thereon that (paragraph [0012] teaches a non-transitory computer readable medium provided for forecasting sales), when executed by a processor, cause the processor to forecast sales of a retail item (paragraph [0162] teaches the non-transitory computer readable medium has computer executable code). Regarding claim 17, Beddo teaches a retail sales forecasting system comprising (Figs. 2-5): a processor coupled to a storage device that implements promotions effect module comprising (paragraph [0094] teaches a forecasting apparatus including a processor coupled to a memory with a software application (i.e. module)). Therefore, the rejection of claim 1 as being unpatentable over Beddo in view of Ura in view of He applies to claims 9 and 17. 
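
For illustration, the sampling limitation addressed in the rejection of claim 1 (each training set and its validation set are drawn from the same pooled training dataset, and every data point outside the validation set belongs to the training set) can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Beddo, Ura, or He: the function name `complementary_splits`, the `validation_fraction` parameter, and the example data points are hypothetical.

```python
import random


def complementary_splits(data_points, n_splits, validation_fraction=0.2, seed=0):
    """Return n_splits (training_set, validation_set) pairs from one pooled dataset.

    Hypothetical helper: within each pair, every data point is in exactly one
    of the two sets, so the pair together covers the whole pooled dataset.
    """
    rng = random.Random(seed)
    n_validation = max(1, int(len(data_points) * validation_fraction))
    splits = []
    for _ in range(n_splits):
        shuffled = list(data_points)      # copy, so the pooled dataset is untouched
        rng.shuffle(shuffled)             # random sampling without replacement
        validation_set = shuffled[:n_validation]
        training_set = shuffled[n_validation:]
        splits.append((training_set, validation_set))
    return splits


# Hypothetical pooled dataset: one data point per subclass/store combination.
pooled = [("subclass_%d" % s, "store_%d" % t) for s in range(3) for t in range(4)]
splits = complementary_splits(pooled, n_splits=3)
```

Each model would then be trained on one `training_set` and validated on the matching `validation_set`.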

Regarding claims 2, 10, and 18, the combination of Beddo, Ura, and He teaches all the limitations of claims 1, 9, and 17 above.
	Beddo further teaches the training multiple models comprises using a machine learning algorithm for the training (paragraph [0040] teaches creating multiple neural network models (i.e. machine learning), wherein paragraph [0034] teaches the model actively trains and re-trains).

	Regarding claims 3, 11, and 19, the combination of Beddo, Ura, and He teaches all the limitations of claims 2, 10, and 18 above.
	Beddo further teaches the machine learning algorithm comprises one of Artificial Neural Networks (paragraph [0024] teaches generating a neural network).

	Regarding claims 4, 12, and 20, the combination of Beddo, Ura, and He teaches all the limitations of claims 1, 9, and 17 above.
Beddo further teaches the historical data comprises data for multiple retail stores and multiple stock keeping units that belong to a subclass over multiple time periods (paragraph [0063] teaches the product information includes the sale of a product by an entity with multiple locations (i.e. multiple retail stores), wherein paragraph [0064] teaches the product information relates to the sales of products (i.e. multiple stock keeping units), wherein paragraph [0065] teaches the product information contains sales data for the number of units sold during any time period, such as days, weeks, or months (i.e. multiple time periods), wherein paragraph [0070] teaches the product information relates to the sales of two different products from the same competitive selection set);
wherein the aggregating comprises a subclass level (paragraph [0070] teaches the product information may include multiple products existing within the same competitive selection set that are associated with different brands (i.e. subclass level)).

Claims 5 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Beddo et al. (US 20140108094 A1) in view of Ura et al. (US 20170372229 A1) in view of He et al. (US 20170236136 A1) and further in view of McMahon et al. (US 20160048766 A1).

Regarding claims 5 and 13, the combination of Beddo, Ura, and He teaches all the limitations of claims 1 and 9 above.
	However, Beddo does not explicitly teach the randomly sampling comprises sampling with replacement.
	From the same or similar field of endeavor, McMahon teaches the randomly sampling comprises sampling with replacement (paragraph [0054] teaches creating sub-datasets using random sampling with replacement; see also: [0009, 0046]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Beddo, Ura, and He to incorporate the teachings of McMahon to include the .
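
For illustration, random sampling with replacement of the kind recited in claims 5 and 13 can be sketched as follows. This is a minimal sketch, not McMahon's implementation; the function name `sample_with_replacement` and the example data are hypothetical.

```python
import random


def sample_with_replacement(data_points, seed=0):
    """Draw len(data_points) items at random, allowing the same item to repeat.

    Hypothetical helper illustrating sampling with replacement; a given data
    point may appear several times in the result, or not at all.
    """
    rng = random.Random(seed)
    return rng.choices(data_points, k=len(data_points))


pooled = ["point_%d" % i for i in range(10)]   # hypothetical pooled dataset
sub_dataset = sample_with_replacement(pooled)
```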

Claims 6-8 and 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Beddo et al. (US 20140108094 A1) in view of Ura et al. (US 20170372229 A1) in view of He et al. (US 20170236136 A1) and further in view of Caldeira et al. (Predicting the yield curve using forecast combinations; Caldeira J.F., Moura G.V., Santos A.A.P. (2016); Computational Statistics and Data Analysis, 100, pp. 79-98.) and further in view of Kraftsow et al. (US 20130346385 A1).

Regarding claims 6 and 14, the combination of Beddo, Ura, and He teaches all the limitations of claims 1 and 9 above.
However, Beddo does not explicitly teach for each model of each training set i, the calculating model weights w(i) comprises: $w(i) = \frac{1}{1 + RMSE(i)}$.
	From the same or similar field of endeavor, Caldeira teaches for each model of each training set i (page 86, section 5. “Thick modeling approach with RMSE-weights (FC-RMSE)” teaches computing the root mean square error of all selected models),
the calculating model weights w(i) comprises: $w(i) = \frac{1}{RMSE(i)}$ (page 86, section 5. “Thick modeling approach with RMSE-weights (FC-RMSE)” discloses an equation, wherein the numerator teaches calculating the model weight for one model as: [image: media_image1.png]).
While Caldeira does not explicitly evaluate item demand forecasting, Caldeira presents a solution to a problem reasonably pertinent to the claimed invention. For example, as explained above, Beddo addresses calculating weights for each forecasting model; however, Beddo does not explicitly teach the claimed manner of calculating weights for each forecasting model. Caldeira describes an approach to improving the usefulness of weighted forecasting. In Beddo, one is inquiring about an optimal sales forecast. Analogously, in Caldeira, one is inquiring about an optimal interest rate forecast. It would  the calculating model weights w(i) comprises: $w(i) = \frac{1}{RMSE(i)}$. This improvement is suggested since Caldeira analogously provides this weighting scheme in order to alleviate model uncertainty (Caldeira, Page 80, first paragraph).
Although the combination of Beddo, Ura, He, and Caldeira teaches the calculating model weights w(i) comprises: $w(i) = \frac{1}{RMSE(i)}$, the combination does not explicitly teach the calculating model weights w(i) comprises: $w(i) = \frac{1}{1 + RMSE(i)}$. (Emphasis added by the Examiner, specifically the “1+” in the denominator of the equation.)
From the same or similar field of endeavor, Kraftsow discloses adding a small constant to the denominator of a weighting equation (paragraphs [0039-0040] teach summing the reciprocal of an equation, wherein a constant factor of “cis” (such as 1) is added to the denominator). 
It would have been obvious to one of ordinary skill in the art to modify the equation of Caldeira to incorporate the teachings of Kraftsow by adding a small constant to the denominator of the weighting equation. This known technique is being applied to a known art ready for improvement. The improvement is provided by Kraftsow because the technique of Kraftsow limits the maximum score of the weights (Kraftsow, [0039]). Additionally, the art of Kirshenbaum (US 20110119209 A1) suggests a similar motivation, wherein a small constant is added to the denominator of a weighting formula to ensure the uncertainty value can never be zero (Kirshenbaum, [0063-0064]). A person having ordinary skill in the art would recognize the benefit of adding this constant factor to the denominator, which would prevent any single model from being weighted as infinity as the error value approaches zero.
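
For illustration, the difference between the two weighting equations discussed above can be sketched as follows. This is a minimal sketch under stated assumptions, not code from Caldeira, Kraftsow, or the application; the function names `rmse`, `inverse_rmse_weight`, and `bounded_weight` are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root-mean-square error between two equal-length sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def inverse_rmse_weight(error):
    """w(i) = 1 / RMSE(i): grows without bound as the error approaches zero."""
    return 1.0 / error


def bounded_weight(error):
    """w(i) = 1 / (1 + RMSE(i)): never exceeds 1, even when the error is zero."""
    return 1.0 / (1.0 + error)
```

With a validation error of exactly zero, `inverse_rmse_weight` raises `ZeroDivisionError`, whereas `bounded_weight` returns 1.0, which is the capping effect the added constant provides.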

Regarding claims 7 and 15, the combination of Beddo, Ura, He, Caldeira, and Kraftsow teaches all the limitations of claims 6 and 14 above.
However, Beddo does not explicitly teach determining a sum S of the model weights w(i) comprising S=sum(w(i)); and normalizing a weight w'(i) for each w(i) comprising $w'(i) = \frac{w(i)}{S}$.
	From the same or similar field of endeavor, Caldeira further teaches: 
determining a sum S of the model weights w(i) comprising S=sum(w(i)) (page 86, section 5. “Thick modeling approach with RMSE-weights (FC-RMSE)” discloses calculating the sum of each model weight in the denominator of the equation: [image: media_image2.png]); 
and normalizing a weight w'(i) for each w(i) comprising $w'(i) = \frac{w(i)}{S}$ (page 86, section 5. “Thick modeling approach with RMSE-weights (FC-RMSE)” discloses calculating the normalized weight for each model using the following equation: [image: media_image3.png]).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Beddo, Ura, He, Caldeira, and Kraftsow to incorporate the further teachings of Caldeira to include determining a sum S of the model weights w(i) comprising S=sum(w(i)); and normalizing a weight w'(i) for each w(i) comprising $w'(i) = \frac{w(i)}{S}$. One would be motivated to do so in order to alleviate model uncertainty (Caldeira, Page 80, first paragraph).
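
For illustration, the normalization recited in claims 7 and 15 can be worked through with a small example. This is a minimal sketch; the function name `normalize_weights` and the example RMSE values are hypothetical and are not taken from the cited references.

```python
def normalize_weights(weights):
    """Return w'(i) = w(i) / S for each weight, where S = sum of all w(i)."""
    total = sum(weights)                      # S = sum(w(i))
    return [w / total for w in weights]       # w'(i) = w(i) / S


# Hypothetical validation errors for three trained models.
rmse_values = [0.0, 1.0, 3.0]
raw_weights = [1.0 / (1.0 + e) for e in rmse_values]   # w(i) = 1 / (1 + RMSE(i))
normalized = normalize_weights(raw_weights)
```

Here the raw weights are 1.0, 0.5, and 0.25, so S = 1.75 and the normalized weights are 4/7, 2/7, and 1/7, which sum to one.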

	Regarding claims 8 and 16, the combination of Beddo, Ura, He, Caldeira, and Kraftsow teaches all the limitations of claims 7 and 15 above.
	However, Beddo does not explicitly teach the generating the forecast of future sales y using each model M(i) comprises: y = sum(f(M(i), x)*W(i)), wherein f comprises the function to forecast for each model, and x corresponds to each data point.
	From the same or similar field of endeavor, Caldeira further teaches: the generating the forecast of future sales y using each model M(i) comprises: y = sum(f(M(i), x)*W(i)) (page 85 section 2.8 [image: media_image4.png]; Examiner’s Note: The “y” variable is the equivalent to the claimed y variable; the $\sum$ symbol is the equivalent of the claimed “sum” operation. The “w” variable is the equivalent of the claimed w variable. The second “y” is equivalent to f(M(i),x), wherein the “tau” is equivalent to the claimed x variable),
wherein f comprises the function to forecast for each model (page 85 section 2.8 “Combined forecasts” teaches the “y” is the forecast of the mth model; Examiner’s Note: The second “y” is equivalent to f(M(i),x), wherein the “tau” is equivalent to the claimed x variable),
and x corresponds to each data point (page 85 section 2.8 “Combined forecasts” teaches the “tau” is each maturity value (i.e. a data point)). 
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combination of Beddo, Ura, He, Caldeira, and Kraftsow to incorporate the further teachings of Caldeira to include the generating the forecast of future sales y using each model M(i) comprises: y = sum(f(M(i), x)*W(i)), wherein f comprises the function to forecast for each model, and x corresponds to each data point. One would be motivated to do so in order to alleviate model uncertainty (Caldeira, Page 80, first paragraph).
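
For illustration, the weighted combination y = sum(f(M(i), x)*W(i)) recited in claims 8 and 16 can be sketched as follows. This is a minimal sketch under stated assumptions: each model is represented as a plain Python callable standing in for f(M(i), x), and the function name `combined_forecast`, the two toy models, and the weights are hypothetical.

```python
def combined_forecast(models, weights, x):
    """Return y = sum over i of f(M(i), x) * W(i).

    Each entry of `models` is a callable that plays the role of f(M(i), .),
    i.e. it returns model i's forecast for data point x.
    """
    return sum(w * model(x) for model, w in zip(models, weights))


# Two hypothetical trained models and their normalized weights.
models = [lambda x: 2 * x, lambda x: x + 10]
weights = [0.75, 0.25]
y = combined_forecast(models, weights, x=4)   # 0.75 * 8 + 0.25 * 14
```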

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Goplan (US 11151472 B2) discloses feeding a portion of all of the training data set to the machine learning algorithm to generate a machine learning model, wherein the training data sets can be randomly sampled
Epstein et al. (US 20180109829 A1) discloses producing models with training and validation sets, wherein the models are fit by drawing random samples from the data for each decision tree 
Khavronin (US 20170364931 A1) discloses the training and testing process uses different parameter sets
Achin (US 20180060744 A1) discloses partitioning the dataset into a number of partitions including a number of training datasets and validation datasets

Any inquiry concerning this communication or earlier communications from the examiner should be directed to Sara G Brown whose telephone number is (469)295-9145. The examiner can normally be reached M-Th 8:00 am- 5:00 pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Brian Epstein, can be reached on (571) 270-5389. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/S.G.B./Examiner, Art Unit 3683

/BRIAN M EPSTEIN/Supervisory Patent Examiner, Art Unit 3683