Detailed Action
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Notice to Applicant
The following is a Final Office action to Application Serial Number 16/818,586, filed on March 13, 2020.  In response to Examiner’s Office Action of February 25, 2022, Applicant, on May 27, 2022, amended claims 1, 3-8, 10-14, and 16-19.   Claims 1-19 are pending in this application and have been rejected below.
Response to Amendment
Applicant’s amendments are acknowledged.
Regarding the 35 U.S.C. § 101 rejection, Applicant’s arguments have been considered but are insufficient to overcome the rejection. Please refer to the 35 U.S.C. § 101 rejection for further explanation and rationale.
The 35 U.S.C. § 103 rejections are hereby amended pursuant to Applicant’s amendments, and updated 35 U.S.C. § 103 rejections have been applied to the amended claims. Please refer to the § 103 rejection for further explanation and rationale.
Response to Arguments
Applicant’s arguments filed May 27, 2022 have been fully considered but they are not persuasive and/or are moot in view of the revised rejections.  Applicant’s arguments will be addressed herein below in the order in which they appear in the response filed May 27, 2022.
On pages 8-9 of the Remarks regarding 35 U.S.C. § 101, Applicant states the claims are directed to an improvement to the narrow technical field of forecasting in a specific real retail area where a product has been or will be for sale in a retail environment, and that this improvement in forecasting impacts inventory levels and product ordering, which integrates the claims into a practical application. In response, Examiner respectfully disagrees. The present claims amount to no more than utilizing computer elements as tools to perform clustering analysis. Examiner finds that the present claims improve an existing process of retail forecasting, and there is currently no functional advancement to any technology or technological field that would allow the claim elements to be considered significantly more than the abstract idea itself. Utilizing computer structure and technology (see par. 0023-0025) to calculate inventory amounts, both individually and in combination, to computer functions such as receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information); and storing and retrieving information in memory, Versata Dev. Group, Inc. v. SAP Am., Inc., 793 F.3d 1306, 1334, 115 USPQ2d 1681, 1701 (Fed. Cir. 2015) (See MPEP 2106.05(d)(II)).
On pages 9-10 of the Remarks regarding 35 U.S.C. § 101, Applicant states that under Step 2B, the new clusters (i.e., the optimal clusters) improve the accuracy of forecasting, while concurrently using time warping and classifications to flexibly generate forecasts across items, item classes, item types, and the like. These computing advantages lead to attendant downstream advantages as well, with this flexibility of configuration leading to better forecasts across a wider variety of products. In response, Examiner respectfully disagrees. The aforementioned techniques are not improvements to a problem in the software arts, a technology or technological field. The partitional clustering analysis is a judicial exception (i.e., an abstract idea). The claimed invention is executed by computer elements performing generic computer functions (see Applicant’s Specification par. 0023-0025). Examiner asserts that, regardless of the complexity of the data analysis and/or processing, without recitation of improvements to the functioning of the technology, technological field and/or computer-related technology (i.e., software), the steps outlined in the claimed invention to generate retail forecasts amount to no more than mere instructions to implement the idea on a general purpose computer. Applicant has not identified anything in the claimed invention that shows or even submits that the technology is being improved or that there was a problem in the technology that the claimed invention solves.
On pages 11-13 of the Remarks regarding 35 U.S.C. § 103, Applicant states Ray, Bai, and Panda fail to teach or suggest the combination of features as recited in the amended independent claims. In response, Applicant’s remarks under 35 U.S.C. § 103 have been fully considered. However, upon further consideration, Applicant has made amendments, and the new amendments necessitate a revised rejection. Please refer to the 35 U.S.C. § 103 rejection for further explanation and rationale in light of the amendments.
On pages 14-16 of the Remarks regarding 35 U.S.C. § 103, Applicant states the combination of the cited prior art references is not obvious. In response to Applicant’s argument that there is no teaching, suggestion, or motivation to combine the references, the Examiner recognizes that obviousness may be established by combining or modifying the teachings of the prior art to produce the claimed invention where there is some teaching, suggestion, or motivation to do so found either in the references themselves or in the knowledge generally available to one of ordinary skill in the art. See In re Fine, 837 F.2d 1071, 5 USPQ2d 1596 (Fed. Cir. 1988), In re Jones, 958 F.2d 347, 21 USPQ2d 1941 (Fed. Cir. 1992), and KSR International Co. v. Teleflex, Inc., 550 U.S. 398, 82 USPQ2d 1385 (2007). In this case, the cited prior art is directed to forecasting/modeling techniques and analysis. The use of time-series data analysis and clustering methodologies improves upon the forecasting and market modeling analysis.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Claims 1-19 are directed to a method for forecasting product demand for a product having limited historical sales data.

Claim 1 recites a method for forecasting product demand for a product having limited historical sales data, which includes receiving time series sales data for a first product; receiving time series sales data of a plurality of different second products, the time series sales data of each of the different second products being longer than the time series sales data of the first product, and the time series sales data of the first product being of a limited duration and/or representing sparse sales data; for each of the different second products, dynamically time warping the time series sales data of the first product with the respective time series sales data of the respective second product to create a dynamically time warped dataset; for each dynamically time warped dataset, performing a clustering analysis to obtain a clustering model with an optimal number of clusters, wherein the clustering analysis includes: applying a plurality of clustering models to each dynamically time warped dataset to generate clusters of dynamically time warped datasets; and identifying the clustering model with the optimal number of clusters from among the plurality of clustering models based, at least in part, on a cluster validity index generated for each clustering model; for each cluster within the clustering model with the optimal number of clusters define a prototype time series, the prototype time series for each cluster being representative of the cluster; from the clustering model with the optimal number of clusters, determining within which cluster the time series sales data for the first product lies; and utilizing the prototype time series of the cluster within which the time series sales data for the first product lies in generating a forecast of demand of the first product.
Claim 8 recites a method for forecasting product demand for a product having limited historical sales data which includes receiving time series sales data for a first product; receiving time series sales data of a plurality of different second products, the time series sales data of each of the different second products being longer than the time series sales data of the first product, and the time series sales data of the first product being of a limited duration and/or representing sparse sales data; for each of the different second products, dynamically time warping the time series sales data of the first product with the respective time series sales data of the respective second product to create a dynamically time warped dataset; for each dynamically time warped dataset, performing a partitional clustering analysis to obtain a clustering model with an optimal number of clusters, wherein the partitional clustering analysis includes: applying a plurality of clustering models to each dynamically time warped dataset to generate clusters of dynamically time warped datasets, wherein each clustering model of the plurality of clustering models represents a cluster of products from among the first product and the plurality of different second products, and each clustering model generates a different number of clusters; and identifying the clustering model with the optimal number of clusters from among the plurality of clustering models based, at least in part, on a cluster validity index generated for each clustering model; for each cluster within the clustering model with the optimal number of clusters define a prototype time series, the prototype time series for each cluster corresponding to a medoid of the cluster; from the clustering model with the optimal number of clusters, determining within which cluster the time series sales data for the first product lies; and utilizing the prototype time series of the cluster within which the time series sales data for the first product lies in generating a forecast of demand of the first product.
Claim 14 recites a method for forecasting product demand for a product having limited historical sales data which includes receiving time series sales data for a first product; receiving time series sales data of a plurality of different second products, the time series sales data of each of the different second products being longer than the time series sales data of the first product, and the time series sales data of the first product being of a limited duration and/or representing sparse sales data; for each of the different second products, dynamically time warping the time series sales data of the first product with the respective time series sales data of the respective second product to create a dynamically time warped dataset; for each dynamically time warped dataset, performing a partitional clustering analysis to produce a plurality of clustering models, wherein the partitional clustering analysis includes: applying the plurality of clustering models to each dynamically time warped dataset to generate clusters of dynamically time warped datasets, wherein each of the plurality of clustering models generate a different number of clusters; and for each of the plurality of clustering models, applying a cluster validity analysis to obtain one clustering model of the plurality of clustering models; for each clustering model, applying a cluster validity analysis to obtain one clustering model with an optimal number of clusters; for each cluster within the one clustering model, define a prototype time series, the prototype time series for each cluster corresponding to a medoid of the cluster; determining within which cluster of the one clustering model the time series sales data for the first product lies; and utilizing the prototype time series of the cluster within which the time series sales data for the first product lies in generating a forecast of demand of the first product.

As drafted, this is, under its broadest reasonable interpretation, within the abstract idea groupings of “Methods of Organizing Human Activity” (sales activities) and “Mathematical Concepts” (mathematical calculations). Accordingly, the claims recite an abstract idea.
This judicial exception is not integrated into a practical application. There are no additional elements that integrate the abstract idea into a practical application. The claims also fail to recite any improvements to another technology or technical field, improvements to the functioning of the computer itself, use of a particular machine, effecting a transformation or reduction of a particular article to a different state or thing, and/or an additional element that applies or uses the judicial exception in some other meaningful way beyond generally linking the use of the judicial exception to a particular technological environment, such that the claim as a whole is more than a drafting effort designed to monopolize the exception. See 84 Fed. Reg. 55. In particular, there is a lack of improvement to a computer or technical field in forecasting.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements when considered both individually and as an ordered combination do not amount to significantly more than the abstract idea. 
Dependent Claims 2-7, 9-13 and 15-19 recite z-scoring the time series data; each clustering model generates a different number of clusters; the first product is from a pre-defined multi-level product hierarchy; the plurality of different second products are selected from the pre-defined multi-level product hierarchy; the clustering analysis is based on: first hierarchical data attributes associated with the first product, second hierarchical data attributes associated with each of the plurality of second products, first time series data attributes associated with the time series sales data for the first product, and second time series data attributes associated with the time series sales data for the plurality of second products, wherein the pre-defined multi-level product hierarchy comprises a plurality of levels, each level of the plurality of levels corresponding to a hierarchical data attribute, wherein the first time series data attributes are specific to the times of sale of the first product, and the second time series data attributes are specific to the times of sale of each of the plurality of second products; utilizing the forecast of demand of the first product to calculate inventory needs of the first product within a supply chain of an enterprise; and the first product and the plurality of different second products are selected from different levels of the pre-defined multi-level product hierarchy, thereby further narrowing the abstract idea. These recited limitations in the dependent claims do not amount to significantly more than the above-identified judicial exceptions in Claims 1, 8 and 14.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3, 7-8, 13-14, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Ray et al., US Publication No. 20160260111A1, [hereinafter Ray], in view of Bai et al., Classification and Forecasting for Enterprise Data, Published in: 2018 Chinese Automation Congress (CAC) Date of Conference: 30 Nov.-2 Dec. 2018, [hereinafter Bai], in further view of Rani, "Modified hierarchical clustering algorithm for time series data," 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), 2016, pp. 4036-4040, [hereinafter Rani] and in further view of Panda, US Publication No. 20190050763 A1, [hereinafter Panda].
Regarding Claim 1,  
Ray teaches
A method for forecasting product demand for a product having limited historical sales data, comprising: receiving time series sales data for a first product; (Ray - Par. 28-29; Par. 32-“For example, a possible situation can occur with electronic commerce (“eCommerce”) retailers. Because eCommerce retailers generally store more SKUs than brick and mortar stores, there might not be enough sales data to model each SKU separately. In addition, eCommerce retailers often stock SKUs that are short-lived or have erratic data.”; Par. 33-“FIG. 4B illustrates three different SKUs that have such situations. The same X-axis 420 and Y-axis 410 that are present in FIG. 4A also are present in FIG. 4B. Data series 440, data series 450, and data series 460 represent the sales of three different items. Data series 440 has incomplete data. Sales are present for only a very short time period, with no sales before or after that time period. This type of data series can be indicative of a short-lived item. Because the item had sales only for a very short-period of time, a popular but short-lived item might be indicative of a product that is no longer made. Data series 450 has two sales spikes, with a period of zero or otherwise low sales in between the sales spikes. Such a data series might be indicative of an item that could not keep up with demand (between the two spikes), and is no longer being made. Or such a data series might be indicative of a seasonal item (explaining the sales spikes) that is no longer being made (explaining the lack of data after the second sales spike). Data series 460 is similar to data series 440 in that it has only a single spike. However, while data series 440 is similar to data series 430 in that a peak for data series 430 roughly coincides with a peak of data series 440, data series 460 has a peak that roughly coincides with a trough of data series 430. 
This fact can indicate both that the item in data series 460 is a short-lived item and that its sales do not correlate well with the item represented by data series 430. This type of behavior is discussed in further detail below.”)
receiving time series sales data of a plurality of different second products, the time series sales data of each of the different second products being longer than the time series sales data of the first product, and the time series sales data of the first product being of a limited duration and/or representing sparse sales data; (Ray- Par. 32-“For example, a possible situation can occur with electronic commerce (“eCommerce”) retailers. Because eCommerce retailers generally store more SKUs than brick and mortar stores, there might not be enough sales data to model each SKU separately. In addition, eCommerce retailers often stock SKUs that are short-lived or have erratic data.”; Par. 33-“ FIG. 4B illustrates three different SKUs that have such situations. The same X-axis 420 and Y-axis 410 that are present in FIG. 4A also are present in FIG. 4B. Data series 440, data series 450, and data series 460 represent the sales of three different items. Data series 440 has incomplete data. Sales are present for only a very short time period, with no sales before or after that time period. This type of data series can be indicative of a short-lived item. Because the item had sales only for a very short-period of time, a popular but short-lived item might be indicative of a product that is no longer made. Data series 450 has two sales spikes, with a period of zero or otherwise low sales in between the sales spikes. Such a data series might be indicative of an item that could not keep up with demand (between the two spikes), and is no longer being made. Or such a data series might be indicative of a seasonal item (explaining the sales spikes) that is no longer being made (explaining the lack of data after the second sales spike). Data series 460 is similar to data series 440 in that it has only a single spike. 
However, while data series 440 is similar to data series 430 in that a peak for data series 430 roughly coincides with a peak of data series 440, data series 460 has a peak that roughly coincides with a trough of data series 430. This fact can indicate both that the item in data series 460 is a short-lived item and that its sales do not correlate well with the item represented by data series 430. This type of behavior is discussed in further detail below.”)
from the clustering model with the optimal number of clusters, determining within which cluster the time series sales data for the first product lies (Ray Par. 34-“ One method of solving the above problems is to forecast items in groups (also known as clusters). In other words, instead of forecasting what each individual SKU will sell, one would place a SKU in a group with other SKUs. Then, one forecasts what the group of SKUs would sell. Data series 430, data series 440, and data series 450 could be forecast as a group. The forecast could then be used to order the proper number of items for each of the three SKUs. While there are currently existing methods and systems for grouping SKUs, it would be desirable to have a more accurate method and system of grouping SKUs for forecasting purposes.”);
and utilizing the prototype time series of the cluster within which the time series sales data for the first product lies in generating a forecast of demand of the first product (Ray Par. 34-“ One method of solving the above problems is to forecast items in groups (also known as clusters). In other words, instead of forecasting what each individual SKU will sell, one would place a SKU in a group with other SKUs. Then, one forecasts what the group of SKUs would sell. Data series 430, data series 440, and data series 450 could be forecast as a group. The forecast could then be used to order the proper number of items for each of the three SKUs. While there are currently existing methods and systems for grouping SKUs, it would be desirable to have a more accurate method and system of grouping SKUs for forecasting purposes.”; Par. 36-37-“In some traditional notions of grouping or clustering, there can be a requirement to place similar SKUs in the same groups. Thus, two similar items would not be placed in separate groups. However, in some embodiments, it is more important that dissimilar SKUs are not placed in the same group; similar items can be placed in separate groups, and embodiments will still operate correctly. Returning to FIG. 4B, an example of dissimilar SKUs is seen in data series 430 of FIG. 4A and data series 460. As explained above, while data series 430 goes down, data series 460 goes up. This fact can be an indication that placing the item represented in data series 430 in a group with the item represented in data series 460 might not be ideal.”).
Ray teaches time series data analysis for sales forecasting and the following feature is expounded upon by Bai:
for each of the different second products, dynamically time warping the time series sales data of the first product with the respective time series sales data of the respective second product to create a dynamically time warped dataset; (Bai Section I-“In this study we present a sales forecast case of a construction machine manufacturer in China, which produces 120 kinds of products and keeps a large stock of spare parts and finished goods. The inventory is a burden. Aim for alleviating the inventory, we plan the production and control the spare parts inventory in term of sales forecast.”; Section II-“The ARIMA model identification need to be supervised for determining the orders of model, that is, each time series forecasting has a procedure for model identification. If we classify each time series to appropriate model in advance, we can save forecast time and cost. Two approaches are used to evaluate the similarity or dissimilarity between time series in research articles and proceedings: one is model-based, the other is model-free. The model-based approach consists in projecting time series into a given functional basis space which corresponds to a polynomial, ARIMA, or a discrete Fourier transform approximation. The proximity between time series is then evaluated by the fitted basis coefficients [9]–[10][11]. The model-free approach is non-parametric and consists in evaluating the similarity between time series based on their initial temporal description. Within the scope of non-parametric approach, the mostly widely used proximity measures between time series are Euclidean distance and the dynamic time warping [12]. Furthermore, SVM and forecast density have also been used to classify the time series”)
Ray and Bai are directed to time series data analysis. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the model analysis of Ray, as taught by Bai, by utilizing additional modeling techniques with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to make the modification to the teachings of Ray with the motivation of saving forecast time and cost (Bai Section I).
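For illustration only, the dynamic time warping proximity measure referred to in the Bai passage can be pictured with the following minimal sketch. It is a textbook formulation and is not taken from Bai, Ray, or Applicant's Specification; the function name `dtw_distance` and the two sample series are hypothetical.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences.

    Illustrative assumption: absolute difference is used as the local cost.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of the three admissible predecessor alignments
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical data: a short sales history and a longer one with the same shape, stretched in time
short_series = [0, 2, 4, 2, 0]
long_series = [0, 0, 2, 2, 4, 4, 2, 2, 0, 0]
distance = dtw_distance(short_series, long_series)
```

Because warping lets each point of the shorter series align with several points of the longer one, series of unequal length can be compared directly, which is the property relevant to a first product with limited sales history.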
Ray in view of Bai teach time series data analysis and the following feature is expounded upon by Rani:
for each dynamically time warped dataset, performing a clustering analysis to obtain a clustering model with an optimal number of clusters, wherein the clustering analysis includes: applying a plurality of clustering models to each dynamically time warped dataset to generate clusters of dynamically time warped datasets (Rani Section I-“Categories [1] of Time-Series Clustering Algorithms include: Temporal-Proximity-Based Clustering Procedures that work with unanalyzed facts and figures, either in frequency domain or time domain. It is also known as raw-data-based approach. It uses distance measures which consider temporal relations, e.g. Dynamic Time Warping, Temporal K-means/Hierarchical Clustering. When a set of features are extracted from raw data and any of the existing clustering algorithm e.g. [3] “Discrete Wavelet Transform”, “Adaptive Piecewise Constant Approximation”, and “Curvature-Based PCA Segments” etc. is applied on the feature space, clustering is known as representation or feature based clustering. Model-Based Clustering uses clusters of temporal data which are specified by a mixture of dynamic models e.g. Hidden-Markov-Model (HMM), Auto Regressive Moving Average Model (ARMA) identifies the data dependency and regularity behind the dynamic behaviors of time-related data.”; Section II-“
Step 1 Start by assigning each item to its own cluster, so that if there are N items, there are N clusters, each containing just one item.
Step 2 Use Dynamic Time Warping as time-series similarity measure, and then let the similarities between the clusters equal the similarities between the items they contain.
Step 3 Find most similar pair of clusters and merge them into a single cluster, so that now one cluster can be reduced.
Step 4 Compute the average linkage as similarities between the new cluster and each of the old clusters.
Step 5 Repeat steps 3 and 4 until get K clusters.
Step 6 Adopt inter/intra-cluster-distance-based swap to refine the K clusters from step 5 and then get the new K clusters.”);
and identifying the clustering model with the optimal number of clusters from among the plurality of clustering models based, at least in part, on a cluster validity index generated for each clustering model; (Rani Section 1.C-E- “ “CVAP (Cluster Validity and Analysis Platform)” [5] is a MATLAB based visual cluster validation tool. It gives essential tools and suitable environment for evaluating the validity or reasonability of clustering algorithms. It has four “External validity indices”, fourteen “internal validity indices” and five clustering algorithms like “K-means”, PAM, “hierarchical clustering” etc. It also acknowledges other clustering algorithms by loading a solution file having class labels, or by addition of new codes, and Pearson correlation coefficient, and similarity metrics of Euclidean distance are supported. D. Inter/Intra Cluster-Distance-Based-Swap -This approach is used to refine the output of hierarchical clustering method and also it removes the inability of hierarchical clustering. Once we get K clusters as an output of hierarchical clustering, we then apply the inter/intra-cluster-distance-based swap to improvise these K clusters and attain new number of “clusters” as output. E. Evaluation Criteria “Cluster validation” is essential and mandatory procedure in analyzing clusters. There are various validation indices in CVAP to give measures for validity for every cluster. The validation indices [5] also give an explicit scenario on the optimal number of clusters.”)
Ray, Bai, and Rani are directed to time series data analysis. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the model analysis of Ray in view of Bai, as taught by Rani, by utilizing additional clustering validation techniques with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to make the modification to the teachings of Ray in view of Bai with the motivation of improved hierarchical clustering algorithm techniques (Rani Section II).
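For illustration only, the merge-until-K procedure quoted from Rani (Steps 1 and 3-5) and the use of a validity index to choose among clusterings with different numbers of clusters can be sketched as below. This is a hedged sketch under stated assumptions, not Rani's implementation: the silhouette width stands in for whichever CVAP index would be used, the swap refinement of Step 6 is omitted, the input is any precomputed pairwise distance matrix (for example DTW distances), and all function names and the sample matrix are hypothetical.

```python
def average_linkage_clusters(dist, k):
    """Start with one cluster per item and merge the closest pair (average linkage) until k remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                pairs = [dist[i][j] for i in clusters[x] for j in clusters[y]]
                avg = sum(pairs) / len(pairs)
                if best is None or avg < best[0]:
                    best = (avg, x, y)
        _, x, y = best
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


def silhouette(dist, clusters):
    """Mean silhouette width (assumed validity index); requires at least two clusters."""
    scores = []
    for c_idx, members in enumerate(clusters):
        for i in members:
            if len(members) == 1:
                scores.append(0.0)  # common convention for singleton clusters
                continue
            a = sum(dist[i][j] for j in members if j != i) / (len(members) - 1)
            b = min(
                sum(dist[i][j] for j in other) / len(other)
                for o_idx, other in enumerate(clusters)
                if o_idx != c_idx
            )
            scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def best_clustering(dist, candidate_ks):
    """Build one clustering per candidate k and keep the one with the highest validity index."""
    models = {k: average_linkage_clusters(dist, k) for k in candidate_ks}
    best_k = max(models, key=lambda k: silhouette(dist, models[k]))
    return best_k, models[best_k]


# Hypothetical distance matrix: six items forming two well-separated groups
values = [0, 1, 2, 10, 11, 12]
dist = [[abs(u - v) for v in values] for u in values]
best_k, clusters = best_clustering(dist, candidate_ks=[2, 3, 4])
```

In this toy input the two-cluster model scores highest, so it is the one retained; each candidate k corresponds to one "clustering model" in the sense of generating a different number of clusters.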

Ray in view of Bai in further view of Rani teach time series data analysis and the following feature is expounded upon by Panda:
 for each cluster within the clustering model with the optimal number of clusters define a prototype time series, the prototype time series for each cluster being representative of the cluster; (Panda Par. 4-“ Each time series within the hierarchical time series cluster has its individual characteristics that varies in accordance with the hierarchical level where the time series lies in the hierarchical time series cluster. Some existing approaches provide insights on model fitting to hierarchical time series where a global or single model is identified for the hierarchical time series that is then used forecasting of the hierarchical time series. However, with a single model to the hierarchical time series provides a generalized or common model for all the time series within the hierarchical time series. Thus, the existing method tries to generalize all the time series of the cluster and may lose individual characteristics of each time series is effectively reduces accuracy of rightly capturing each time series with its individual characteristics. However, a good balance needs to be sought between identifying best fit model for each time series and identifying common best fit models for plurality of series so as to achieve good time efficiency during model fitting along with good forecast accuracy.”; Par. 22-“ FIG. 2A through FIG. 2D illustrate an example for hierarchical clustering of a plurality of time series into hierarchical time series clusters, in accordance with an embodiment of the present disclosure. Time series clustering is to partition the plurality of time series (time series data) into different groups based on similarity or distance, such that time series or TS in the same cluster are more similar. One of the key component in TS clustering is the function used to measure the similarity between two time series being compared. 
Practically, the time series data captured could be in various forms including raw values of equal or unequal length, vectors of feature-value pairs, transition matrices, and so on. Thus, to cluster the time series, the DTW distance is utilized for generating hierarchical time series clusters. FIG. 2A and FIG. 2B depicts data set for the plurality of time series.”; Par. 41-43)

Ray, Bai, Rani, and Panda are directed to time series data analysis. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the model analysis of Ray in view of Bai in further view of Rani, as taught by Panda, by utilizing additional modeling techniques with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to make the modification to the teachings of Ray in view of Bai in further view of Rani with the motivation of providing a good balance between identifying a best fit model for each time series and identifying common best fit models for a plurality of series so as to achieve good time efficiency during model fitting along with good forecast accuracy (Panda Par. 4).
Regarding Claim 3,
Ray in view of Bai in further view of Rani in further view of Panda teach the method of claim 1,…
Ray in view of Bai teach time series data analysis and the following feature is expounded upon by Rani:
wherein each clustering model generates a different number of clusters (Rani Section I-D-“ D. Inter/Intra Cluster-Distance-Based-Swap- This approach is used to refine the output of hierarchical clustering method and also it removes the inability of hierarchical clustering. Once we get K clusters as an output of hierarchical clustering, we then apply the inter/intra-cluster-distance-based swap to improvise these K clusters and attain new number of “clusters” as output.”; Section II-C; Table 1; Table 2 and related text.)
Ray, Bai, and Rani are directed to time series data analysis. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the model analysis of Ray in view of Bai, as taught by Rani, by utilizing additional clustering validation techniques with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to make the modification to the teachings of Ray in view of Bai with the motivation of improved hierarchical clustering algorithm techniques (Rani Section II).
Regarding Claim 7, Claim 13 and Claim 19,
further comprising utilizing the forecast of demand of the first product to calculate inventory needs of the first product within a supply chain of an enterprise (Ray Par. 28-“ Forecasting is a key problem encountered in inventory planning. In order to buy inventory in advance, retailers would like an estimate of the number of units a distinct item for sale (also known as a stock keeping unit or a “SKU”) is going to sell in a certain time period. To clarify the difference between an item and a SKU, an item might be, for example, an iPad. But each specific configuration of an iPad (screen size, memory size, color, radio, and the like) is a different SKU. Each SKU typically has a unique identifier. Buying fewer units than is needed leads to lost sales opportunities, hence lower revenue, because items that could have been sold were not in stock. Buying too many units also can lead to lost sales opportunities because the cost of buying the unused inventory might not be compensated for by income from other sales to customers and can lead to lost opportunity costs (e.g., items that do not sell occupying space in a warehouse or store in place of items that could have been sold).”; Claim 11- receiving a plurality of vertices to be placed in clusters; choosing a plurality of initial medoids based on the plurality of vertices; assigning each of the plurality of vertices to one of the clusters based on a distance between each vertex of the plurality of vertices and a medoid closest to each vertex of the plurality of vertices; determining a quality of each of the clusters formed by a separate medoid and a separate set of closest vertices of the plurality of vertices to each separate medoid; moving one or more of the plurality of vertices to a different one of the clusters based on the quality of each cluster; assigning an unassigned vertex of the plurality of vertices to one of the clusters closest to the lone vertex; and ordering inventory based on the clusters.)

Regarding Claim 8,  
Ray teaches
A method for forecasting product demand for a product having limited historical sales data, comprising: receiving time series sales data for a first product; (Ray - Par. 28-29; Par. 32-“For example, a possible situation can occur with electronic commerce (“eCommerce”) retailers. Because eCommerce retailers generally store more SKUs than brick and mortar stores, there might not be enough sales data to model each SKU separately. In addition, eCommerce retailers often stock SKUs that are short-lived or have erratic data.”;  Par. 33“FIG. 4B illustrates three different SKUs that have such situations. The same X-axis 420 and Y-axis 410 that are present in FIG. 4A also are present in FIG. 4B. Data series 440, data series 450, and data series 460 represent the sales of three different items. Data series 440 has incomplete data. Sales are present for only a very short time period, with no sales before or after that time period. This type of data series can be indicative of a short-lived item. Because the item had sales only for a very short-period of time, a popular but short-lived item might be indicative of a product that is no longer made. Data series 450 has two sales spikes, with a period of zero or otherwise low sales in between the sales spikes. Such a data series might be indicative of an item that could not keep up with demand (between the two spikes), and is no longer being made. Or such a data series might be indicative of a seasonal item (explaining the sales spikes) that is no longer being made (explaining the lack of data after the second sales spike). Data series 460 is similar to data series 440 in that it has only a single spike. However, while data series 440 is similar to data series 430 in that a peak for data series 430 roughly coincides with a peak of data series 440, data series 460 has a peak that roughly coincides with a trough of data series 430. 
This fact can indicate both that the item in data series 460 is a short-lived item and that its sales do not correlate well with the item represented by data series 430. This type of behavior is discussed in further detail below.”)
receiving time series sales data of a plurality of different second products, the time series sales data of each of the different second products being longer than the time series sales data of the first product, and the time series sales data of the first product being of a limited duration and/or representing sparse sales data; (Ray- Par. 32-“For example, a possible situation can occur with electronic commerce (“eCommerce”) retailers. Because eCommerce retailers generally store more SKUs than brick and mortar stores, there might not be enough sales data to model each SKU separately. In addition, eCommerce retailers often stock SKUs that are short-lived or have erratic data.”; Par. 33-“ FIG. 4B illustrates three different SKUs that have such situations. The same X-axis 420 and Y-axis 410 that are present in FIG. 4A also are present in FIG. 4B. Data series 440, data series 450, and data series 460 represent the sales of three different items. Data series 440 has incomplete data. Sales are present for only a very short time period, with no sales before or after that time period. This type of data series can be indicative of a short-lived item. Because the item had sales only for a very short-period of time, a popular but short-lived item might be indicative of a product that is no longer made. Data series 450 has two sales spikes, with a period of zero or otherwise low sales in between the sales spikes. Such a data series might be indicative of an item that could not keep up with demand (between the two spikes), and is no longer being made. Or such a data series might be indicative of a seasonal item (explaining the sales spikes) that is no longer being made (explaining the lack of data after the second sales spike). Data series 460 is similar to data series 440 in that it has only a single spike. 
However, while data series 440 is similar to data series 430 in that a peak for data series 430 roughly coincides with a peak of data series 440, data series 460 has a peak that roughly coincides with a trough of data series 430. This fact can indicate both that the item in data series 460 is a short-lived item and that its sales do not correlate well with the item represented by data series 430. This type of behavior is discussed in further detail below.”)
...wherein each clustering model of the plurality of clustering models represents a cluster of products from among the first product and the plurality of different second products, ...(Ray Par. 34-“ One method of solving the above problems is to forecast items in groups (also known as clusters). In other words, instead of forecasting what each individual SKU will sell, one would place a SKU in a group with other SKUs. Then, one forecasts what the group of SKUs would sell. Data series 430, data series 440, and data series 450 could be forecast as a group. The forecast could then be used to order the proper number of items for each of the three SKUs. While there are currently existing methods and systems for grouping SKUs, it would be desirable to have a more accurate method and system of grouping SKUs for forecasting purposes.”);
...the prototype time series for each cluster corresponding to a medoid of the cluster (Ray Abstract; Par. 48-“ Another algorithm that attempts to solve that issue is partitioning around medoids (PAM). In PAM, vertices as chosen as the center point (or medoid). Each vertex is associated to the closest medoid. Then, for each medoid, the medoid is switched with a vertex to determine the total cost of the configuration. After each vertex has been switched with the medoid, the configuration with the lowest cost is chosen. Then new medoids are chosen based on the newly calculated configuration. This process is repeated until there is no change in medoids. While PAM is reliable and robust, it is very slow compared to other clustering methods, such as K-means, because the cost for each data point has to be calculated.”)
from the clustering model with the optimal number of clusters, determining within which cluster the time series sales data for the first product lies (Ray Par. 34-“ One method of solving the above problems is to forecast items in groups (also known as clusters). In other words, instead of forecasting what each individual SKU will sell, one would place a SKU in a group with other SKUs. Then, one forecasts what the group of SKUs would sell. Data series 430, data series 440, and data series 450 could be forecast as a group. The forecast could then be used to order the proper number of items for each of the three SKUs. While there are currently existing methods and systems for grouping SKUs, it would be desirable to have a more accurate method and system of grouping SKUs for forecasting purposes.”);
and utilizing the prototype time series of the cluster within which the time series sales data for the first product lies in generating a forecast of demand of the first product (Ray Par. 34-“ One method of solving the above problems is to forecast items in groups (also known as clusters). In other words, instead of forecasting what each individual SKU will sell, one would place a SKU in a group with other SKUs. Then, one forecasts what the group of SKUs would sell. Data series 430, data series 440, and data series 450 could be forecast as a group. The forecast could then be used to order the proper number of items for each of the three SKUs. While there are currently existing methods and systems for grouping SKUs, it would be desirable to have a more accurate method and system of grouping SKUs for forecasting purposes.”; Par. 36-37-“In some traditional notions of grouping or clustering, there can be a requirement to place similar SKUs in the same groups. Thus, two similar items would not be placed in separate groups. However, in some embodiments, it is more important that dissimilar SKUs are not placed in the same group; similar items can be placed in separate groups, and embodiments will still operate correctly. Returning to FIG. 4B, an example of dissimilar SKUs is seen in data series 430 of FIG. 4A and data series 460. As explained above, while data series 430 goes down, data series 460 goes up. This fact can be an indication that placing the item represented in data series 430 in a group with the item represented in data series 460 might not be ideal.”).
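For illustration only, and not as part of the cited disclosure, the notion of a medoid serving as the prototype of a cluster can be sketched as follows. The sketch assumes a precomputed pairwise distance matrix and a fixed cluster membership; the function name `medoid_index` and the distance values are hypothetical, and the full PAM swap procedure quoted from Ray Par. 48 is not reproduced.

```python
def medoid_index(dist, members):
    """Return the member whose summed distance to all members is smallest."""
    return min(members, key=lambda i: sum(dist[i][j] for j in members))


# Hypothetical pairwise distances among four sales series
# (symmetric, zero diagonal). These values are made up for illustration.
dist = [
    [0.0, 1.0, 2.0, 9.0],
    [1.0, 0.0, 1.5, 8.0],
    [2.0, 1.5, 0.0, 7.0],
    [9.0, 8.0, 7.0, 0.0],
]

cluster = [0, 1, 2]                      # series assigned to one cluster
prototype = medoid_index(dist, cluster)  # index of the representative series
print(prototype)
```

Under these assumed distances, the series at index 1 has the smallest summed distance to the other members of the cluster, so it would serve as the prototype time series for that cluster.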
Ray teaches time series data analysis for sales forecasting and the following feature is expounded upon by Bai:
for each of the different second products, dynamically time warping the time series sales data of the first product with the respective time series sales data of the respective second product to create a dynamically time warped dataset; (Bai Section I-“In this study we present a sales forecast case of a construction machine manufacturer in China, which produces 120 kinds of products and keeps a large stock of spare parts and finished goods. The inventory is a burden. Aim for alleviating the inventory, we plan the production and control the spare parts inventory in term of sales forecast.”; Section II-“ The ARIMA model identification need to be supervised for determining the orders of model, that is, each time series forecasting has a procedure for model identification. If we classify each time series to appropriate model in advance, we can save forecast time and cost. Two approaches are used to evaluate the similarity or dissimilarity between time series in research articles and proceedings: one is model-based, the other is model-free. The model-based approach consists in projecting time series into a given functional basis space which corresponds to a polynomial, ARIMA, or a discrete Fourier transform approximation. The proximity between time series is then evaluated by the fitted basis coefficients [9]–[10][11]. The model-free approach is non-parametric and consists in evaluating the similarity between time series based on their initial temporal description. Within the scope of non-parametric approach, the mostly widely used proximity measures between time series are Euclidean distance and the dynamic time warping [12]. Furthermore, SVM and forecast density have also been used to classify the time series”)
Ray and Bai are directed to time series data analysis. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the model analysis of Ray, as taught by Bai, by utilizing additional modeling techniques with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to make the modification to the teachings of Ray with the motivation of saving forecast time and cost (Bai Section I).
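For illustration only, dynamic time warping as a proximity measure between two time series of unequal length can be sketched with the textbook dynamic-programming recurrence. This is a minimal sketch under stated assumptions: the absolute difference is assumed as the local cost, no warping window is applied, and the function name `dtw_distance` and the sample series are hypothetical rather than taken from Bai or from the claims.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical short sales history for a first product and a longer
# history for a second product (made-up values).
first_product = [1, 2, 3]
second_product = [1, 2, 2, 3]
print(dtw_distance(first_product, second_product))
```

In this assumed example the shorter series aligns to the longer one at no cost because the repeated value in the longer series is absorbed by the warping path, which is the property that makes the measure suitable for comparing a short sales history with longer ones.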
Ray in view of Bai teach time series data analysis and the following feature is expounded upon by Rani:
for each dynamically time warped dataset, performing a clustering analysis to obtain a clustering model with an optimal number of clusters, wherein the clustering analysis includes: applying a plurality of clustering models to each dynamically time warped dataset to generate clusters of dynamically time warped datasets ... and each clustering model generates a different number of clusters  (Rani Section I-“ Categories [1] of Time-Series Clustering Algorithms include: Temporal-Proximity-Based Clustering Procedures that work with unanalyzed facts and figures, either in frequency domain or time domain. It is also known as raw-data-based approach. It uses distance measures which consider temporal relations, e.g. Dynamic Time Warping, Temporal K-means/Hierarchical Clustering. When a set of features are extracted from raw data and any of the existing clustering algorithm e.g. [3] “Discrete Wavelet Transform”, “Adaptive Piecewise Constant Approximation”, and “Curvature-Based PCA Segments” etc. is applied on the feature space, clustering is known as representation or feature based clustering. Model-Based Clustering uses clusters of temporal data which are specified by a mixture of dynamic models ego Hidden-Markov-Model (HMM), Auto Regressive Moving Average Model (ARMA) identifies the data dependency and regularity behind the dynamic behaviors of time-related data.”; Section II- “ Step 1Start by assigning each item to its own cluster, so that if there are N items, there are N clusters, each containing just one item.; Step2 Use Dynamic Time Warping as time-series similarity measure, and then let the similarities between the clusters equal the similarities between the items they contain.
Step3 Find most similar pair of clusters and merge them into a single cluster, so that now one cluster can be reduced.
Step4 Compute the average linkage as similarities between the new cluster and each of the old clusters. Step5 Repeat steps 3 and 4 until get K clusters. Step6 Adopt inter/intra-cluster-distance-based swap to refine the K clusters from step 5 and then get the new K clusters.”; Section I-D-“ D. Inter/Intra Cluster-Distance-Based-Swap- This approach is used to refine the output of hierarchical clustering method and also it removes the inability of hierarchical clustering. Once we get K clusters as an output of hierarchical clustering, we then apply the inter/intra-cluster-distance-based swap to improvise these K clusters and attain new number of “clusters” as output.”; Section II-C; Table 1; Table 2 and related text.);
and identifying the clustering model with the optimal number of clusters from among the plurality of clustering models based, at least in part, on a cluster validity index generated for each clustering model; (Rani Section I.C-E- “ “CVAP (Cluster Validity and Analysis Platform)” [5] is a MATLAB based visual cluster validation tool. It gives essential tools and suitable environment for evaluating the validity or reasonability of clustering algorithms. It has four “External validity indices”, fourteen “internal validity indices” and five clustering algorithms like “K-means”, PAM, “hierarchical clustering” etc. It also acknowledges other clustering algorithms by loading a solution file having class labels, or by addition of new codes, and Pearson correlation coefficient, and similarity metrics of Euclidean distance are supported. D. Inter/Intra Cluster-Distance-Based-Swap -This approach is used to refine the output of hierarchical clustering method and also it removes the inability of hierarchical clustering. Once we get K clusters as an output of hierarchical clustering, we then apply the inter/intra-cluster-distance-based swap to improvise these K clusters and attain new number of “clusters” as output. E. Evaluation Criteria “Cluster validation” is essential and mandatory procedure in analyzing clusters. There are various validation indices in CVAP to give measures for validity for every cluster. The validation indices [5] also give an explicit scenario on the optimal number of clusters.”)
Ray, Bai, and Rani are directed to time series data analysis. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the model analysis of Ray in view of Bai, as taught by Rani, by utilizing additional clustering validation techniques with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to make the modification to the teachings of Ray in view of Bai with the motivation of improved hierarchical clustering algorithm techniques (Rani Section II).
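For illustration only, generating clustering models with different numbers of clusters and selecting among them with a cluster validity index can be sketched as follows. The sketch assumes a precomputed distance matrix, average-linkage agglomerative merging as in the steps quoted from Rani Section II, and the mean silhouette width as a stand-in validity index; the silhouette choice, the function names, and the sample data are assumptions and are not drawn from Rani, CVAP, or the claims. The inter/intra-cluster-distance-based swap is not reproduced.

```python
def average_linkage(dist, k):
    """Merge the closest pair of clusters (average linkage) until k remain."""
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > k:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                link = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                link /= len(clusters[a]) * len(clusters[b])
                if best is None or link < best[0]:
                    best = (link, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return clusters


def silhouette(dist, clusters):
    """Mean silhouette width; requires at least two clusters."""
    total, n = 0.0, 0
    for ci, c in enumerate(clusters):
        for i in c:
            n += 1
            if len(c) == 1:
                continue  # singletons contribute 0 by convention
            a = sum(dist[i][j] for j in c if j != i) / (len(c) - 1)
            b = min(sum(dist[i][j] for j in o) / len(o)
                    for oi, o in enumerate(clusters) if oi != ci)
            if max(a, b) > 0:
                total += (b - a) / max(a, b)
    return total / n


def best_k(dist, candidates):
    """Score one clustering model per candidate k and return the best k."""
    scores = {k: silhouette(dist, average_linkage(dist, k)) for k in candidates}
    return max(scores, key=scores.get), scores


# Hypothetical one-dimensional data forming two well-separated groups.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(p - q) for q in points] for p in points]
k, scores = best_k(dist, [2, 3, 4])
print(k, scores)
```

Under these assumed data, each candidate value of k yields a separate clustering model, and the model with the highest index value is the one retained.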

Ray in view of Bai in further view of Rani teach time series data analysis and the following feature is expounded upon by Panda:
for each dynamically time warped dataset, performing a partitional clustering analysis to obtain a clustering model with an optimal number of clusters (Panda Par. 17-“ The embodiments herein provide a method and system for model fitting to hierarchical time series clusters. A plurality of time series to be analyzed are clustered as hierarchical time series clusters using a Dynamic Time Warping (DTW) as optimal distance measure to create time series hierarchical clusters. The method disclosed recognizes least dissimilarity time series in the hierarchical time series clusters, a best fit model is identified for the time series and the same model is continued up the hierarchy along the branch of the hierarchical time series clusters till the model identified satisfies Error Tolerance (ET) and Error Difference (ED) criteria. The method reduces the model fitting time, also referred as model building time by more than 50%. The same is explained with an example while describing the method flow. The time efficiency obtained in model fitting of time series is critical while processing millions of time series. Thus higher the time efficiency faster is the forecasting of the time series to get insights from the data gathered.”; Par. 22-“ FIG. 2A through FIG. 2D illustrate an example for hierarchical clustering of a plurality of time series into hierarchical time series clusters, in accordance with an embodiment of the present disclosure. Time series clustering is to partition the plurality of time series (time series data) into different groups based on similarity or distance, such that time series or TS in the same cluster are more similar. One of the key component in TS clustering is the function used to measure the similarity between two time series being compared. Practically, the time series data captured could be in various forms including raw values of equal or unequal length, vectors of feature-value pairs, transition matrices, and so on. 
Thus, to cluster the time series, the DTW distance is utilized for generating hierarchical time series clusters. FIG. 2A and FIG. 2B depicts data set for the plurality of time series.”; Par. 23-“ The time series is generated from the data set and then a DTW distance matrix is calculated using DTW technique. The DTW technique enables measuring similarity between two temporal sequences (time series data) which may vary in time or speed. For instance, similarities in walking patterns can be detected using DTW, even if one person walks faster than the other, or if there is any accelerations and deceleration during the course of an observation. DTW allows for non-linear alignments between time series not necessarily of the same length, as shown in FIG. 2C. In general, DTW is an approach that calculates an optimal match between two given (time dependent) sequences under certain restrictions.”; Par. 41-43-“ Time optimization achieved by the method proposed is explained with help of an example: Assumption: No of clusters or number of branches: n with each cluster having ‘p’ time series (TSs). The repository has ‘k’ TS models with average time to fit the model for one TS=0.8 min”);
 for each cluster within the clustering model with the optimal number of clusters define a prototype time series, ...; (Panda Par. 4-“ Each time series within the hierarchical time series cluster has its individual characteristics that varies in accordance with the hierarchical level where the time series lies in the hierarchical time series cluster. Some existing approaches provide insights on model fitting to hierarchical time series where a global or single model is identified for the hierarchical time series that is then used forecasting of the hierarchical time series. However, with a single model to the hierarchical time series provides a generalized or common model for all the time series within the hierarchical time series. Thus, the existing method tries to generalize all the time series of the cluster and may lose individual characteristics of each time series is effectively reduces accuracy of rightly capturing each time series with its individual characteristics. However, a good balance needs to be sought between identifying best fit model for each time series and identifying common best fit models for plurality of series so as to achieve good time efficiency during model fitting along with good forecast accuracy.”; Par. 22-“ FIG. 2A through FIG. 2D illustrate an example for hierarchical clustering of a plurality of time series into hierarchical time series clusters, in accordance with an embodiment of the present disclosure. Time series clustering is to partition the plurality of time series (time series data) into different groups based on similarity or distance, such that time series or TS in the same cluster are more similar. One of the key component in TS clustering is the function used to measure the similarity between two time series being compared. Practically, the time series data captured could be in various forms including raw values of equal or unequal length, vectors of feature-value pairs, transition matrices, and so on. 
Thus, to cluster the time series, the DTW distance is utilized for generating hierarchical time series clusters. FIG. 2A and FIG. 2B depicts data set for the plurality of time series.”; Par. 41-43)

Ray, Bai, Rani, and Panda are directed to time series data analysis. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the model analysis of Ray in view of Bai in further view of Rani, as taught by Panda, by utilizing additional modeling techniques with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to make the modification to the teachings of Ray in view of Bai in further view of Rani with the motivation of providing a good balance between identifying a best fit model for each time series and identifying common best fit models for a plurality of series so as to achieve good time efficiency during model fitting along with good forecast accuracy (Panda Par. 4).
Regarding Claim 14,  
Ray teaches
A method for forecasting product demand, for a product having limited historical sales data, comprising: receiving time series sales data for a first product; (Ray - Par. 28-29; Par. 32-“For example, a possible situation can occur with electronic commerce (“eCommerce”) retailers. Because eCommerce retailers generally store more SKUs than brick and mortar stores, there might not be enough sales data to model each SKU separately. In addition, eCommerce retailers often stock SKUs that are short-lived or have erratic data.”; Par. 33“FIG. 4B illustrates three different SKUs that have such situations. The same X-axis 420 and Y-axis 410 that are present in FIG. 4A also are present in FIG. 4B. Data series 440, data series 450, and data series 460 represent the sales of three different items. Data series 440 has incomplete data. Sales are present for only a very short time period, with no sales before or after that time period. This type of data series can be indicative of a short-lived item. Because the item had sales only for a very short-period of time, a popular but short-lived item might be indicative of a product that is no longer made. Data series 450 has two sales spikes, with a period of zero or otherwise low sales in between the sales spikes. Such a data series might be indicative of an item that could not keep up with demand (between the two spikes), and is no longer being made. Or such a data series might be indicative of a seasonal item (explaining the sales spikes) that is no longer being made (explaining the lack of data after the second sales spike). Data series 460 is similar to data series 440 in that it has only a single spike. However, while data series 440 is similar to data series 430 in that a peak for data series 430 roughly coincides with a peak of data series 440, data series 460 has a peak that roughly coincides with a trough of data series 430. 
This fact can indicate both that the item in data series 460 is a short-lived item and that its sales do not correlate well with the item represented by data series 430. This type of behavior is discussed in further detail below.”)
receiving time series sales data of a plurality of different second products, the time series sales data of each of the different second products being longer than the time series sales data of the first product, and the time series sales data of the first product being of a limited duration and/or representing sparse sales data ; (Ray- Par. 32-“For example, a possible situation can occur with electronic commerce (“eCommerce”) retailers. Because eCommerce retailers generally store more SKUs than brick and mortar stores, there might not be enough sales data to model each SKU separately. In addition, eCommerce retailers often stock SKUs that are short-lived or have erratic data.”; Par. 33-“ FIG. 4B illustrates three different SKUs that have such situations. The same X-axis 420 and Y-axis 410 that are present in FIG. 4A also are present in FIG. 4B. Data series 440, data series 450, and data series 460 represent the sales of three different items. Data series 440 has incomplete data. Sales are present for only a very short time period, with no sales before or after that time period. This type of data series can be indicative of a short-lived item. Because the item had sales only for a very short-period of time, a popular but short-lived item might be indicative of a product that is no longer made. Data series 450 has two sales spikes, with a period of zero or otherwise low sales in between the sales spikes. Such a data series might be indicative of an item that could not keep up with demand (between the two spikes), and is no longer being made. Or such a data series might be indicative of a seasonal item (explaining the sales spikes) that is no longer being made (explaining the lack of data after the second sales spike). Data series 460 is similar to data series 440 in that it has only a single spike. 
However, while data series 440 is similar to data series 430 in that a peak for data series 430 roughly coincides with a peak of data series 440, data series 460 has a peak that roughly coincides with a trough of data series 430. This fact can indicate both that the item in data series 460 is a short-lived item and that its sales do not correlate well with the item represented by data series 430. This type of behavior is discussed in further detail below.”)
the prototype time series for each cluster corresponding to a medoid of the cluster (Ray Abstract; Par. 48-“ Another algorithm that attempts to solve that issue is partitioning around medoids (PAM). In PAM, vertices as chosen as the center point (or medoid). Each vertex is associated to the closest medoid. Then, for each medoid, the medoid is switched with a vertex to determine the total cost of the configuration. After each vertex has been switched with the medoid, the configuration with the lowest cost is chosen. Then new medoids are chosen based on the newly calculated configuration. This process is repeated until there is no change in medoids. While PAM is reliable and robust, it is very slow compared to other clustering methods, such as K-means, because the cost for each data point has to be calculated.”)
determining within which cluster of the one clustering model the time series sales data for the first product lies; (Ray Par. 34-“ One method of solving the above problems is to forecast items in groups (also known as clusters). In other words, instead of forecasting what each individual SKU will sell, one would place a SKU in a group with other SKUs. Then, one forecasts what the group of SKUs would sell. Data series 430, data series 440, and data series 450 could be forecast as a group. The forecast could then be used to order the proper number of items for each of the three SKUs. While there are currently existing methods and systems for grouping SKUs, it would be desirable to have a more accurate method and system of grouping SKUs for forecasting purposes.”);
and utilizing the prototype time series of the cluster within which the time series sales data for the first product lies in generating a forecast of demand of the first product. (Ray Par. 34-“ One method of solving the above problems is to forecast items in groups (also known as clusters). In other words, instead of forecasting what each individual SKU will sell, one would place a SKU in a group with other SKUs. Then, one forecasts what the group of SKUs would sell. Data series 430, data series 440, and data series 450 could be forecast as a group. The forecast could then be used to order the proper number of items for each of the three SKUs. While there are currently existing methods and systems for grouping SKUs, it would be desirable to have a more accurate method and system of grouping SKUs for forecasting purposes.”; Par. 36-37-“In some traditional notions of grouping or clustering, there can be a requirement to place similar SKUs in the same groups. Thus, two similar items would not be placed in separate groups. However, in some embodiments, it is more important that dissimilar SKUs are not placed in the same group; similar items can be placed in separate groups, and embodiments will still operate correctly. Returning to FIG. 4B, an example of dissimilar SKUs is seen in data series 430 of FIG. 4A and data series 460. As explained above, while data series 430 goes down, data series 460 goes up. This fact can be an indication that placing the item represented in data series 430 in a group with the item represented in data series 460 might not be ideal.”).
Ray teaches time series data analysis for sales forecasting, and the following feature is expounded upon by Bai:
for each of the different second products, dynamically time warping the time series sales data of the first product with the respective time series sales data of the respective second product to create a dynamically time warped dataset; (Bai Section I-“In this study we present a sales forecast case of a construction machine manufacturer in China, which produces 120 kinds of products and keeps a large stock of spare parts and finished goods. The inventory is a burden. Aim for alleviating the inventory, we plan the production and control the spare parts inventory in term of sales forecast.”; Section II-“The ARIMA model identification need to be supervised for determining the orders of model, that is, each time series forecasting has a procedure for model identification. If we classify each time series to appropriate model in advance, we can save forecast time and cost. Two approaches are used to evaluate the similarity or dissimilarity between time series in research articles and proceedings: one is model-based, the other is model-free. The model-based approach consists in projecting time series into a given functional basis space which corresponds to a polynomial, ARIMA, or a discrete Fourier transform approximation. The proximity between time series is then evaluated by the fitted basis coefficients [9]–[10][11]. The model-free approach is non-parametric and consists in evaluating the similarity between time series based on their initial temporal description. Within the scope of non-parametric approach, the mostly widely used proximity measures between time series are Euclidean distance and the dynamic time warping [12]. Furthermore, SVM and forecast density have also been used to classify the time series”)
Ray and Bai are directed to time series data analysis. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the model analysis of Ray, as taught by Bai, by utilizing additional modeling techniques with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to modify the teachings of Ray in order to save forecast time and cost (Bai Section I).
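For illustration only, and not as part of the cited teachings or the grounds of rejection, the dynamic time warping proximity measure named in Bai Section II can be sketched as follows. The function name `dtw_distance`, the absolute-difference local cost, and the sample series are assumptions made for this sketch; none of them appears in Ray or Bai.

```python
def dtw_distance(a, b):
    """Textbook dynamic time warping distance between two numeric sequences.

    Hypothetical helper for illustration; not taken from Ray or Bai.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])  # assumed local cost: absolute difference
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(cost[i - 1][j],      # a advances, b repeats
                                     cost[i][j - 1],      # b advances, a repeats
                                     cost[i - 1][j - 1])  # both advance
    return cost[n][m]


# Made-up weekly sales for two products; the second repeats one week's value.
first_product = [1, 2, 3]
second_product = [1, 2, 2, 3]
distance = dtw_distance(first_product, second_product)
```

Because the measure permits one series to be stretched against the other, the two made-up series above align at zero cost even though their lengths differ, which a point-by-point Euclidean comparison could not express.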
Ray in view of Bai teach time series data analysis and the following feature is expounded upon by Rani:
for each dynamically time warped dataset, performing a partitional clustering analysis to obtain a clustering model with an optimal number of clusters, wherein the partitional clustering analysis includes; applying the plurality of clustering models to each dynamically time warped dataset to generate clusters of dynamically time warped datasets, wherein each of the plurality of clustering models generate a different number of clusters;  (Rani Section I-“ Categories [1] of Time-Series Clustering Algorithms include: Temporal-Proximity-Based Clustering Procedures that work with unanalyzed facts and figures, either in frequency domain or time domain. It is also known as raw-data-based approach. It uses distance measures which consider temporal relations, e.g. Dynamic Time Warping, Temporal K-means/Hierarchical Clustering. When a set of features are extracted from raw data and any of the existing clustering algorithm e.g. [3] “Discrete Wavelet Transform”, “Adaptive Piecewise Constant Approximation”, and “Curvature-Based PCA Segments” etc. is applied on the feature space, clustering is known as representation or feature based clustering. Model-Based Clustering uses clusters of temporal data which are specified by a mixture of dynamic models ego Hidden-Markov-Model (HMM), Auto Regressive Moving Average Model (ARMA) identifies the data dependency and regularity behind the dynamic behaviors of time-related data.”; Section II- “ Step 1Start by assigning each item to its own cluster, so that if there are N items, there are N clusters, each containing just one item.
Step 2 Use Dynamic Time Warping as time-series similarity measure, and then let the similarities between the clusters equal the similarities between the items they contain.
Step 3 Find most similar pair of clusters and merge them into a single cluster, so that now one cluster can be reduced.
Step 4 Compute the average linkage as similarities between the new cluster and each of the old clusters.
Step 5 Repeat steps 3 and 4 until get K clusters.
Step6 Adopt inter/intra-cluster-distance-based swap to refine the K clusters from step 5 and then get the new K clusters.”; Section I-D-“ D. Inter/Intra Cluster-Distance-Based-Swap- This approach is used to refine the output of hierarchical clustering method and also it removes the inability of hierarchical clustering. Once we get K clusters as an output of hierarchical clustering, we then apply the inter/intra-cluster-distance-based swap to improvise these K clusters and attain new number of “clusters” as output.”; Section II-C; Table 1; Table 2 and related text.”); 
and for each of the plurality of clustering models, applying a cluster validity analysis to obtain one clustering model of the plurality of clustering models with an optimal number of clusters; (Rani Section 1.C-E- “ “CVAP (Cluster Validity and Analysis Platform)” [5] is a MATLAB based visual cluster validation tool. It gives essential tools and suitable environment for evaluating the validity or reasonability of clustering algorithms. It has four “External validity indices”, fourteen “internal validity indices” and five clustering algorithms like “K-means”, PAM, “hierarchical clustering” etc. It also acknowledges other clustering algorithms by loading a solution file having class labels, or by addition of new codes, and Pearson correlation coefficient, and similarity metrics of Euclidean distance are supported. D. Inter/Intra Cluster-Distance-Based-Swap -This approach is used to refine the output of hierarchical clustering method and also it removes the inability of hierarchical clustering. Once we get K clusters as an output of hierarchical clustering, we then apply the inter/intra-cluster-distance-based swap to improvise these K clusters and attain new number of “clusters” as output. E. Evaluation Criteria “Cluster validation” is essential and mandatory procedure in analyzing clusters. There are various validation indices in CVAP to give measures for validity for every cluster. The validation indices [5] also give an explicit scenario on the optimal number of clusters. “)
Ray, Bai, and Rani are directed to time series data analysis. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the model analysis of Ray in view of Bai, as taught by Rani, by utilizing additional clustering validation techniques with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to modify the teachings of Ray in view of Bai in order to obtain improved hierarchical clustering algorithm techniques (Rani Section II).

Ray in view of Bai in view of Rani teach time series data analysis and the following feature is expounded upon by Panda:
for each cluster within the one clustering model, define a prototype time series; (Panda Par. 4-“ Each time series within the hierarchical time series cluster has its individual characteristics that varies in accordance with the hierarchical level where the time series lies in the hierarchical time series cluster. Some existing approaches provide insights on model fitting to hierarchical time series where a global or single model is identified for the hierarchical time series that is then used forecasting of the hierarchical time series. However, with a single model to the hierarchical time series provides a generalized or common model for all the time series within the hierarchical time series. Thus, the existing method tries to generalize all the time series of the cluster and may lose individual characteristics of each time series is effectively reduces accuracy of rightly capturing each time series with its individual characteristics. However, a good balance needs to be sought between identifying best fit model for each time series and identifying common best fit models for plurality of series so as to achieve good time efficiency during model fitting along with good forecast accuracy.”; Par. 22-“ FIG. 2A through FIG. 2D illustrate an example for hierarchical clustering of a plurality of time series into hierarchical time series clusters, in accordance with an embodiment of the present disclosure. Time series clustering is to partition the plurality of time series (time series data) into different groups based on similarity or distance, such that time series or TS in the same cluster are more similar. One of the key component in TS clustering is the function used to measure the similarity between two time series being compared. Practically, the time series data captured could be in various forms including raw values of equal or unequal length, vectors of feature-value pairs, transition matrices, and so on. 
Thus, to cluster the time series, the DTW distance is utilized for generating hierarchical time series clusters. FIG. 2A and FIG. 2B depicts data set for the plurality of time series.”; Par. 41-43)

Ray, Bai, Rani and Panda are directed to time series data analysis. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the model analysis of Ray in view of Bai in further view of Rani, as taught by Panda, by utilizing additional modeling techniques with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to modify the teachings of Ray in view of Bai in further view of Rani in order to provide a good balance between identifying a best fit model for each time series and identifying common best fit models for a plurality of series, so as to achieve good time efficiency during model fitting along with good forecast accuracy (Panda Par. 4).
Claims 2, 9 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Ray et al., US Publication No. 20160260111A1, [hereinafter Ray], in view of Bai et al., Classification and Forecasting for Enterprise Data, Published in: 2018 Chinese Automation Congress (CAC) Date of Conference: 30 Nov.-2 Dec. 2018, [hereinafter Bai], in further view of Rani, "Modified hierarchical clustering algorithm for time series data," 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), 2016, pp. 4036-4040, [hereinafter Rani], in further view of Panda, US Publication No. 20190050763 A1, [hereinafter Panda], and in further view of Malyack et al., US Publication No. 20190156253 A1, [hereinafter Malyack].
Regarding Claim 2, Claim 9 and Claim 15,
Ray in view of Bai in further view of Rani in further view of Panda teach The method of claim 1,…, The method of claim 8,…, and The method of claim 14,…
Ray in view of Bai in further view of Rani in further view of Panda teach time series data analysis and the following feature is expounded upon by Malyack:
further comprising z-scoring the time series data. (Malyack Par. 37-“ The term “volume information units” refers to a set of data that has been normalized (e.g., via Z-score, min max, etc.) and/or parsed within a larger pool of volume forecast data. “; Par.108- “Additionally or alternatively, for a selected period of time to forecast, one can selectively feed features representative of time series volumes that are being labeled as similar to the selected period of time based on geographical data, weather reports, political events, traffic data, etc.”; Par. 120-“ In some examples, the training engine 702 comprises a normalization module 706 and a feature extraction module 704. The normalization module 706, in some examples, may be configured to normalize (e.g., via Z-score methods) the historical data so as to enable different data sets to be compared. Normalization is the process of changing one or more values in a data set (e.g., the volume forecast data management tool 715) to a common scale while maintaining the general distribution and ratios in the data set. In this way, although values are changed, differences between actual values in the data set are not distorted such that information is not lost. For example, values from the volume forecast data management tool 715 may range from 0 to 100,000. The extreme difference in this scale may cause problems when combining these values into the same features for modeling. In an example illustration, this range can be changed to a scale of 0-1 or represent the values as percentile ranks, as opposed to absolute values.”)
Ray, Bai, Rani, Panda and Malyack are directed to time series data analysis. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the model analysis of Ray in view of Bai in further view of Rani in further view of Panda, as taught by Malyack, by utilizing additional statistical techniques with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to modify the teachings of Ray in view of Bai in further view of Rani in further view of Panda in order to improve the accuracy of the volume forecast (Malyack Par. 129).
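For illustration only, and not as part of the cited teachings or the grounds of rejection, Z-score normalization of a time series of the kind referenced in Malyack Par. 37 and Par. 120 can be sketched as follows. The function name `z_score`, the use of the population standard deviation, the handling of a constant series, and the sample values are assumptions made for this sketch; Malyack does not specify them.

```python
from statistics import mean, pstdev


def z_score(series):
    """Rescale a numeric series to zero mean and unit standard deviation.

    Hypothetical helper for illustration; not taken from Malyack.
    """
    mu = mean(series)
    sigma = pstdev(series)  # assumed: population standard deviation
    if sigma == 0:
        # Assumed convention: a constant series carries no variation, so map it to zeros.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


# Made-up weekly sales on very different scales become comparable after scaling.
low_volume = z_score([2, 4, 6])
high_volume = z_score([2000, 4000, 6000])
```

Both made-up series map to the same normalized shape, which reflects the stated purpose in Malyack Par. 120 of enabling different data sets to be compared on a common scale.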

Claims 4-6, 10-12 and 16-18 are rejected under 35 U.S.C. 103 as being unpatentable over Ray et al., US Publication No. 20160260111A1, [hereinafter Ray], in view of Bai et al., Classification and Forecasting for Enterprise Data, Published in: 2018 Chinese Automation Congress (CAC) Date of Conference: 30 Nov.-2 Dec. 2018, [hereinafter Bai], in further view of Rani, "Modified hierarchical clustering algorithm for time series data," 2016 3rd International Conference on Computing for Sustainable Global Development (INDIACom), 2016, pp. 4036-4040, [hereinafter Rani], in further view of Panda, US Publication No. 20190050763 A1, [hereinafter Panda], and in further view of Chien, US Publication No. 20150302432 A1, [hereinafter Chien].
Regarding Claim 4, Claim 10 and Claim 16,
Ray in view of Bai in further view of Rani in further view of Panda teach The method of claim 1,…, The method of claim 8,…, and The method of claim 14,…
wherein the first product is from a pre-defined multi-level product hierarchy  (Chien Par. 49-50- “FIG. 4 illustrates an example of a block diagram 400 of a process for demand classification. Classification module 211 can take time series information, hierarchical information, and configuration information as input at 402. At 404, classification module 211 may process the time series using a user-defined class-by-variable. At 406, the classification module 211 can produce outputs for each group including, but not limited to, the classification results, demand specific statistics, and the derived information based on a user's selection. Classification module 211 may merge the outputs with the original input data at 408. At 410, potentially each time series may be assigned preliminary classification results, time series statistics, and derived information related to the time series. FIG. 5 illustrates an additional example of a block diagram 500 of a process for demand classification. At 502, a classification module 211 may first take input information to conduct a preliminary demand classification at a user-defined CLASS_HIGH level at 504 and a CLASS_LOW level at 506, respectively. “;Par. 68-70)
Ray, Bai, Rani, Panda and Chien are directed to time series data analysis. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the model analysis of Ray in view of Bai in further view of Rani in further view of Panda, as taught by Chien, by utilizing additional clustering techniques with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to modify the teachings of Ray in view of Bai in further view of Rani in further view of Panda in order to improve the accuracy and efficiency of demand forecasting processes, which can improve overall sales and operational planning effectiveness (Chien Par. 3).
Regarding Claim 5, Claim 11 and Claim 17,
Ray in view of Bai in further view of Rani in further view of Panda teach The method of claim 1,…, The method of claim 10,…, and The method of claim 16,…
Ray in view of Bai in further view of Rani in further view of Panda teach time series data analysis and the following feature is expounded upon by Chien:
wherein the plurality of different second products are selected from the predefined multi-level product hierarchy. (Chien Par. 49-50- “FIG. 4 illustrates an example of a block diagram 400 of a process for demand classification. Classification module 211 can take time series information, hierarchical information, and configuration information as input at 402. At 404, classification module 211 may process the time series using a user-defined class-by-variable. At 406, the classification module 211 can produce outputs for each group including, but not limited to, the classification results, demand specific statistics, and the derived information based on a user's selection. Classification module 211 may merge the outputs with the original input data at 408. At 410, potentially each time series may be assigned preliminary classification results, time series statistics, and derived information related to the time series. FIG. 5 illustrates an additional example of a block diagram 500 of a process for demand classification. At 502, a classification module 211 may first take input information to conduct a preliminary demand classification at a user-defined CLASS_HIGH level at 504 and a CLASS_LOW level at 506, respectively. “;Par. 68-70)
Ray, Bai, Rani, Panda and Chien are directed to time series data analysis. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the model analysis of Ray in view of Bai in further view of Rani in further view of Panda, as taught by Chien, by utilizing additional clustering techniques with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to modify the teachings of Ray in view of Bai in further view of Rani in further view of Panda in order to improve the accuracy and efficiency of demand forecasting processes, which can improve overall sales and operational planning effectiveness (Chien Par. 3).
Regarding Claim 6 and Claim 12,
Ray in view of Bai in further view of Rani in further view of Panda in further view of Chien teach The method of claim 5,…, and The method of claim 11,…
Ray in view of Bai in further view of Rani in further view of Panda teach time series data analysis and the following feature is expounded upon by Chien:
wherein the clustering analysis is based on: first hierarchical data attributes associated with the first product, second hierarchical data attributes associated with each of the plurality of second products, first time series data attributes associated with the time series sales data for the first product, and second time series data attributes associated with the time series sales data for the plurality of second products, wherein the pre-defined multi-level product hierarchy comprises a plurality of levels, each level of the plurality of levels corresponding to a hierarchical data attribute, wherein the first time series data attributes are specific to the times of sale of the first product, and the second time series data attributes are specific to the times of sale of each of the plurality of second products. (Chien Par. 38-39-[hierarchical data attributes] =“The classification module 211 can classify each demand time series based on characteristics such as demand lifecycle, intermittence, and seasonality. A “demand time series,” as used herein, is intended to refer to a time series in which data points represent a degree of demand of an item offered for sale. The classification results (e.g., demand time series statistics) can be output to users to enable the users to apply appropriate modeling techniques to each demand time series.”; Par. 43-45-[time-series attributes]-“FIG. 3 illustrates an example of a block diagram 300 of a process sequence for classifying, clustering, and hierarchical grouping one or more time series.. At 302, the classification module (e.g., the classification module 211 of the DCS engine 209) may classify each time series at specified level(s) into different classes, generate statistics of each of the demand series, and derive information about the demand characteristics for the time series. After the demand classes are ascertained for a time series, the pattern-clustering process can be executed for each class of time series at 304. 
The pattern-clustering module (e.g., the pattern-clustering module 213 of the DCS engine 209) may generate a pattern attribute that is used to cluster the demand series. Demand series with the same, or similar, demand characteristic may be grouped together and clusters may be formed. Volume group 308 and volume group 310 may be generated at 306 within the scope defined by the classification module 211 and the pattern-clustering module 213. In at least one embodiment, each volume group may be a group of nodes where the volume of an aggregated demand satisfies a minimum threshold. The volume-grouping module groups demand series with the same forecast reconciliation levels..”; Par. 57; Par. 86-87)
Ray, Bai, Rani, Panda and Chien are directed to time series data analysis. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the model analysis of Ray in view of Bai in further view of Rani in further view of Panda, as taught by Chien, by utilizing additional clustering techniques with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to modify the teachings of Ray in view of Bai in further view of Rani in further view of Panda in order to improve the accuracy and efficiency of demand forecasting processes, which can improve overall sales and operational planning effectiveness (Chien Par. 3).
Regarding Claim 18,
Ray in view of Bai in further view of Rani in further view of Panda in further view of Chien teach The method of claim 14,…
Ray in view of Bai in further view of Rani in further view of Panda teach time series data analysis and the following feature is expounded upon by Chien:
wherein the first product and the plurality of different second products are selected from different levels of the pre-defined multi-level product hierarchy  (Chien Par. 49-50- “FIG. 4 illustrates an example of a block diagram 400 of a process for demand classification. Classification module 211 can take time series information, hierarchical information, and configuration information as input at 402. At 404, classification module 211 may process the time series using a user-defined class-by-variable. At 406, the classification module 211 can produce outputs for each group including, but not limited to, the classification results, demand specific statistics, and the derived information based on a user's selection. Classification module 211 may merge the outputs with the original input data at 408. At 410, potentially each time series may be assigned preliminary classification results, time series statistics, and derived information related to the time series. FIG. 5 illustrates an additional example of a block diagram 500 of a process for demand classification. At 502, a classification module 211 may first take input information to conduct a preliminary demand classification at a user-defined CLASS_HIGH level at 504 and a CLASS_LOW level at 506, respectively. “;Par. 68-70)
Ray, Bai, Rani, Panda and Chien are directed to time series data analysis. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the model analysis of Ray in view of Bai in further view of Rani in further view of Panda, as taught by Chien, by utilizing additional clustering techniques with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to modify the teachings of Ray in view of Bai in further view of Rani in further view of Panda in order to improve the accuracy and efficiency of demand forecasting processes, which can improve overall sales and operational planning effectiveness (Chien Par. 3).


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: US Publication No. 20200104771A1 to Popescu et al.- Abstract-“ Embodiments select demand forecast parameters for a demand model for a first item. Embodiments receive historical sales data for a plurality of items on a per store basis and receive a plurality of seasonality curves for the first item of the plurality of items, each seasonality curve corresponding to a different pooling level for the first item. Embodiments determine a correlation for each of the seasonality curves at each pooling level and determine a root mean squared error (“RMSE”) for each determined correlation. Embodiments determine a score for each pooling level, the score based on the corresponding correlation, RMSE and a penalty and select one of the seasonality curves based on the determined scores. Embodiments use the demand model and the selected seasonality curve to determine a demand forecast for the first item, the demand forecast including a prediction of future sales data for the first item.”
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a). 
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Chesiree Walton, whose telephone number is (571) 272-5219.  The examiner can normally be reached from Monday to Friday between 8 AM and 5 PM.  If any attempt to reach the examiner by telephone is unsuccessful, the examiner’s supervisor, Patricia Munson, can be reached at (571) 270-5396.  The fax telephone numbers for this group are either (571) 273-8300 or (703) 872-9326 (for official communications including After Final communications labeled “Box AF”).
	Another resource that is available to applicants is the Patent Application Information Retrieval (PAIR) system. Information regarding the status of an application can be obtained from the PAIR system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, please feel free to contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
	Applicants are invited to contact the Office to schedule an in-person interview to discuss and resolve the issues set forth in this Office Action.  Although an interview is not required, the Office believes that an interview can be of use to resolve any issues related to a patent application in an efficient and prompt manner.
Sincerely,
/Chesiree Walton/
Examiner, Art Unit 3624

/CRYSTOL STEWART/Primary Examiner, Art Unit 3624