Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Notice to Applicant
Claims 1-19 have been examined in this application. This communication is the first action on the merits. The Information Disclosure Statement (IDS) filed on 3/13/2020 has been acknowledged.
Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Claims 1-19 are directed to a method for forecasting product demand.
Claim 1 recites a method for forecasting product demand, which includes receiving time series sales data for a first product; receiving time series sales data of a plurality of different second products, the time series sales data of each of the different second products being longer than the time series sales data of the first product; for each of the different second products, dynamically time warping the time series sales data of the first product with the respective time series sales data of the respective second product to create a dynamically time warped dataset; for each dynamically time warped dataset, performing a clustering analysis to obtain a clustering model with an optimal number of clusters; for each cluster within the clustering model with the optimal number of clusters 

As drafted, this is, under its broadest reasonable interpretation, within the abstract idea groupings of “Methods of Organizing Human Activity” (sales activities) and “Mathematical Concepts” (mathematical calculations). Accordingly, the claim recites an abstract idea.
This judicial exception is not integrated into a practical application. There are no additional elements that integrate the abstract idea into a practical application. The claims also fail to recite any improvement to another technology or technical field, improvement to the functioning of the computer itself, use of a particular machine, transformation or reduction of a particular article to a different state or thing, and/or an additional element that applies or uses the judicial exception in some other meaningful way beyond generally linking the use of the judicial exception to a particular technological environment, such that the claim as a whole is more than a drafting effort
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements when considered both individually and as an ordered combination do not amount to significantly more than the abstract idea. 
Dependent Claims 2-7, 9-13 and 15-19 recite the additional elements of: z-scoring the time series data; performing the clustering analysis by applying a cluster validity analysis to obtain the clustering model with the optimal number of clusters; the first product comprising a product with limited sales data; the time series sales data of the plurality of different second products being selected from a predefined multi-level product hierarchy; the time series sales data of the plurality of different second products being selected from a single level of the multi-level product hierarchy; and the time series sales data of the plurality of different second products being selected from a plurality of levels of the multi-level product hierarchy. These elements merely further narrow the abstract idea. These recited limitations in the dependent claims do not amount to significantly more than the above-identified judicial exceptions in Claims 1, 8 and 14.
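For reference only, the "z-scoring" limitation recited in the dependent claims can be illustrated with a minimal sketch. The function name `z_score`, the sample data, and the use of the population standard deviation are assumptions made for illustration; they are not taken from the application or from the cited references.

```python
from statistics import mean, pstdev


def z_score(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # Assumption: a constant series has no shape to preserve, so map it to zeros.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


# Hypothetical weekly unit sales for one product.
weekly_sales = [10, 12, 14, 16, 18]
normalized = z_score(weekly_sales)
```

Under this sketch, products with very different sales volumes are placed on a common scale, so that a later comparison reflects the shape of each series rather than its magnitude.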

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 3-4, 8, 10, 14 and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Ray et al., US Publication No. 20160260111A1 [hereinafter Ray], in view of Bai et al., Classification and Forecasting for Enterprise Data, published in: 2018 Chinese Automation Congress (CAC), Date of Conference: 30 Nov.-2 Dec. 2018 [hereinafter Bai], and in further view of Panda, US Publication No. 20190050763 A1 [hereinafter Panda].
Regarding Claim 1,  
Ray teaches
A method for forecasting product demand, comprising: receiving time series sales data for a first product; (Ray - Par. 28-29; Par. 33-“FIG. 4B illustrates three different SKUs that have such situations. The same X-axis 420 and Y-axis 410 that are present in FIG. 4A also are present in FIG. 4B. Data series 440, data series 450, and data series 460 represent the sales of three different items. Data series 440 has incomplete data. Sales are present for only a very short time period, with no sales before or after that time period. This type of data series can be indicative of a short-lived item. Because the item had sales only for a very short-period of time, a popular but short-lived item might be indicative of a product that is no longer made. Data series 450 has two sales spikes, with a period of zero or otherwise low sales in between the sales spikes. Such a data series might be indicative of an item that could not keep up with demand (between the two spikes), and is no longer being made. Or such a data series might be indicative of a seasonal item (explaining the sales spikes) that is no longer being made (explaining the lack of data after the second sales spike). Data series 460 is similar to data series 440 in that it has only a single spike. However, while data series 440 is similar to data series 430 in that a peak for data series 430 roughly coincides with a peak of data series 440, data series 460 has a peak that roughly coincides with a trough of data series 430. This fact can indicate both that the item in data series 460 is a short-lived item and that its sales do not correlate well with the item represented by data series 430. This type of behavior is discussed in further detail below.”);
receiving time series sales data of a plurality of different second products, the time series sales data of each of the different second products being longer than the time series sales data of the first product; (Ray- Par. 33-“ FIG. 4B illustrates three different SKUs that have such situations. The same X-axis 420 and Y-axis 410 that are present in FIG. 4A also are present in FIG. 4B. Data series 440, data series 450, and data series 460 represent the sales of three different items. Data series 440 has incomplete data. Sales are present for only a very short time period, with no sales before or after that time period. This type of data series can be indicative of a short-lived item. Because the item had sales only for a very short-period of time, a popular but short-lived item might be indicative of a product that is no longer made. Data series 450 has two sales spikes, with a period of zero or otherwise low sales in between the sales spikes. Such a data series might be indicative of an item that could not keep up with demand (between the two spikes), and is no longer being made. Or such a data series might be indicative of a seasonal item (explaining the sales spikes) that is no longer being made (explaining the lack of data after the second sales spike). Data series 460 is similar to data series 440 in that it has only a single spike. However, while data series 440 is similar to data series 430 in that a peak for data series 430 roughly coincides with a peak of data series 440, data series 460 has a peak that roughly coincides with a trough of data series 430. This fact can indicate both that the item in data series 460 is a short-lived item and that its sales do not correlate well with the item represented by data series 430. This type of behavior is discussed in further detail below.”)
from the clustering model with the optimal number of clusters, determining within which cluster the time series sales data for the first product lies (Ray Par. 34-“ One method of solving the above problems is to forecast items in groups (also known as clusters). In other words, instead of forecasting what each individual SKU will sell, one would place a SKU in a group with other SKUs. Then, one forecasts what the group of SKUs would sell. Data series 430, data series 440, and data series 450 could be forecast as a group. The forecast could then be used to order the proper number of items for each of the three SKUs. While there are currently existing methods and systems for grouping SKUs, it would be desirable to have a more accurate method and system of grouping SKUs for forecasting purposes.”);
and utilizing the prototype time series of the cluster within which the time series sales data for the first product lies as the forecast for product of demand of the first product (Ray Par. 34-“ One method of solving the above problems is to forecast items in groups (also known as clusters). In other words, instead of forecasting what each individual SKU will sell, one would place a SKU in a group with other SKUs. Then, one forecasts what the group of SKUs would sell. Data series 430, data series 440, and data series 450 could be forecast as a group. The forecast could then be used to order the proper number of items for each of the three SKUs. While there are currently existing methods and systems for grouping SKUs, it would be desirable to have a more accurate method and system of grouping SKUs for forecasting purposes.”; Par. 36-37-“In some traditional notions of grouping or clustering, there can be a requirement to place similar SKUs in the same groups. Thus, two similar items would not be placed in separate groups. However, in some embodiments, it is more important that dissimilar SKUs are not placed in the same group; similar items can be placed in separate groups, and embodiments will still operate correctly. Returning to FIG. 4B, an example of dissimilar SKUs is seen in data series 430 of FIG. 4A and data series 460. As explained above, while data series 430 goes down, data series 460 goes up. This fact can be an indication that placing the item represented in data series 430 in a group with the item represented in data series 460 might not be ideal.”).
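For illustration only, the final two limitations (determining the cluster in which the first product lies and using that cluster's prototype as the forecast) may be sketched as follows. The names `dtw`, `forecast_from_prototypes`, the prototype labels, and the sample data are hypothetical, and nearest-prototype assignment by DTW distance is an assumed reading of the claim, not a method disclosed by Ray.

```python
def dtw(a, b):
    """Classic dynamic time warping distance with absolute-difference local cost."""
    inf = float("inf")
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[len(a)][len(b)]


def forecast_from_prototypes(new_series, prototypes):
    """Pick the prototype closest to the short series and return it as the forecast."""
    best = min(prototypes, key=lambda name: dtw(new_series, prototypes[name]))
    return best, prototypes[best]


# Hypothetical cluster prototypes (longer histories) and a short new-product history.
prototypes = {
    "ramp": [1, 2, 3, 4, 5, 6],
    "flat": [10, 10, 10, 10, 10, 10],
}
cluster_name, forecast = forecast_from_prototypes([1, 2, 3], prototypes)
```

In this sketch the short series is matched to the "ramp" prototype, and the full prototype series is then taken as the demand forecast for the new product.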

for each of the different second products, dynamically time warping the time series sales data of the first product with the respective time series sales data of the respective second product to create a dynamically time warped dataset; (Bai – Section I-“In this study we present a sales forecast case of a construction machine manufacturer in China, which produces 120 kinds of products and keeps a large stock of spare parts and finished goods. The inventory is a burden. Aim for alleviating the inventory, we plan the production and control the spare parts inventory in term of sales forecast.”; Section II-“The ARIMA model identification need to be supervised for determining the orders of model, that is, each time series forecasting has a procedure for model identification. If we classify each time series to appropriate model in advance, we can save forecast time and cost. Two approaches are used to evaluate the similarity or dissimilarity between time series in research articles and proceedings: one is model-based, the other is model-free. The model-based approach consists in projecting time series into a given functional basis space which corresponds to a polynomial, ARIMA, or a discrete Fourier transform approximation. The proximity between time series is then evaluated by the fitted basis coefficients [9]–[10][11]. The model-free approach is non-parametric and consists in evaluating the similarity between time series based on their initial temporal description. Within the scope of non-parametric approach, the mostly widely used proximity measures between time series are Euclidean distance and the dynamic time warping [12]. Furthermore, SVM and forecast density have also been used to classify the time series”);
Ray and Bai are directed to time series data analysis. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the model analysis of Ray, as taught by Bai, by utilizing additional modeling techniques with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to modify the teachings of Ray in order to save forecast time and cost (Bai Section I).
Ray in view of Bai teach time series data analysis and the following feature is expounded upon by Panda:
for each dynamically time warped dataset, performing a clustering analysis to obtain a clustering model with an optimal number of clusters (Panda Par. 17-“ The embodiments herein provide a method and system for model fitting to hierarchical time series clusters. A plurality of time series to be analyzed are clustered as hierarchical time series clusters using a Dynamic Time Warping (DTW) as optimal distance measure to create time series hierarchical clusters. The method disclosed recognizes least dissimilarity time series in the hierarchical time series clusters, a best fit model is identified for the time series and the same model is continued up the hierarchy along the branch of the hierarchical time series clusters till the model identified satisfies Error Tolerance (ET) and Error Difference (ED) criteria. The method reduces the model fitting time, also referred as model building time by more than 50%. The same is explained with an example while describing the method flow. The time efficiency obtained in model fitting of time series is critical while processing millions of time series. Thus higher the time efficiency faster is the forecasting of the time series to get insights from the data gathered.”; Par. 23-“ The time series is generated from the data set and then a DTW distance matrix is calculated using DTW technique. The DTW technique enables measuring similarity between two temporal sequences (time series data) which may vary in time or speed. For instance, similarities in walking patterns can be detected using DTW, even if one person walks faster than the other, or if there is any accelerations and deceleration during the course of an observation. DTW allows for non-linear alignments between time series not necessarily of the same length, as shown in FIG. 2C. In general, DTW is an approach that calculates an optimal match between two given (time dependent) sequences under certain restrictions.”; Par. 
41-43-“ Time optimization achieved by the method proposed is explained with help of an example: Assumption: No of clusters or number of branches: n with each cluster having ‘p’ time series (TSs). The repository has ‘k’ TS models with average time to fit the model for one TS=0.8 min”);
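As a point of reference for the DTW technique described in the passage quoted above, a minimal sketch of the standard dynamic-programming DTW distance follows. The function name `dtw_distance` and the absolute-difference local cost are assumptions for illustration; the quoted reference does not specify an implementation.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two sequences of possibly unequal length."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the cheapest alignment of a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the best of: repeat b[j-1], repeat a[i-1], or advance both.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# The second series is the first with one value held for an extra step.
same_shape = dtw_distance([1, 2, 3], [1, 2, 2, 3])
shifted_level = dtw_distance([0, 0], [1, 1])
```

The first comparison shows the non-linear alignment property noted in the quoted passage: two series of different lengths that follow the same pattern at different speeds have a distance of zero.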
 for each cluster within the clustering model with the optimal number of clusters define a prototype time series; (Panda Par. 4-“ Each time series within the hierarchical time series cluster has its individual characteristics that varies in accordance with the hierarchical level where the time series lies in the hierarchical time series cluster. Some existing approaches provide insights on model fitting to hierarchical time series where a global or single model is identified for the hierarchical time series that is then used forecasting of the hierarchical time series. However, with a single model to the hierarchical time series provides a generalized or common model for all the time series within the hierarchical time series. Thus, the existing method tries to generalize all the time series of the cluster and may lose individual characteristics of each time series is effectively reduces accuracy of rightly capturing each time series with its individual characteristics. However, a good balance needs to be sought between identifying best fit model for each time series and identifying common best fit models for plurality of series so as to achieve good time efficiency during model fitting along with good forecast accuracy.”; Par. 22-“ FIG. 2A through FIG. 2D illustrate an example for hierarchical clustering of a plurality of time series into hierarchical time series clusters, in accordance with an embodiment of the present disclosure. Time series clustering is to partition the plurality of time series (time series data) into different groups based on similarity or distance, such that time series or TS in the same cluster are more similar. One of the key component in TS clustering is the function used to measure the similarity between two time series being compared. Practically, the time series data captured could be in various forms including raw values of equal or unequal length, vectors of feature-value pairs, transition matrices, and so on. 
Thus, to cluster the time series, the DTW distance is utilized for generating hierarchical time series clusters. FIG. 2A and FIG. 2B depicts data set for the plurality of time series.”; Par. 41-43)
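For illustration, one common way to define a prototype time series for a cluster is the medoid, i.e., the member with the smallest total distance to the other members. The medoid choice, the names `medoid` and `l1`, and the sample cluster are assumptions; neither the claim nor Panda is stated here to require this particular definition.

```python
def l1(a, b):
    """Sum of absolute differences; a stand-in distance for equal-length series."""
    return sum(abs(x - y) for x, y in zip(a, b))


def medoid(cluster, distance):
    """Return the member of the cluster with the smallest total distance to all members."""
    return min(cluster, key=lambda c: sum(distance(c, other) for other in cluster))


# Hypothetical cluster of three equal-length series.
cluster = [[1, 2, 3], [2, 3, 4], [9, 9, 9]]
prototype = medoid(cluster, l1)
```

Any pairwise distance, including a DTW distance, could be passed as the `distance` argument in place of `l1`.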

Ray, Bai and Panda are directed to time series data analysis. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed 
Regarding Claim 3,
Ray in view of Bai in further view of Panda teach The method of claim 1,…
Ray in view of Bai teach time series data analysis and the following feature is expounded upon by Panda:
wherein performing the clustering analysis includes applying a cluster validity analysis to obtain the clustering model with the optimal number of clusters. (Panda Par. 17-“ The embodiments herein provide a method and system for model fitting to hierarchical time series clusters. A plurality of time series to be analyzed are clustered as hierarchical time series clusters using a Dynamic Time Warping (DTW) as optimal distance measure to create time series hierarchical clusters. The method disclosed recognizes least dissimilarity time series in the hierarchical time series clusters, a best fit model is identified for the time series and the same model is continued up the hierarchy along the branch of the hierarchical time series clusters till the model identified satisfies Error Tolerance (ET) and Error Difference (ED) criteria. The method reduces the model fitting time, also referred as model building time by more than 50%. The same is explained with an example while describing the method flow. The time efficiency obtained in model fitting of time series is critical while processing millions of time series. Thus higher the time efficiency faster is the forecasting of the time series to get insights from the data gathered.”)
Ray, Bai and Panda are directed to time series data analysis. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the model analysis of Ray in view of Bai, as taught by Panda, by utilizing additional modeling techniques with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to modify the teachings of Ray in view of Bai in order to provide a good balance between identifying a best fit model for each time series and identifying common best fit models for a plurality of series, so as to achieve good time efficiency during model fitting along with good forecast accuracy (Panda Par. 4).
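For reference, a "cluster validity analysis" for selecting the number of clusters is often performed with an index such as the silhouette coefficient. The sketch below is illustrative only: the choice of the silhouette index, the names `silhouette` and `candidates`, and the sample data are assumptions and are not drawn from the application or from Panda.

```python
def silhouette(dist, labels):
    """Mean silhouette coefficient for a labeling, given a pairwise distance matrix."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    scores = []
    for i, lab in enumerate(labels):
        own = clusters[lab]
        if len(own) == 1:
            scores.append(0.0)  # Convention: singleton clusters score zero.
            continue
        a = sum(dist[i][j] for j in own if j != i) / (len(own) - 1)
        b = min(
            sum(dist[i][j] for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical one-dimensional items and their pairwise distances.
values = [0, 1, 10, 11]
dist = [[abs(x - y) for y in values] for x in values]

# Candidate partitions keyed by number of clusters (assumed to come from any clustering step).
candidates = {2: [0, 0, 1, 1], 3: [0, 0, 1, 2]}
best_k = max(candidates, key=lambda k: silhouette(dist, candidates[k]))
```

Here the two-cluster partition scores higher than the three-cluster partition, so two would be selected as the number of clusters under this index.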

Regarding Claim 4, Claim 10 and Claim 16,
Ray in view of Bai in further view of Panda teach The method of claim 1,…, The method of claim 8,…, and The method of claim 14,…
wherein the first product comprises a product with limited sales data (Ray - Par. 28-29; Par. 33-“FIG. 4B illustrates three different SKUs that have such situations. The same X-axis 420 and Y-axis 410 that are present in FIG. 4A also are present in FIG. 4B. Data series 440, data series 450, and data series 460 represent the sales of three different items. Data series 440 has incomplete data. Sales are present for only a very short time period, with no sales before or after that time period. This type of data series can be indicative of a short-lived item.”).
Regarding Claim 8,  
Ray teaches
A method for forecasting product demand, comprising: receiving time series sales data for a first product; (Ray - Par. 28-29; Par. 33-“FIG. 4B illustrates three different SKUs that have such situations. The same X-axis 420 and Y-axis 410 that are present in FIG. 4A also are present in FIG. 4B. Data series 440, data series 450, and data series 460 represent the sales of three different items. Data series 440 has incomplete data. Sales are present for only a very short time period, with no sales before or after that time period. This type of data series can be indicative of a short-lived item. Because the item had sales only for a very short-period of time, a popular but short-lived item might be indicative of a product that is no longer made. Data series 450 has two sales spikes, with a period of zero or otherwise low sales in between the sales spikes. Such a data series might be indicative of an item that could not keep up with demand (between the two spikes), and is no longer being made. Or such a data series might be indicative of a seasonal item (explaining the sales spikes) that is no longer being made (explaining the lack of data after the second sales spike). Data series 460 is similar to data series 440 in that it has only a single spike. However, while data series 440 is similar to data series 430 in that a peak for data series 430 roughly coincides with a peak of data series 440, data series 460 has a peak that roughly coincides with a trough of data series 430. This fact can indicate both that the item in data series 460 is a short-lived item and that its sales do not correlate well with the item represented by data series 430. This type of behavior is discussed in further detail below.”);
receiving time series sales data of a plurality of different second products, the time series sales data of each of the different second products being longer than the time series sales data of the first product; (Ray- Par. 33-“ FIG. 4B illustrates three different SKUs that have such situations. The same X-axis 420 and Y-axis 410 that are present in FIG. 4A also are present in FIG. 4B. Data series 440, data series 450, and data series 460 represent the sales of three different items. Data series 440 has incomplete data. Sales are present for only a very short time period, with no sales before or after that time period. This type of data series can be indicative of a short-lived item. Because the item had sales only for a very short-period of time, a popular but short-lived item might be indicative of a product that is no longer made. Data series 450 has two sales spikes, with a period of zero or otherwise low sales in between the sales spikes. Such a data series might be indicative of an item that could not keep up with demand (between the two spikes), and is no longer being made. Or such a data series might be indicative of a seasonal item (explaining the sales spikes) that is no longer being made (explaining the lack of data after the second sales spike). Data series 460 is similar to data series 440 in that it has only a single spike. However, while data series 440 is similar to data series 430 in that a peak for data series 430 roughly coincides with a peak of data series 440, data series 460 has a peak that roughly coincides with a trough of data series 430. This fact can indicate both that the item in data series 460 is a short-lived item and that its sales do not correlate well with the item represented by data series 430. This type of behavior is discussed in further detail below.”)
from the clustering model with the optimal number of clusters, determining within which cluster the time series sales data for the first product lies (Ray Par. 34-“ One method of solving the above problems is to forecast items in groups (also known as clusters). In other words, instead of forecasting what each individual SKU will sell, one would place a SKU in a group with other SKUs. Then, one forecasts what the group of SKUs would sell. Data series 430, data series 440, and data series 450 could be forecast as a group. The forecast could then be used to order the proper number of items for each of the three SKUs. While there are currently existing methods and systems for grouping SKUs, it would be desirable to have a more accurate method and system of grouping SKUs for forecasting purposes.”);
and utilizing the prototype time series of the cluster within which the time series sales data for the first product lies as the forecast for product of demand of the first product (Ray Par. 34-“ One method of solving the above problems is to forecast items in groups (also known as clusters). In other words, instead of forecasting what each individual SKU will sell, one would place a SKU in a group with other SKUs. Then, one forecasts what the group of SKUs would sell. Data series 430, data series 440, and data series 450 could be forecast as a group. The forecast could then be used to order the proper number of items for each of the three SKUs. While there are currently existing methods and systems for grouping SKUs, it would be desirable to have a more accurate method and system of grouping SKUs for forecasting purposes.”; Par. 36-37-“In some traditional notions of grouping or clustering, there can be a requirement to place similar SKUs in the same groups. Thus, two similar items would not be placed in separate groups. However, in some embodiments, it is more important that dissimilar SKUs are not placed in the same group; similar items can be placed in separate groups, and embodiments will still operate correctly. Returning to FIG. 4B, an example of dissimilar SKUs is seen in data series 430 of FIG. 4A and data series 460. As explained above, while data series 430 goes down, data series 460 goes up. This fact can be an indication that placing the item represented in data series 430 in a group with the item represented in data series 460 might not be ideal.”).
Ray teaches time series data analysis for sales forecasting and the following feature is expounded upon by Bai:
for each of the different second products, dynamically time warping the time series sales data of the first product with the respective time series sales data of the respective second product to create a dynamically time warped dataset; (Bai – Section I-“In this study we present a sales forecast case of a construction machine manufacturer in China, which produces 120 kinds of products and keeps a large stock of spare parts and finished goods. The inventory is a burden. Aim for alleviating the inventory, we plan the production and control the spare parts inventory in term of sales forecast.”; Section II-“The ARIMA model identification need to be supervised for determining the orders of model, that is, each time series forecasting has a procedure for model identification. If we classify each time series to appropriate model in advance, we can save forecast time and cost. Two approaches are used to evaluate the similarity or dissimilarity between time series in research articles and proceedings: one is model-based, the other is model-free. The model-based approach consists in projecting time series into a given functional basis space which corresponds to a polynomial, ARIMA, or a discrete Fourier transform approximation. The proximity between time series is then evaluated by the fitted basis coefficients [9]–[10][11]. The model-free approach is non-parametric and consists in evaluating the similarity between time series based on their initial temporal description. Within the scope of non-parametric approach, the mostly widely used proximity measures between time series are Euclidean distance and the dynamic time warping [12]. Furthermore, SVM and forecast density have also been used to classify the time series”);
Ray and Bai are directed to time series data analysis. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the model analysis of Ray, as taught by Bai, by utilizing additional modeling techniques with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to modify the teachings of Ray in order to save forecast time and cost (Bai Section I).


for each dynamically time warped dataset, performing a partitional clustering analysis to obtain a clustering model with an optimal number of clusters (Panda Par. 17-“ The embodiments herein provide a method and system for model fitting to hierarchical time series clusters. A plurality of time series to be analyzed are clustered as hierarchical time series clusters using a Dynamic Time Warping (DTW) as optimal distance measure to create time series hierarchical clusters. The method disclosed recognizes least dissimilarity time series in the hierarchical time series clusters, a best fit model is identified for the time series and the same model is continued up the hierarchy along the branch of the hierarchical time series clusters till the model identified satisfies Error Tolerance (ET) and Error Difference (ED) criteria. The method reduces the model fitting time, also referred as model building time by more than 50%. The same is explained with an example while describing the method flow. The time efficiency obtained in model fitting of time series is critical while processing millions of time series. Thus higher the time efficiency faster is the forecasting of the time series to get insights from the data gathered.”; Par. 22-“ FIG. 2A through FIG. 2D illustrate an example for hierarchical clustering of a plurality of time series into hierarchical time series clusters, in accordance with an embodiment of the present disclosure. Time series clustering is to partition the plurality of time series (time series data) into different groups based on similarity or distance, such that time series or TS in the same cluster are more similar. One of the key component in TS clustering is the function used to measure the similarity between two time series being compared. Practically, the time series data captured could be in various forms including raw values of equal or unequal length, vectors of feature-value pairs, transition matrices, and so on. 
Thus, to cluster the time series, the DTW distance is utilized for generating hierarchical time series clusters. FIG. 2A and FIG. 2B depicts data set for the plurality of time series.”; Par. 23-“ The time series is generated from the data set and then a DTW distance matrix is calculated using DTW technique. The DTW technique enables measuring similarity between two temporal sequences (time series data) which may vary in time or speed. For instance, similarities in walking patterns can be detected using DTW, even if one person walks faster than the other, or if there is any accelerations and deceleration during the course of an observation. DTW allows for non-linear alignments between time series not necessarily of the same length, as shown in FIG. 2C. In general, DTW is an approach that calculates an optimal match between two given (time dependent) sequences under certain restrictions.”; Par. 41-43-“ Time optimization achieved by the method proposed is explained with help of an example: Assumption: No of clusters or number of branches: n with each cluster having ‘p’ time series (TSs). The repository has ‘k’ TS models with average time to fit the model for one TS=0.8 min”);
 for each cluster within the clustering model with the optimal number of clusters define a prototype time series; (Panda Par. 4-“ Each time series within the hierarchical time series cluster has its individual characteristics that varies in accordance with the hierarchical level where the time series lies in the hierarchical time series cluster. Some existing approaches provide insights on model fitting to hierarchical time series where a global or single model is identified for the hierarchical time series that is then used forecasting of the hierarchical time series. However, with a single model to the hierarchical time series provides a generalized or common model for all the time series within the hierarchical time series. Thus, the existing method tries to generalize all the time series of the cluster and may lose individual characteristics of each time series is effectively reduces accuracy of rightly capturing each time series with its individual characteristics. However, a good balance needs to be sought between identifying best fit model for each time series and identifying common best fit models for plurality of series so as to achieve good time efficiency during model fitting along with good forecast accuracy.”; Par. 22-“ FIG. 2A through FIG. 2D illustrate an example for hierarchical clustering of a plurality of time series into hierarchical time series clusters, in accordance with an embodiment of the present disclosure. Time series clustering is to partition the plurality of time series (time series data) into different groups based on similarity or distance, such that time series or TS in the same cluster are more similar. One of the key component in TS clustering is the function used to measure the similarity between two time series being compared. Practically, the time series data captured could be in various forms including raw values of equal or unequal length, vectors of feature-value pairs, transition matrices, and so on. 
Thus, to cluster the time series, the DTW distance is utilized for generating hierarchical time series clusters. FIG. 2A and FIG. 2B depicts data set for the plurality of time series.”; Par. 41-43)

Ray, Bai and Panda are directed to time series data analysis. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the model analysis of Ray in view of Bai, as taught by Panda, by utilizing additional modeling techniques with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to make the modification to the teachings of Ray in view of Bai with the motivation of providing a good balance between identifying a best fit model for each time series and identifying common best fit models for a plurality of series so as to achieve good time efficiency during model fitting along with good forecast accuracy (Panda Par. 4).
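For illustration only, the dynamic time warping measure described in Panda Par. 23 (an optimal match between two sequences that need not have the same length) can be sketched as follows. This is a minimal textbook formulation and is not taken from Ray, Bai or Panda; the function name `dtw_distance`, the absolute-difference cost, and the sample values are assumptions made for the example.

```python
from math import inf


def dtw_distance(a, b):
    """Textbook dynamic-programming DTW distance (absolute-difference cost)."""
    n, m = len(a), len(b)
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            cost[i][j] = step + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Hypothetical data: a short sales history and two longer ones.
short_series = [1.0, 3.0, 2.0]
long_series = [1.0, 1.0, 3.0, 3.0, 2.0, 2.0]      # same shape, stretched in time
shifted_series = [5.0, 5.0, 7.0, 7.0, 6.0, 6.0]   # same shape, different level

same_shape = dtw_distance(short_series, long_series)
different_level = dtw_distance(short_series, shifted_series)
```

In this sketch the stretched series aligns with the short series at zero cost, while the series at a different level does not, which is the non-linear alignment behavior the cited paragraph describes.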
Regarding Claim 14,  
Ray teaches
A method for forecasting product demand, comprising: receiving time series sales data for a first product; (Ray - Par. 28-29; Par. 33-“FIG. 4B illustrates three different SKUs that have such situations. The same X-axis 420 and Y-axis 410 that are present in FIG. 4A also are present in FIG. 4B. Data series 440, data series 450, and data series 460 represent the sales of three different items. Data series 440 has incomplete data. Sales are present for only a very short time period, with no sales before or after that time period. This type of data series can be indicative of a short-lived item. Because the item had sales only for a very short-period of time, a popular but short-lived item might be indicative of a product that is no longer made. Data series 450 has two sales spikes, with a period of zero or otherwise low sales in between the sales spikes. Such a data series might be indicative of an item that could not keep up with demand (between the two spikes), and is no longer being made. Or such a data series might be indicative of a seasonal item (explaining the sales spikes) that is no longer being made (explaining the lack of data after the second sales spike). Data series 460 is similar to data series 440 in that it has only a single spike. However, while data series 440 is similar to data series 430 in that a peak for data series 430 roughly coincides with a peak of data series 440, data series 460 has a peak that roughly coincides with a trough of data series 430. This fact can indicate both that the item in data series 460 is a short-lived item and that its sales do not correlate well with the item represented by data series 430. This type of behavior is discussed in further detail below.”)
receiving time series sales data of a plurality of different second products, the time series sales data of each of the different second products being longer than the time series sales data of the first product; (Ray- Par. 33-“ FIG. 4B illustrates three different SKUs that have such situations. The same X-axis 420 and Y-axis 410 that are present in FIG. 4A also are present in FIG. 4B. Data series 440, data series 450, and data series 460 represent the sales of three different items. Data series 440 has incomplete data. Sales are present for only a very short time period, with no sales before or after that time period. This type of data series can be indicative of a short-lived item. Because the item had sales only for a very short-period of time, a popular but short-lived item might be indicative of a product that is no longer made. Data series 450 has two sales spikes, with a period of zero or otherwise low sales in between the sales spikes. Such a data series might be indicative of an item that could not keep up with demand (between the two spikes), and is no longer being made. Or such a data series might be indicative of a seasonal item (explaining the sales spikes) that is no longer being made (explaining the lack of data after the second sales spike). Data series 460 is similar to data series 440 in that it has only a single spike. However, while data series 440 is similar to data series 430 in that a peak for data series 430 roughly coincides with a peak of data series 440, data series 460 has a peak that roughly coincides with a trough of data series 430. This fact can indicate both that the item in data series 460 is a short-lived item and that its sales do not correlate well with the item represented by data series 430. This type of behavior is discussed in further detail below.”)
determining within which cluster of the one clustering model the time series sales data for the first product lies; (Ray Par. 34-“ One method of solving the above problems is to forecast items in groups (also known as clusters). In other words, instead of forecasting what each individual SKU will sell, one would place a SKU in a group with other SKUs. Then, one forecasts what the group of SKUs would sell. Data series 430, data series 440, and data series 450 could be forecast as a group. The forecast could then be used to order the proper number of items for each of the three SKUs. While there are currently existing methods and systems for grouping SKUs, it would be desirable to have a more accurate method and system of grouping SKUs for forecasting purposes.”);
and utilizing the prototype time series of the cluster within which the time series sales data for the first product lies as the forecast for product of demand of the first product. (Ray Par. 34-“ One method of solving the above problems is to forecast items in groups (also known as clusters). In other words, instead of forecasting what each individual SKU will sell, one would place a SKU in a group with other SKUs. Then, one forecasts what the group of SKUs would sell. Data series 430, data series 440, and data series 450 could be forecast as a group. The forecast could then be used to order the proper number of items for each of the three SKUs. While there are currently existing methods and systems for grouping SKUs, it would be desirable to have a more accurate method and system of grouping SKUs for forecasting purposes.”; Par. 36-37-“In some traditional notions of grouping or clustering, there can be a requirement to place similar SKUs in the same groups. Thus, two similar items would not be placed in separate groups. However, in some embodiments, it is more important that dissimilar SKUs are not placed in the same group; similar items can be placed in separate groups, and embodiments will still operate correctly. Returning to FIG. 4B, an example of dissimilar SKUs is seen in data series 430 of FIG. 4A and data series 460. As explained above, while data series 430 goes down, data series 460 goes up. This fact can be an indication that placing the item represented in data series 430 in a group with the item represented in data series 460 might not be ideal.”).
Ray teaches time series data analysis for sales forecasting and the following feature is expounded upon by Bai:
for each of the different second products, dynamically time warping the time series sales data of the first product with the respective time series sales data of the respective second product to create a dynamically time warped dataset; (Bai – Section I-“In this study we present a sales forecast case of a construction machine manufacturer in China, which produces 120 kinds of products and keeps a large stock of spare parts and finished goods. The inventory is a burden. Aim for alleviating the inventory, we plan the production and control the spare parts inventory in term of sales forecast.”; Section II-“ The ARIMA model identification need to be supervised for determining the orders of model, that is, each time series forecasting has a procedure for model identification. If we classify each time series to appropriate model in advance, we can save forecast time and cost. Two approaches are used to evaluate the similarity or dissimilarity between time series in research articles and proceedings: one is model-based, the other is model-free. The model-based approach consists in projecting time series into a given functional basis space which corresponds to a polynomial, ARIMA, or a discrete Fourier transform approximation. The proximity between time series is then evaluated by the fitted basis coefficients [9]–[10][11]. The model-free approach is non-parametric and consists in evaluating the similarity between time series based on their initial temporal description. Within the scope of non-parametric approach, the mostly widely used proximity measures between time series are Euclidean distance and the dynamic time warping [12]. Furthermore, SVM and forecast density have also been used to classify the time series”);

Ray in view of Bai teach time series data analysis and the following feature is expounded upon by Panda:
for each dynamically time warped dataset, performing a partitional clustering analysis to produce a plurality of clustering models (Panda Par. 17-“ The embodiments herein provide a method and system for model fitting to hierarchical time series clusters. A plurality of time series to be analyzed are clustered as hierarchical time series clusters using a Dynamic Time Warping (DTW) as optimal distance measure to create time series hierarchical clusters. The method disclosed recognizes least dissimilarity time series in the hierarchical time series clusters, a best fit model is identified for the time series and the same model is continued up the hierarchy along the branch of the hierarchical time series clusters till the model identified satisfies Error Tolerance (ET) and Error Difference (ED) criteria. The method reduces the model fitting time, also referred as model building time by more than 50%. The same is explained with an example while describing the method flow. The time efficiency obtained in model fitting of time series is critical while processing millions of time series. Thus higher the time efficiency faster is the forecasting of the time series to get insights from the data gathered.”; Par. 22-“ FIG. 2A through FIG. 2D illustrate an example for hierarchical clustering of a plurality of time series into hierarchical time series clusters, in accordance with an embodiment of the present disclosure. Time series clustering is to partition the plurality of time series (time series data) into different groups based on similarity or distance, such that time series or TS in the same cluster are more similar. One of the key component in TS clustering is the function used to measure the similarity between two time series being compared. Practically, the time series data captured could be in various forms including raw values of equal or unequal length, vectors of feature-value pairs, transition matrices, and so on. 
Thus, to cluster the time series, the DTW distance is utilized for generating hierarchical time series clusters. FIG. 2A and FIG. 2B depicts data set for the plurality of time series.”; Par. 27-“ In an embodiment, of the present disclosure, at step 402, the one or more processors 102 in conjunction with the model fitting module 108 are configured to clustering the plurality of time series into hierarchical time series clusters based on the DTW distance measure. For each branch of the hierarchical time series clusters, at step 404, the one or more processors 102 in conjunction with the model fitting module 108 are configured to identify a first time series among the plurality of time series placed at a cluster height equal to a lowest cluster height of the branch or cluster of the hierarchical time series clusters. At step 406, the one or more processors 102 in conjunction with the model fitting module 108 are configured to determine a best fit model, from a plurality of time series models, for the first time series that provides an error below an Error Tolerance (ET) threshold. Fitting model to the selected time series can be performed using available model fitting techniques defined for the plurality of time series models in the repository. The plurality of time series models can be selected and stored in a repository, for example in the memory 104.”);
for each clustering model, applying a cluster validity analysis to obtain one clustering model with the optimal number of clusters (Panda Par. 17-“ The embodiments herein provide a method and system for model fitting to hierarchical time series clusters. A plurality of time series to be analyzed are clustered as hierarchical time series clusters using a Dynamic Time Warping (DTW) as optimal distance measure to create time series hierarchical clusters. The method disclosed recognizes least dissimilarity time series in the hierarchical time series clusters, a best fit model is identified for the time series and the same model is continued up the hierarchy along the branch of the hierarchical time series clusters till the model identified satisfies Error Tolerance (ET) and Error Difference (ED) criteria. The method reduces the model fitting time, also referred as model building time by more than 50%. The same is explained with an example while describing the method flow. The time efficiency obtained in model fitting of time series is critical while processing millions of time series. Thus higher the time efficiency faster is the forecasting of the time series to get insights from the data gathered.”)
 for each cluster within the one clustering model, define a prototype time series; (Panda Par. 4-“ Each time series within the hierarchical time series cluster has its individual characteristics that varies in accordance with the hierarchical level where the time series lies in the hierarchical time series cluster. Some existing approaches provide insights on model fitting to hierarchical time series where a global or single model is identified for the hierarchical time series that is then used forecasting of the hierarchical time series. However, with a single model to the hierarchical time series provides a generalized or common model for all the time series within the hierarchical time series. Thus, the existing method tries to generalize all the time series of the cluster and may lose individual characteristics of each time series is effectively reduces accuracy of rightly capturing each time series with its individual characteristics. However, a good balance needs to be sought between identifying best fit model for each time series and identifying common best fit models for plurality of series so as to achieve good time efficiency during model fitting along with good forecast accuracy.”; Par. 22-“ FIG. 2A through FIG. 2D illustrate an example for hierarchical clustering of a plurality of time series into hierarchical time series clusters, in accordance with an embodiment of the present disclosure. Time series clustering is to partition the plurality of time series (time series data) into different groups based on similarity or distance, such that time series or TS in the same cluster are more similar. One of the key component in TS clustering is the function used to measure the similarity between two time series being compared. Practically, the time series data captured could be in various forms including raw values of equal or unequal length, vectors of feature-value pairs, transition matrices, and so on. 
Thus, to cluster the time series, the DTW distance is utilized for generating hierarchical time series clusters. FIG. 2A and FIG. 2B depicts data set for the plurality of time series.”; Par. 41-43)

Ray, Bai and Panda are directed to time series data analysis. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the model analysis of Ray in view of Bai, as taught by Panda, by utilizing additional modeling techniques with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to make the modification to the teachings of Ray in view of Bai with the motivation of providing a good balance between identifying a best fit model for each time series and identifying common best fit models for a plurality of series so as to achieve good time efficiency during model fitting along with good forecast accuracy (Panda Par. 4).
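For illustration only, the last three limitations addressed above (defining a prototype time series per cluster, determining within which cluster the first product lies, and using that prototype as the forecast) can be sketched as follows. The sketch is not taken from Ray, Bai or Panda. It assumes the clusters are already given, assumes the prototype is the cluster medoid, and uses a simple prefix distance as a stand-in for a DTW distance; all names and values are hypothetical.

```python
def prefix_l1(a, b):
    """Stand-in distance: sum of absolute differences over the shared prefix."""
    return sum(abs(x - y) for x, y in zip(a, b))


def medoid(cluster, dist=prefix_l1):
    """Return the member with the smallest total distance to all members."""
    return min(cluster, key=lambda s: sum(dist(s, t) for t in cluster))


def assign(series, prototypes, dist=prefix_l1):
    """Return the index of the prototype closest to `series`."""
    return min(range(len(prototypes)), key=lambda k: dist(series, prototypes[k]))


# Hypothetical clusters of longer sales histories (second products).
clusters = [
    [[10, 12, 14, 16], [11, 13, 15, 17], [9, 11, 13, 15]],  # rising sales
    [[20, 15, 10, 5], [19, 14, 9, 4], [22, 17, 12, 7]],     # declining sales
]
prototypes = [medoid(c) for c in clusters]

# Hypothetical short sales history (first product).
new_product = [10, 13]
k = assign(new_product, prototypes)

# The remainder of the matching prototype serves as the forecast.
forecast = prototypes[k][len(new_product):]
```

With these values the short history is assigned to the rising-sales cluster and the forecast is the tail of that cluster's prototype.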
Claims 2, 9 and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Ray et al., US Publication No. 20160260111A1, [hereinafter Ray], in view of Bai et al., Classification and Forecasting for Enterprise Data, Published in: 2018 Chinese Automation Congress (CAC) Date of Conference: 30 Nov.-2 Dec. 2018, [hereinafter Bai], in further view of Panda, US Publication No. 20190050763 A1, [hereinafter Panda], and in further view of Malyack et al., US Publication No. 20190156253 A1, [hereinafter Malyack].
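Regarding Claim 2, Claim 9 and Claim 15,
Ray in view of Bai in further view of Panda teach the method of claim 1,…, the method of claim 8,…, and the method of claim 14,…
Ray in view of Bai in further view of Panda teach time series data analysis and the following feature is expounded upon by Malyack: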
further comprising z-scoring the time series data. (Malyack Par. 37-“ The term “volume information units” refers to a set of data that has been normalized (e.g., via Z-score, min max, etc.) and/or parsed within a larger pool of volume forecast data. “; Par.108- “Additionally or alternatively, for a selected period of time to forecast, one can selectively feed features representative of time series volumes that are being labeled as similar to the selected period of time based on geographical data, weather reports, political events, traffic data, etc.”; Par. 120-“ In some examples, the training engine 702 comprises a normalization module 706 and a feature extraction module 704. The normalization module 706, in some examples, may be configured to normalize (e.g., via Z-score methods) the historical data so as to enable different data sets to be compared. Normalization is the process of changing one or more values in a data set (e.g., the volume forecast data management tool 715) to a common scale while maintaining the general distribution and ratios in the data set. In this way, although values are changed, differences between actual values in the data set are not distorted such that information is not lost. For example, values from the volume forecast data management tool 715 may range from 0 to 100,000. The extreme difference in this scale may cause problems when combining these values into the same features for modeling. In an example illustration, this range can be changed to a scale of 0-1 or represent the values as percentile ranks, as opposed to absolute values.”)
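For illustration only, the Z-score normalization referred to in Malyack Par. 37 and Par. 120 (changing values to a common scale so that different data sets can be compared) can be sketched as follows. The sketch is not taken from Malyack; the function name `z_score`, the use of the population standard deviation, and the sample values are assumptions made for the example.

```python
from statistics import mean, pstdev


def z_score(series):
    """Rescale a series to zero mean and unit (population) standard deviation."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A constant series carries no variation; map it to all zeros.
        return [0.0 for _ in series]
    return [(x - mu) / sigma for x in series]


# Hypothetical sales histories with the same shape but very different volumes.
low_volume = [1, 2, 3, 4, 5]
high_volume = [1000, 2000, 3000, 4000, 5000]

z_low = z_score(low_volume)
z_high = z_score(high_volume)
```

After normalization the two series are numerically equal up to rounding, so a distance measure would compare their shapes rather than their sales volumes.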

Claims 5-7, 11-13 and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Ray et al., US Publication No. 20160260111A1, [hereinafter Ray], in view of Bai et al., Classification and Forecasting for Enterprise Data, Published in: 2018 Chinese Automation Congress (CAC) Date of Conference: 30 Nov.-2 Dec. 2018, [hereinafter Bai], in further view of Panda, US Publication No. 20190050763 A1, [hereinafter Panda], and in further view of Chien, US Publication No. 20150302432 A1, [hereinafter Chien].
Regarding Claim 5, Claim 11 and Claim 17,
Ray in view of Bai in further view of Panda teach the method of claim 1,…, the method of claim 8,…, and the method of claim 14,…
Ray in view of Bai in further view of Panda teach time series data analysis and the following feature is expounded upon by Chien:
wherein the time series sales data of the plurality of different second products are selected from predefined multi-level product hierarchy. (Chien Par. 49-50-“FIG. 4 illustrates an example of a block diagram 400 of a process for demand classification. Classification module 211 can take time series information, hierarchical information, and configuration information as input at 402. At 404, classification module 211 may process the time series using a user-defined class-by-variable. At 406, the classification module 211 can produce outputs for each group including, but not limited to, the classification results, demand specific statistics, and the derived information based on a user's selection. Classification module 211 may merge the outputs with the original input data at 408. At 410, potentially each time series may be assigned preliminary classification results, time series statistics, and derived information related to the time series. FIG. 5 illustrates an additional example of a block diagram 500 of a process for demand classification. At 502, a classification module 211 may first take input information to conduct a preliminary demand classification at a user-defined CLASS_HIGH level at 504 and a CLASS_LOW level at 506, respectively.”; Par. 68-70)
Ray, Bai, Panda and Chien are directed to time series data analysis. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the model analysis of Ray in view of Bai in further view of Panda, as taught by Chien, by utilizing additional clustering techniques with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to make the modification to the teachings of Ray in view of Bai in further view of Panda with the motivation of improving the accuracy and efficiency of demand forecasting processes, which can improve overall sales and operational planning effectiveness (Chien Par. 3).
Regarding Claim 6, Claim 12 and Claim 18,
Ray in view of Bai in further view of Panda in further view of Chien teach the method of claim 5,…, the method of claim 11,…, and The method of claim 14,…
Ray in view of Bai in further view of Panda teach time series data analysis and the following feature is expounded upon by Chien:
wherein the time series sales data of the plurality of different second products are selected from a single level of the multi-level product hierarchy. (Chien Par. 86-87-“ Volume-grouping module 215 may generate a number of volume groups. These volume groups may be generated based on the user-specified volume threshold, which can be based on the demand averages. A user can define a level in the hierarchy as the lowest grouping level. Starting from the lowest grouping level, if a series has sufficient volume, then a forecast may be generated at the lowest level to capture any series-specific patterns. Otherwise, the series may be aggregated to one level higher via the input hierarchy with other low volume series until it reaches a level with sufficient volume, or alternatively, it reaches the top level. The process of volume-grouping can be run stand-alone, or after classification and pattern-clustering. Volume-grouping module 215 may generate forecasts at a volume-group level and disaggregate data down to lowest level. Two hierarchy-based volume-grouping types utilized by volume-grouping module 215 include dynamic grouping and dynamic grouping with hierarchy restriction.”; Par. 57)
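For illustration only, the volume-grouping behavior described in Chien Par. 86-87 (forecasting a series at the lowest level when it has sufficient volume, and otherwise moving one level up the hierarchy until sufficient volume or the top level is reached) can be sketched as follows. The sketch is not taken from Chien and simplifies the described process by measuring a node's volume as the total average demand of every series below it; the hierarchy, demand values, threshold, and function names are hypothetical.

```python
# Hypothetical three-level product hierarchy: child -> parent (None marks the top).
parent = {
    "sku_a": "shirts",
    "sku_b": "shirts",
    "sku_c": "shoes",
    "shirts": "apparel",
    "shoes": "apparel",
    "apparel": None,
}

# Hypothetical average demand per lowest-level series.
avg_demand = {"sku_a": 120.0, "sku_b": 8.0, "sku_c": 5.0}


def node_volume(node):
    """Total average demand of all series at or below `node`."""
    total = avg_demand.get(node, 0.0)
    for child, p in parent.items():
        if p == node:
            total += node_volume(child)
    return total


def grouping_level(series, threshold):
    """Walk up from `series` until the volume threshold is met or the top is reached."""
    node = series
    while node_volume(node) < threshold and parent[node] is not None:
        node = parent[node]
    return node


levels = {s: grouping_level(s, threshold=50.0) for s in avg_demand}
```

With a threshold of 50, the high-volume series stays at its own level, one low-volume series is grouped at the middle level, and the other is grouped at the top level.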

Regarding Claim 7, Claim 13 and Claim 19,
Ray in view of Bai in further view of Panda in further view of Chien teach the method of claim 5,…, the method of claim 11,…, and The method of claim 14,…
Ray in view of Bai in further view of Panda teach time series data analysis and the following feature is expounded upon by Chien:
wherein the time series sales data of the plurality of different second products are selected from a plurality of levels of the multi-level product hierarchy. (Chien Par. 68-70-“ For example, winter clothes and summer swimming suits can both be short time-span products, but these products may have different demand patterns. Forecasting these products together may lead to inaccuracies due to the differing demand patterns. Forecasting the products separately, however, can ensure that the correct seasonality is considered. In at least one example, demand series with similar patterns may be clustered together for each “long time-span seasonal” and “short time-span” time series. Various techniques can be used for clustering. For example, hierarchical clustering, K-means clustering, or a combination of the two may be used to cluster demand series with other time series having the same, or similar, demand patterns. Hierarchical clustering can automatically determine an optimal number of clusters. However, hierarchical clustering may produce performance issues especially when the number of items to cluster exceeds a certain limit. K-means methods are computationally efficient. However, K-means methods may involve having to pre-specify a number of clusters. Thus, a hybrid process may be considered that combines the two methods to make use of the advantages of each method.”; Par. 102-“ When no more time series exist in the forecast hierarchy, the flow may proceed to block 2314 where a second forecast hierarchy may be generated (e.g., by DCS engine 209 of FIG. 2). Though the flow depicts generating the second forecast hierarchy as a final step, it should be understood that the second forecast hierarchy may alternatively be incrementally generated at any point between block 2302 and block 2310. 
In at least one example, generation of the second forecast hierarchy may include associating classification data, pattern group data, or aggregation data to a node of the first forecast hierarchy. Alternatively, generation of the second forecast hierarchy may include modifying metadata related to each node, or time series, included in the first forecast hierarchy. Further, a second forecast hierarchy, separate from the first forecast hierarchy may be generated, the second forecast hierarchy having a different arrangement of nodes based on at least one of the classification data, the pattern group data, or the aggregation data associated with each time series included in the second forecast hierarchy.”; Par. 57)
Ray, Bai, Panda and Chien are directed to time series data analysis. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the model analysis of Ray in view of Bai in further view of Panda, as taught by Chien, by utilizing additional clustering techniques with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to make the modification to the teachings of Ray in view of Bai in further view of Panda with the motivation of improving the accuracy and efficiency of demand forecasting processes, which can improve overall sales and operational planning effectiveness (Chien Par. 3).


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: US Publication No. 20200104771A1 to Popescu et al.- Abstract-“ Embodiments select demand forecast parameters for a demand model for a first item. Embodiments receive historical sales data for a plurality of items on a per store basis and receive a plurality of seasonality curves for the first item of the plurality of items, each seasonality curve corresponding to a different pooling level for the first item. Embodiments determine a correlation for each of the seasonality curves at each pooling level and determine a root mean squared error (“RMSE”) for each determined correlation. Embodiments determine a score for each pooling level, the score based on the corresponding correlation, RMSE and a penalty and select one of the seasonality curves based on the determined scores. Embodiments use the demand model and the selected seasonality curve to determine a demand forecast for the first item, the demand forecast including a prediction of future sales data for the first item.”
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Chesiree Walton, whose telephone number is (571) 272-5219.  The examiner can normally be reached from Monday to Friday between 8 AM and 5 PM.  If any attempt to reach the examiner by telephone is unsuccessful, the examiner’s supervisor, Patricia Munson, can be reached at (571) 270-5396.  The fax telephone numbers for this group are either (571) 273-8300 or (703) 872-9326 (for official communications including After Final communications labeled “Box AF”).
	Another resource that is available to applicants is the Patent Application Information Retrieval (PAIR) system. Information regarding the status of an application can be obtained from the PAIR system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, please feel free to contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
	Applicants are invited to contact the Office to schedule an in-person interview to discuss and resolve the issues set forth in this Office Action.  Although an interview is not required, the Office believes that an interview can be of use to resolve any issues related to a patent application in an efficient and prompt manner.

/CHESIREE A WALTON/ Examiner, Art Unit 3624