Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Notice to Applicant
Claims 1-20 have been examined in this application. This communication is the first action on the merits. The Information Disclosure Statement (IDS) filed on 7/30/2021 is acknowledged.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

Claims 1-20 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more. Claims 1-10 are directed to a method for managing product data, Claims 11-15 are directed to an apparatus for managing product data, and Claims 16-20 are directed to an article of manufacture for managing product data.
Claim 1 recites a method for managing product data, Claim 11 recites an apparatus for managing product data and Claim 16 recites an article of manufacture for managing product data, which include generating a first time segment associated with a first product sale feature, the first time segment including product sales data related to sales of the product at a point of sale per a first time unit over a first time period in the past starting on a current time and of a first predetermined duration expressed in the first time unit, and one or more second time segments respectively associated with one or more second product sale features, the second time segments each comprising context data associated with product sale history or the point of sale over the first time period in the past; combining the first time segment with the one or more second time segments to generate an information vector of product sale features; and generating a prediction of sales for the product at the point of sale based on the output of a prediction model to which the information vector of product sale features is input. As drafted, these limitations fall, under their broadest reasonable interpretation, within the abstract idea groupings of “Methods of Organizing Human Activity” (marketing or sales activities) and “Mental Processes” (evaluation). The recitation of a “processor” and “memory” does not take the claims out of the certain methods of organizing human activity or mental processes groupings. Accordingly, the claims recite an abstract idea.
This judicial exception is not integrated into a practical application. The claims primarily recite the additional element of using computer components to perform each step. The “processor” and “memory” are recited at a high level of generality, such that they amount to no more than mere instructions to apply the exception using a computer component. See MPEP 2106.05(f). Accordingly, the additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea. The claims also fail to recite any improvements to another technology or technical field, improvements to the functioning of the computer itself, use of a particular machine, effecting a transformation or reduction of a particular article to a different state or thing, and/or an additional element that applies or uses the judicial exception in some other meaningful way beyond generally linking the use of the judicial exception to a particular technological environment, such that the claim as a whole is more than a drafting effort designed to monopolize the exception. See 84 Fed. Reg. 55. In particular, the claims lack any improvement to a computer or to the technical field of forecasting.
The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements, when considered both individually and as an ordered combination, do not amount to significantly more than the abstract idea. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of “processor” and “memory” are insufficient to amount to significantly more. (See MPEP 2106.05(f) – Mere Instructions to Apply an Exception – “Thus, for example, claims that amount to nothing more than an instruction to apply the abstract idea using a generic computer do not render an abstract idea eligible.” Alice Corp., 134 S. Ct. at 2358). Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept.
The claims fail to recite any improvements to another technology or technical field, improvements to the functioning of the computer itself, use of a particular machine, effecting a transformation or reduction of a particular article to a different state or thing, adding unconventional steps that confine the claim to a particular useful application, and/or meaningful limitations beyond generally linking the use of an abstract idea to a particular environment. See 84 Fed. Reg. 55. Viewed individually or as a whole, these additional claim elements do not provide meaningful limitations to transform the abstract idea into a patent eligible application of the abstract idea such that the claims amount to significantly more than the abstract idea itself. With regard to receiving data under Step 2B, see MPEP 2106.05(d): receiving or transmitting data over a network, e.g., using the Internet to gather data, Symantec, 838 F.3d at 1321, 120 USPQ2d at 1362 (utilizing an intermediary computer to forward information).
Examiner concludes that the additional elements in combination fail to amount to significantly more than the abstract idea based on findings that each element merely performs the same function(s) in combination as each element performs separately. The claim is not patent eligible. Thus, taken alone, the additional elements do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually.
Dependent Claims 2-10, 12-15 and 17-20 recite the additional limitations of performing a plurality of prediction model training phases using a plurality of respective candidate prediction models; and selecting a candidate prediction model as the prediction model based on results of the prediction model training phases; generating a first time series associated with the first product sale feature, the first time series including product sales data related to sales of the product at the point of sale per the first time unit over a past historic duration expressed in the first time unit; generating one or more second time series respectively associated with the one or more second product sale features, the second time series each including context data associated with product sale history or the point of sale over the past historic duration expressed in the first time unit; for each of the first time series and the one or more second time series, generating a sequence of successive time segments, based on respective data of the first time series and the one or more second time series, wherein each time segment corresponds to a time slice of a first predetermined duration expressed in the first time unit in a sequence of successive time slices, wherein an initial time slice of the sequence of time slices starts at an initial time of the past historic duration, and a last time slice of the sequence of time slices ends at a final time of the past historic duration, and each time slice of the sequence of time slices but the last time slice is offset to the next time slice in the sequence by one first time unit; running training iterations of a vector generation loop, based on a current time slice which is incremented by an increment, wherein the vector generation loop is initiated with the current time slice set to the initial time slice of the past historic duration and stopped when the current time slice is set to a predetermined training phase time slice of the past
historic duration; combining a first current time segment corresponding to the first time series with one or more second current time segments to generate a current information input vector of product sale features, wherein the first current time segment and the one or more current time segments correspond to the current time slice; and combining a first offset current time segment corresponding to the first time series with one or more second offset current time segments to generate a current information output vector of product sale features, wherein the first current time segment and the one or more current time segments correspond to the current time slice, and wherein the first offset current time segment and the one or more second offset current time segments correspond to a first offset current slice which is offset from the current time slice by a predetermined training offset; and training the prediction model based on the current information input vector of product sale features, and on the current information output vector of product sale features;  running test iterations of the vector generation loop, based on a test current time slice which is incremented by a test phase increment after each iteration, wherein the vector generation loop is initiated with the test current time slice set to a test initial time slice which is offset from the predetermined training phase time slice by one first time unit and stopped when the test current time slice is set to the last time slice of the sequence of time slices, the method comprising, for each test iteration: combining a third current time segment corresponding to the first time series with one or more fourth current time segments to generate a current information test input vector of product sale features, wherein the third current time segment and the one or more fourth current time segments correspond to the current time slice; combining a third offset current time segment corresponding to the first time 
series with one or more fourth offset current time segments to generate a current information test output vector of product sale features, wherein the third offset current time segment and the one or more fourth offset current time segments correspond to a second offset current slice which is offset from the test current time slice by a predetermined test offset; generating a test output of the prediction model by running the prediction model with the current information test input vector of product sale features as input; and calculating a test prediction error based on a comparison of the test output of the prediction model with the current information test output vector of product sale features; and updating the prediction model based on the test prediction error; a first data series of aggregate sales of the product at the point of sale per the first time unit over a first time period in the past starting on a current time and of the first predetermined duration; a second data series of aggregate sales of the product per a second time unit over a second time period in the past starting on the current time and of a second predetermined duration expressed in the second time unit; a first derivative data series of first derivatives of the first data series; a second derivative data series of second derivatives of the first data series; a third derivative data series of first derivatives of the second data series; a fourth derivative data series of second derivatives of the second data series; and a third data series of promotions for sale of the product at the point of sale per the first time unit over the first time period in the past starting on the current time; wherein the information vector of product sale features further comprises: a fourth data series of forecast promotions for sale of the product at the point of sale per the first time unit over the first time period in the future from the current time/ a 5th-8th data series of weather/calendar/etc.  
data at the point of sale per the first time unit over the first time period in the past starting on the current time; these limitations further narrow the abstract idea. The limitations recited in the dependent claims are mere instructions to apply the abstract idea on a computerized system, and therefore do not amount to significantly more than the above-identified judicial exception in Claims 1, 11 and 16.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:

A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Lei et al., US Publication No. 20200111109 A1 [hereinafter Lei], in view of Fan et al., US Publication No. 20200074274 A1 [hereinafter Fan]. 
Regarding Claim 1, 
Lei teaches
A computer-implemented method of managing product data, comprising: generating a first time segment associated with a first product sale feature, the first time segment including product sales data related to sales of the product at a point of sale per a first time unit over a first time period in the past starting on a current time and of a first predetermined duration expressed in the first time unit, and one or more second time segments respectively associated with one or more second product sale features, the second time segments each comprising context data associated with product sale history or the point of sale over the first time period in the past (Lei Par. 36-“ Embodiments are disclosed from the perspective that, for an item (i.e., a class of items such as yogurt or men's shirts) sold at a location (e.g., a retail location), the item may be promoted in various ways at various times (i.e., pre-defined retail periods, such as a day, week, month, year, etc.). A retail calendar has many retail periods (e.g., weeks) that are organized in a particular manner (e.g., four (4) thirteen (13) week quarters) over a typical calendar year. A retail period may occur in the past or in the future. Historical sales/performance data may include, for example, a number of units of an item sold in each of a plurality of past retail periods as well as associated promotion data (i.e., for each retail period, which promotions were in effect for that period) and any other relevant demand features/variables.”; Par. 28-30; Fig. 1); 
and generating a prediction of sales for the product at the point of sale based on the output of a prediction model to which the information vector of product sale features is input. (Lei Par. 40-“In machine learning, SVMs are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis. Given a set of training examples, each marked as belonging to one or the other of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall.”; Par. 50; Par. 63)

Lei teaches predictive modeling and the feature is expounded upon by Fan:

combining the first time segment with the one or more second time segments to generate an information vector of product sale features  (Fan Par. 5;Par. 7-17;  Par. 18-19-“ In certain embodiments, the plurality of temporal scales comprises 2-10 scales. In certain embodiments, the plurality of temporal scales comprises four scales of scale-1, scale-3, scale-5, and scale-7; for each target step from the future time steps: the scale-1 uses hidden state of the target step; the scale-3 uses hidden states of the target step, one of the future time steps immediately before the target step, and one of the time steps immediately after the target step; the scale-5 uses hidden states of the target step, two of the future time steps immediately before the target step, and two of the time step immediately after the target step; and the scale-7 uses hidden states of the target step, three of the future time steps immediately before the target step, and three of the time step immediately after the target step.”);
Lei and Fan are directed to product sales predictive modelling. It would have been obvious for one of ordinary skill in the art before the effective filing date of the claimed invention to have improved upon the input data analysis of Lei, as taught by Fan, with a reasonable expectation of success of arriving at the claimed invention. One of ordinary skill in the art would have been motivated to modify the teachings of Lei in order to improve forecast accuracy (Fan Par. 80).
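For illustration only, the three steps recited in Claim 1 (generating time segments, combining them into an information vector, and inputting that vector to a prediction model) can be sketched as follows. This is a minimal sketch and is not drawn from the application, Lei, or Fan: the function names, the sample data, and the placeholder averaging model are all hypothetical, and concatenation is only one assumed reading of "combining".

```python
from typing import Callable, Dict, List, Sequence


def last_segment(series: Sequence[float], duration: int) -> List[float]:
    """Return the most recent `duration` values of a per-time-unit series."""
    if duration <= 0 or duration > len(series):
        raise ValueError("duration must be between 1 and len(series)")
    return list(series[-duration:])


def information_vector(sales: Sequence[float],
                       context: Dict[str, Sequence[float]],
                       duration: int) -> List[float]:
    """Concatenate the sales segment with one segment per context feature.

    Concatenation is an assumption; the claim only says "combining".
    """
    vector = last_segment(sales, duration)
    for name in sorted(context):  # fixed feature order keeps vectors comparable
        vector.extend(last_segment(context[name], duration))
    return vector


def predict_sales(model: Callable[[List[float]], float],
                  vector: List[float]) -> float:
    """Generate a prediction from the output of a model fed the vector."""
    return model(vector)


# Hypothetical daily data for one product at one point of sale.
daily_sales = [12, 15, 11, 14, 18, 20, 17]
daily_context = {"promotion": [0, 0, 1, 1, 0, 0, 1]}

vec = information_vector(daily_sales, daily_context, duration=3)


def naive_model(v: List[float]) -> float:
    """Placeholder model: average of the three sales values in the vector."""
    return sum(v[:3]) / 3


forecast = predict_sales(naive_model, vec)
print(vec, forecast)
```

Here the first three entries of `vec` are the sales segment and the last three are the promotion segment over the same three days.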
Regarding Claims 2, 12 and 17, Lei in view of Fan teaches The method of claim 1, further comprising... , The apparatus of claim 11, wherein the method further comprises... and The non-transitory computer-readable medium of claim 16, wherein the method further comprises...
a prediction model qualification phase, the prediction model qualification phase comprising: performing a plurality of prediction model training phases using a plurality of respective candidate prediction models; (Lei Par. 10- “Embodiments forecasting future demand for an item. Embodiments receive historical sales data for the item that includes a plurality of data points and define a set of features for the item. Embodiments receive a regression based demand algorithm for the item that includes the set of features as regression variables and split the data points into a training set and a testing set. Embodiments assign each of the features of the set of features into one of a plurality of regularization categories and assign a penalty parameter to each of the features subject to regularization. Embodiments train the demand algorithm using the training set, the penalty parameters and the features to generate a trained demand model. Embodiments evaluate the trained demand model using the testing set to determine an early drop metric and repeat the assigning each of the features, the assigning the penalty parameter, the training the demand algorithm and the evaluating the trained demand model until the early drop metric meets a threshold. Embodiments use the trained demand model to determine a demand forecast for the item, the demand forecast including a prediction of future sales data for the item. Embodiments then electronically send the demand forecast to an inventory management system which is configured to generate shipments of additional quantities of the item to a plurality of retail stores based on the demand forecast.”)
and selecting a candidate prediction model as the prediction model based on results of the prediction model training phases (Lei Par. 17-“ In order to improved demand forecasting, retailers have begun to move to modern machine learning technologies, such as support vector machine (“SVM”), artificial neural network (“ANN”), random forest, and so on. However, typically a retailer will just pick one model for each product/location. As used herein, a retailer can include a single retail store, or can include a large amount of retail stores all integrated and managed by single or multiple logistic operations.”)
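The qualification phase recited in Claim 2 (training several candidate models, then selecting one based on the results) can be sketched as below. This is a hedged illustration under stated assumptions, not the method of the application or of Lei: the two candidate models are trivial placeholders, the names `fit_last_value`, `fit_window_mean` and `select_model` are hypothetical, and mean absolute percentage error is assumed as the selection criterion.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]
Trainer = Callable[[List[Sequence[float]], List[float]], Model]


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, skipping zero actual values."""
    pairs = [(a, p) for a, p in zip(actual, predicted) if a != 0]
    if not pairs:
        raise ValueError("no non-zero actual values to score against")
    return 100.0 * sum(abs((a - p) / a) for a, p in pairs) / len(pairs)


def fit_last_value(inputs, targets) -> Model:
    """Placeholder candidate: predict the last value of the input window."""
    return lambda x: x[-1]


def fit_window_mean(inputs, targets) -> Model:
    """Placeholder candidate: predict the mean of the input window."""
    return lambda x: sum(x) / len(x)


def select_model(trainers: Dict[str, Trainer],
                 train: Tuple[List[Sequence[float]], List[float]],
                 held_out: Tuple[List[Sequence[float]], List[float]]
                 ) -> Tuple[str, float]:
    """Train every candidate, score each on held-out data, keep the best."""
    scores = {}
    for name, trainer in trainers.items():
        model = trainer(*train)
        inputs, targets = held_out
        scores[name] = mape(targets, [model(x) for x in inputs])
    best = min(scores, key=scores.get)
    return best, scores[best]


# Hypothetical windows of past sales and the sales value that followed each.
train_data = ([[8, 10, 12]], [14.0])
held_out_data = ([[10, 12, 14], [12, 14, 16]], [16.0, 18.0])

best_name, best_score = select_model(
    {"last_value": fit_last_value, "window_mean": fit_window_mean},
    train_data,
    held_out_data,
)
print(best_name, round(best_score, 2))
```

On this rising series the last-value candidate scores the lower error and is selected.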
Regarding Claims 3, 13 and 18, Lei in view of Fan teaches The method of claim 1, further comprising... , The apparatus of claim 11, wherein the method further comprises... and The non-transitory computer-readable medium of claim 16, wherein the method further comprises...

Lei teaches predictive modelling and the feature is expounded upon by Fan:
a prediction model training phase, the prediction model training phase comprising: generating a first time series associated with the first product sale feature, the first time series including product sales data related to sales of the product at the point of sale per the first time unit over a past historic duration expressed in the first time unit (Fan Par. 7-17-“ In certain aspects, the present disclosure relates to a method for time series forecasting of a product. In certain embodiments, the method includes: providing input feature vectors of the product corresponding to a plurality of future time steps; performing bi-directional long-short term memory network (BiLSTM or Bi-LSTM) on the input feature vectors to obtain hidden states corresponding to the plurality of future time steps; for each future time step: performing temporal convolution on the hidden state using a plurality of temporal scales to obtain context features at the plurality of temporal scales, and summating the context features at the plurality of temporal scales using a plurality of weights to obtain multi-scale context features; and converting the multi-scale context features to obtain the time series forecasting corresponding to the future time steps. In certain embodiments, the step of providing input feature vectors includes: providing time series input variables corresponding to the plurality of future time steps of the product; embedding the time series input variables to feature vectors; and for each of the future time step of the product: concatenating the feature vectors of the time step to obtain a long vector; and forming one of the input feature vectors from the long vector using a fully-connected layer.”; Par. 5; Par. 19-21; Par. 81-84); 
generating one or more second time series respectively associated with the one or more second product sale features, the second time series each including context data associated with product sale history or the point of sale over the past historic duration expressed in the first time unit (Fan Par. 81-“ Referring back to FIG. 1, the time series forecasting application 118 further includes a user interface 180. The user interface 180 is configured to provide a use interface or graphic user interface in the computing device 110. In certain embodiments, the user is able to configure parameters for the training of the time series forecasting application 118, that is, parameters of the embedding module 120, the encoder 140, and the decoder 160. The user interface 180 may instruct using historical data, such as the last three years or five years daily input/product feature and output/product sales data to train the time series forecasting application 118, so as to obtain optimal parameters of the application 118. The user interface 180 may instruct using historical data, such as last month's daily data (both input/product feature and output/product sales) to the encoder 140 to obtain the most recent hidden state (last day of the last month), and using current data such as the coming month's daily data (only input/product feature) to the decoder 160 to obtain output/product forecast sales of the coming month.”; Par. 84); 
for each of the first time series and the one or more second time series, generating a sequence of successive time segments, based on respective data of the first time series and the one or more second time series, wherein each time segment corresponds to a time slice of a first predetermined duration expressed in the first time unit in a sequence of successive time slices, wherein an initial time slice of the sequence of time slices starts at an initial time of the past historic duration, and a last time slice of the sequence of time slices ends at a final time of the past historic duration, and each time slice of the sequence of time slices but the last time slice is offset to the next time slice in the sequence by one first time unit (Fan Par. 84-“ FIG. 3 schematically depicts training of a time series forecasting application according to certain embodiments of the present disclosure. In certain embodiments, the training of the application is performed by a computing device, such as the computing device 110 shown in FIG. 1. It should be particularly noted that, unless otherwise stated in the present disclosure, the steps of the training process or method may be arranged in a different sequential order, and are thus not limited to the sequential order as shown in FIG. 3. In this training process, a three year sales data of products are provided, and training for a product or namely a target product is illustrated as follows. The training for all the other products are the same or similar. In certain embodiments, we choose two months data from the three year data for training as follows, and the training using the two month's data is sufficient. In certain embodiments, data from the other months of the three year can be used similarly, so as to further train the application/model. Here the two months data are named the first month and the second month, which are sequential months and have 31 days and 30 days respectively.”); 
running training iterations of a vector generation loop, based on a current time slice which is incremented by an increment, wherein the vector generation loop is initiated with the current time slice set to the initial time slice of the past historic duration and stopped when the current time slice is set to a predetermined training phase time slice of the past historic duration, the method comprising, for each training iteration (Fan Par. 65 & Related text –“ The embedding layer 122 is configured to, upon retrieving or receiving the categorical variables, embed the categorical variables into numerical feature vectors. In certain embodiments, the embedding layer 122 is a neural network. The embedding layer 122 may retrieve or receive the categorical variables directly from the database 190 or via the user interface 180. In certain embodiments, the categorical variables are time series training data or historical data, or future data. The data may be categorical data of products on an e-commerce platform.”; Par. 141-“ TRAINING AND EVALUATION: the details of model training and evaluation are described as follows. There is a total of 6000 time series in the GOC2018 online-sales dataset. Each time series is split into the training and testing part. The training part starts from the beginnings of time series (as early as January 2016) to December 2017, and the testing part covers the 31 days of January 2018. In addition, randomly sampled sets of consecutive days in total of one fifth of the series length per time series are held out as validation series. During training time, we randomly sample a batch of 32 different time series for each iteration. For each sampled time series, we randomly pick a training creation date, then take Th steps past of creation date as the history and Tf steps after as the future, to form the final training sequence.”): 
combining a first current time segment corresponding to the first time series with one or more second current time segments to generate a current information input vector of product sale features, wherein the first current time segment and the one or more current time segments correspond to the current time slice (Fan Par. 18-19-“ In certain embodiments, the plurality of temporal scales comprises 2-10 scales. In certain embodiments, the plurality of temporal scales comprises four scales of scale-1, scale-3, scale-5, and scale-7; for each target step from the future time steps: the scale-1 uses hidden state of the target step; the scale-3 uses hidden states of the target step, one of the future time steps immediately before the target step, and one of the time steps immediately after the target step; the scale-5 uses hidden states of the target step, two of the future time steps immediately before the target step, and two of the time step immediately after the target step; and the scale-7 uses hidden states of the target step, three of the future time steps immediately before the target step, and three of the time step immediately after the target step.”); 
and combining a first offset current time segment corresponding to the first time series with one or more second offset current time segments to generate a current information output vector of product sale features, wherein the first current time segment and the one or more current time segments correspond to the current time slice, and wherein the first offset current time segment and the one or more second offset current time segments correspond to a first offset current slice which is offset from the current time slice by a predetermined training offset (Fan Par. 121-123-“ In certain embodiments, the optimal filter size is task dependent and not known a priori. In certain embodiments, the problem is solved by using a set of temporal convolutional filters g of different sizes (e.g., 1, 3, 5, 7, 11) to generate context features of different temporal scales, and then use the context selection layer to combine them to one multiscale compact feature vector... Similarly, convolutional filter g.sub.5 of size 1×5 is applied on the neighboring five hidden states. In certain embodiments, a g.sub.1 filter is added which considers no context beyond the current time step, for modeling abrupt changes such as peaks or troughs. On top of all hidden context features, a dynamic context selection layer is added, which learns the importance of each temporal scale and combines all of the context features together by weighted sum. This forms the final hidden context feature for prediction.”); 
and training the prediction model based on the current information input vector of product sale features, and on the current information output vector of product sale features. (Fan Par. 68-69-“The fully-connected layer 126 is configured to, upon receiving the concatenated long vector, learn interactions among different inputs in hidden space to form a compact feature vector or namely input feature vector for each product at one time step, and send the input feature vector to the encoder 140 or the decoder 160. In certain embodiments, the fully-connected layer 126 is a linear transformation in a form of Wx+b. W is a m-by-n matrix, in which m is the length of the long vector from the concatenating layer 124 (input of 126), and n is the hidden size of LSTM (output of 126). The encoder 140 is configured to, upon receiving the input feature vectors (compact feature vectors) for each product in historical time steps from the embedding module 120, perform LSTM learning to obtain hidden states for each time step. As shown in FIG. 2B, the encoder 140 includes an LSTM module 142 and a forecast transformer 144.”; Par. 77-79-“The forecast transformer 168 is configured to, upon receiving the multi-scale context feature vectors from the context selection module 166, convert the multi-scale context feature vectors at different time steps to corresponding forecast outputs. The converting may be performed by a linear transformation, and the parameters of the linear transformation can be learned during training of the time series forecasting application 118.... In certain embodiments, a loss function is defined in the time series forecasting application 118 for training the application. The loss function considers the difference between the forecasted outputs (such as forecasted sales) and the actual outputs (actual sales).”)
Lei and Fan are both directed to product sales predictive modelling. It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have improved upon the input data analysis of Lei, as taught by Fan, with a reasonable expectation of success in arriving at the claimed invention. One of ordinary skill in the art would have been motivated to modify the teachings of Lei in order to improve forecast accuracy (Fan Par. 80).
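For illustration only, the pairing of an input vector at the current time slice with an output vector at an offset time slice, as recited in claim 3, can be sketched as a sliding-window construction. The function name `make_training_pairs`, the parameters `window` and `offset`, and the toy series in the usage note are assumptions made for this sketch; they are not taken from the application, Lei, or Fan.

```python
from typing import List, Sequence, Tuple


def make_training_pairs(
    sales: Sequence[float],
    context: Sequence[float],
    window: int,
    offset: int,
) -> List[Tuple[List[float], List[float]]]:
    """Build (input vector, output vector) pairs from two aligned time series.

    Hypothetical sketch: for each current time slice t, the input vector
    concatenates the sales and context segments ending at t, and the output
    vector concatenates the same two segments ending at t + offset.
    """
    if len(sales) != len(context):
        raise ValueError("series must be aligned")
    pairs = []
    # t is the index of the last time unit in the current segment.
    for t in range(window - 1, len(sales) - offset):
        lo = t - window + 1
        x = list(sales[lo:t + 1]) + list(context[lo:t + 1])
        y = (list(sales[lo + offset:t + 1 + offset])
             + list(context[lo + offset:t + 1 + offset]))
        pairs.append((x, y))
    return pairs
```

With `sales = [1, 2, 3, 4, 5]`, `context = [0, 1, 0, 1, 0]`, `window=2` and `offset=1`, the loop runs for three time slices, and each output vector is the input window shifted forward by one time unit.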

Regarding Claim 4 and Claim 14 and Claim 19, Lei in view of Fan teach The method of claim 3, further comprising:... , The apparatus of claim 13, wherein the method further comprises:... and The non-transitory computer-readable medium of claim 18, wherein the method further comprises:...
Lei teaches 
and calculating a test prediction error based on a comparison of the test output of the prediction model with the current information test output vector of product sale features (Lei Par. 58-“Embodiments receive an early drop metric, such as a MAPE (Mean absolute percentage error) metric and a maximum iteration value (e.g., any positive number). In one embodiment, a 10% MAPE is used, and the maximum iteration value is 1000. At 208, it is determined if the early drop metric has been reached, and/or if the maximum iteration has been reached. When 208 is executed the first time, none of these metrics will have been reached and the functionality continues to 210.”); 
and updating the prediction model based on the test prediction error. (Lei Par.59-61-“ At 210, each of the features of the set of features are assigned into one of multiple regularization categories. The assignment can be random in one embodiment. In another embodiment, business logic can also be used, which can be implemented by machine learning/artificial intelligence, or a user can assign manually. For example, if it is known that the brand is a very important feature for a shirt, machine learning may choose to assign color to the “None” category so that this important feature will not be penalized at all....At 212, for each regularization category at 210, penalty parameters (i.e., weights) are set for each feature which apply. As disclosed above, in embodiments, penalty parameters for L1 are indicated as and penalty parameters for L2 are indicated as μ. The penalties can be inputs determined by the user, or can be automatically generated using machine learning/artificial intelligence. For instance, an important feature may get a lesser penalty than a less important feature.”)
Lei teaches prediction modelling and the feature is expounded upon by Fan:
running test iterations of the vector generation loop, based on a test current time slice which is incremented by a test phase increment after each iteration, wherein the vector generation loop is initiated with the test current time slice set to a test initial time slice which is offset from the predetermined training phase time slice by one first time unit and stopped when the test current time slice is set to the last time slice of the sequence of time slices, the method comprising, for each test iteration: (Fan Par. 65 & Related text –“ The embedding layer 122 is configured to, upon retrieving or receiving the categorical variables, embed the categorical variables into numerical feature vectors. In certain embodiments, the embedding layer 122 is a neural network. The embedding layer 122 may retrieve or receive the categorical variables directly from the database 190 or via the user interface 180. In certain embodiments, the categorical variables are time series training data or historical data, or future data. The data may be categorical data of products on an e-commerce platform.”; Par. 141-“ TRAINING AND EVALUATION: the details of model training and evaluation are described as follows. There is a total of 6000 time series in the GOC2018 online-sales dataset. Each time series is split into the training and testing part. The training part starts from the beginnings of time series (as early as January 2016) to December 2017, and the testing part covers the 31 days of January 2018. In addition, randomly sampled sets of consecutive days in total of one fifth of the series length per time series are held out as validation series. During training time, we randomly sample a batch of 32 different time series for each iteration. For each sampled time series, we randomly pick a training creation date, then take Th steps past of creation date as the history and Tf steps after as the future, to form the final training sequence.”): 
combining a third current time segment corresponding to the first time series with one or more fourth current time segments to generate a current information test input vector of product sale features, wherein the third current time segment and the one or more fourth current time segments correspond to the current time slice (Fan Par. 18-19-“ In certain embodiments, the plurality of temporal scales comprises 2-10 scales. In certain embodiments, the plurality of temporal scales comprises four scales of scale-1, scale-3, scale-5, and scale-7; for each target step from the future time steps: the scale-1 uses hidden state of the target step; the scale-3 uses hidden states of the target step, one of the future time steps immediately before the target step, and one of the time steps immediately after the target step; the scale-5 uses hidden states of the target step, two of the future time steps immediately before the target step, and two of the time step immediately after the target step; and the scale-7 uses hidden states of the target step, three of the future time steps immediately before the target step, and three of the time step immediately after the target step.”); 
combining a third offset current time segment corresponding to the first time series with one or more fourth offset current time segments to generate a current information test output vector of product sale features, wherein the third offset current time segment and the one or more fourth offset current time segments correspond to a second offset current slice which is offset from the test current time slice by a predetermined test offset (Fan Par. 121-123-“ In certain embodiments, the optimal filter size is task dependent and not known a priori. In certain embodiments, the problem is solved by using a set of temporal convolutional filters g of different sizes (e.g., 1, 3, 5, 7, 11) to generate context features of different temporal scales, and then use the context selection layer to combine them to one multiscale compact feature vector... Similarly, convolutional filter g.sub.5 of size 1×5 is applied on the neighboring five hidden states. In certain embodiments, a g.sub.1 filter is added which considers no context beyond the current time step, for modeling abrupt changes such as peaks or troughs. On top of all hidden context features, a dynamic context selection layer is added, which learns the importance of each temporal scale and combines all of the context features together by weighted sum. This forms the final hidden context feature for prediction.”); 
generating a test output of the prediction model by running the prediction model with the current information test input vector of product sale features as input (Fan Par. 141-“TRAINING AND EVALUATION: the details of model training and evaluation are described as follows. There is a total of 6000 time series in the GOC2018 online-sales dataset. Each time series is split into the training and testing part. The training part starts from the beginnings of time series (as early as January 2016) to December 2017, and the testing part covers the 31 days of January 2018. In addition, randomly sampled sets of consecutive days in total of one fifth of the series length per time series are held out as validation series. During training time, we randomly sample a batch of 32 different time series for each iteration. For each sampled time series, we randomly pick a training creation date, then take Th steps past of creation date as the history and Tf steps after as the future, to form the final training sequence. Validation and testing sequences have the same length as training sequences. We choose Th=Tf=31 for this month-long forecasting task. In real implementation, validation and testing sequences are held out in the data pre-processing step to guarantee no overlapping with training sequences.”); 
Lei and Fan are both directed to product sales predictive modelling. It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have improved upon the input data analysis of Lei, as taught by Fan, with a reasonable expectation of success in arriving at the claimed invention. One of ordinary skill in the art would have been motivated to modify the teachings of Lei in order to improve forecast accuracy (Fan Par. 80).
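As a hedged illustration of the test prediction error discussed for claim 4, the sketch below computes a mean absolute percentage error (MAPE) and checks the two stopping conditions quoted from Lei Par. 58 (a 10% MAPE target and a maximum of 1000 iterations). The function names `mape` and `should_stop` and the choice to skip zero-valued actuals are assumptions of this sketch and are not drawn from Lei or Fan.

```python
from typing import Sequence


def mape(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute percentage error, in percent.

    Assumption: time units whose actual value is zero are skipped,
    because the percentage error is undefined there.
    """
    if len(actual) != len(predicted):
        raise ValueError("sequences must have equal length")
    terms = [abs((a - p) / a) for a, p in zip(actual, predicted) if a != 0]
    if not terms:
        raise ValueError("no non-zero actual values")
    return 100.0 * sum(terms) / len(terms)


def should_stop(error_pct: float, iteration: int,
                target_pct: float = 10.0, max_iterations: int = 1000) -> bool:
    """Stop when the error target or the iteration cap has been reached."""
    return error_pct <= target_pct or iteration >= max_iterations
```

For example, actual sales of 100 and 200 against predictions of 90 and 220 give a 10% error in each time unit, so the MAPE is 10% and the default target is met.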
Regarding Claim 5 and Claim 15 and Claim 20, Lei in view of Fan teach The method of claim 1, wherein the information vector of product sale features comprises:... , The apparatus of claim 11, wherein the information vector of product sale features comprises:... and The non-transitory computer-readable medium of claim 16, wherein the information vector of product sale features comprises:...
a first data series of aggregate sales of the product at the point of sale per the first time unit over a first time period in the past starting on a current time and of the first predetermined duration; a second data series of aggregate sales of the product per a second time unit over a second time period in the past starting on the current time and of a second predetermined duration expressed in the second time unit; a first derivative data series of first derivatives of the first data series; a second derivative data series of second derivatives of the first data series; a third derivative data series of first derivatives of the second data series; a fourth derivative data series of second derivatives of the second data series; and a third data series of promotions for sale of the product at the point of sale per the first time unit over the first time period in the past starting on the current time.; (Lei Par. 52-55 “Historical sales and performance data may include, for example, data representing past sales and promotions of an item across a plurality of past retail sales periods. The historical performance data may be segmented into retail periods of past weeks, with each past week having numerical values assigned to it to indicate the number of items sold for that week. The historical performance data may also include numerical values representing price discounts and values of other promotion components across the retail periods, in accordance with one embodiment. The historical performance data for an item may be accessed via network communications, in accordance with one embodiment, including being accessed from each POS terminal 100 at each retail store and/or accessed from database 17. The historical performance data includes sales data associated with the plurality of promotion components across a plurality of time periods (e.g., weeks). 
Examples of promotion components include, but are not limited to, a price discount component, a television advertisement component, a radio advertisement component, a newspaper advertisement component, an email advertisement component, an internet advertisement component, and an in-store advertisement component. The historical data includes, for each item, a listing of feature/variables/attributes for the item, such as price, promotions, seasonality, brand, color, style, etc. The historical sales data is received as multiple data points or a “data set”, with a single data point for each sales of an item per store. For example, for 202, assume there are 100 k data points of item/store/week sales data for a given item/store(location). The functionality of FIG. 2 is used to forecast the demand for the item based on those received data points.”; Par. 10; Par. 18)
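For illustration only, the data series recited in claim 5 can be sketched as plain lists, with the "derivative" series approximated by discrete differences. That approximation, the function names `diff` and `information_vector`, and the ordering of the concatenated series are assumptions of this sketch; neither the claim nor Lei specifies them.

```python
from typing import List, Sequence


def diff(series: Sequence[float]) -> List[float]:
    """First discrete difference: series[i + 1] - series[i]."""
    return [b - a for a, b in zip(series, series[1:])]


def information_vector(daily: Sequence[float], weekly: Sequence[float],
                       promotions: Sequence[float]) -> List[float]:
    """Concatenate the sales series, their differences, and promotions.

    Hypothetical layout: first and second data series, then the four
    derivative series, then the promotions series.
    """
    d1_daily = diff(daily)        # first derivative of the first series
    d2_daily = diff(d1_daily)     # second derivative of the first series
    d1_weekly = diff(weekly)      # first derivative of the second series
    d2_weekly = diff(d1_weekly)   # second derivative of the second series
    return (list(daily) + list(weekly) + d1_daily + d2_daily
            + d1_weekly + d2_weekly + list(promotions))
```

Each differencing step shortens a series by one element, so a real implementation would need a padding or alignment rule if a fixed vector length per time slice is required.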
Regarding Claim 6,
The method of claim 5, wherein the information vector of product sale features further comprises: a fourth data series of forecast promotions for sale of the product at the point of sale per the first time unit over the first time period in the future from the current time (Lei Par. 41- “In addition to classification, SVMs have been successfully applied in sales or demand forecasting, being able to process common metrics, such as sales, as well as price, promotions, external factors such as weather and demographic information.”; Par. 10; Par. 18).
Regarding Claim 7,
The method of claim 5, wherein the information vector of product sale features further comprises: a fifth data series of weather data at the point of sale per the first time unit over the first time period in the past starting on the current time. (Lei Par. 41- “In addition to classification, SVMs have been successfully applied in sales or demand forecasting, being able to process common metrics, such as sales, as well as price, promotions, external factors such as weather and demographic information.”; Par. 10; Par. 18).
Regarding Claim 8,
The method of claim 5, wherein the information vector of product sale features further comprises: a sixth data series of weather data at the point of sale per the first time unit over the first time period in the future from the current time. (Lei Par. 41- “In addition to classification, SVMs have been successfully applied in sales or demand forecasting, being able to process common metrics, such as sales, as well as price, promotions, external factors such as weather and demographic information.”; Par. 10; Par. 18).
Regarding Claim 9,
The method of claim 5, wherein the information vector of product sale features further comprises: a seventh data series of weather data at the point of sale per the second time unit over a third time period in the future from the current time and of a third predetermined duration expressed in the second time unit. (Lei Par. 41- “In addition to classification, SVMs have been successfully applied in sales or demand forecasting, being able to process common metrics, such as sales, as well as price, promotions, external factors such as weather and demographic information.”; Par. 10; Par. 18).
Regarding Claim 10,
The method of claim 5, wherein the information vector of product sale features further comprises: a eighth data series of calendar data at the point of sale per the first time unit over the first time period in the past starting on the current time. (Lei Par. 18- “Further, for many machine learning algorithms (SVM, ANN, random forest, etc.), retailers will use a feature set (i.e., various attributes of an item) to define the data point at the product/location/calendar intersection. In these algorithms, retailers will train the model with the same feature set as it does for forecasting. Further, the same feature set could be used by several different algorithms for forecasting. A “feature set” is the collection of features that impact the demand or sales for an item as well as describe attributes of an item. Examples of features include base sales, price, seasonality, brand, promotions, size, color, pack size, supplier, length, etc. While features such as price and seasonality may be relevant for all types of products, some others are item specific. For example, pack size impacts the demand for yogurts, however the length of the pack is insignificant. Conversely, the brand is very important for fashion items, but is much less important for hardware items, such as nails or hammers.”; Par. 10).
Regarding Claim 11, 
Lei teaches
An apparatus, the apparatus comprising a processor and a memory operatively coupled to the processor, wherein the apparatus is configured to perform a method of managing product data comprising: generating a first time segment associated with a first product sale feature, the first time segment comprising product sales data related to sales of the product at a point of sale per a first time unit over a first time period in the past starting on a current time and of a first predetermined duration expressed in the first time unit, and one or more second time segments respectively associated with one or more second product sale features, the second time segments each comprising context data associated with product sale history or the point of sale over the first time period in the past; (Lei Par. 36-“ Embodiments are disclosed from the perspective that, for an item (i.e., a class of items such as yogurt or men's shirts) sold at a location (e.g., a retail location), the item may be promoted in various ways at various times (i.e., pre-defined retail periods, such as a day, week, month, year, etc.). A retail calendar has many retail periods (e.g., weeks) that are organized in a particular manner (e.g., four (4) thirteen (13) week quarters) over a typical calendar year. A retail period may occur in the past or in the future. Historical sales/performance data may include, for example, a number of units of an item sold in each of a plurality of past retail periods as well as associated promotion data (i.e., for each retail period, which promotions were in effect for that period) and any other relevant demand features/variables.”; Par. 28-29; Fig. 1); 
and generating a prediction of sales for the product at the point of sale based on the output of a prediction model to which the information vector of product sale features is input (Lei Par. 40-“In machine learning, SVMs are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis. Given a set of training examples, each marked as belonging to one or the other of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall.”; Par. 50; Par. 63).

Lei teaches predictive modeling and the feature is expounded upon by Fan:

combining the first time segment with the one or more second time segments to generate an information vector of product sale features  (Fan Par. 5;Par. 7-17;  Par. 18-19-“ In certain embodiments, the plurality of temporal scales comprises 2-10 scales. In certain embodiments, the plurality of temporal scales comprises four scales of scale-1, scale-3, scale-5, and scale-7; for each target step from the future time steps: the scale-1 uses hidden state of the target step; the scale-3 uses hidden states of the target step, one of the future time steps immediately before the target step, and one of the time steps immediately after the target step; the scale-5 uses hidden states of the target step, two of the future time steps immediately before the target step, and two of the time step immediately after the target step; and the scale-7 uses hidden states of the target step, three of the future time steps immediately before the target step, and three of the time step immediately after the target step.”);
Lei and Fan are both directed to product sales predictive modelling. It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have improved upon the input data analysis of Lei, as taught by Fan, with a reasonable expectation of success in arriving at the claimed invention. One of ordinary skill in the art would have been motivated to modify the teachings of Lei in order to improve forecast accuracy (Fan Par. 80).
Regarding Claim 16, 
Lei teaches
A non-transitory computer-readable medium encoded with executable instructions which, when executed, causes an apparatus comprising a processor operatively coupled with a memory, to perform a method of managing product data comprising: generating a first time segment associated with a first product sale feature, the first time segment comprising product sales data related to sales of the product at a point of sale per a first time unit over a first time period in the past starting on a current time and of a first predetermined duration expressed in the first time unit, and one or more second time segments respectively associated with one or more second product sale features, the second time segments each comprising context data associated with product sale history and/or the point of sale over the first time period in the past; (Lei Par. 36-“ Embodiments are disclosed from the perspective that, for an item (i.e., a class of items such as yogurt or men's shirts) sold at a location (e.g., a retail location), the item may be promoted in various ways at various times (i.e., pre-defined retail periods, such as a day, week, month, year, etc.). A retail calendar has many retail periods (e.g., weeks) that are organized in a particular manner (e.g., four (4) thirteen (13) week quarters) over a typical calendar year. A retail period may occur in the past or in the future. Historical sales/performance data may include, for example, a number of units of an item sold in each of a plurality of past retail periods as well as associated promotion data (i.e., for each retail period, which promotions were in effect for that period) and any other relevant demand features/variables.”; Par. 28-30; Fig. 1); 
and generating a prediction of sales for the product at the point of sale based on the output of a prediction model to which the information vector of product sale features is input (Lei Par. 40-“In machine learning, SVMs are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis. Given a set of training examples, each marked as belonging to one or the other of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier. An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall.”; Par. 50; Par. 63).

Lei teaches predictive modeling and the feature is expounded upon by Fan:

combining the first time segment with the one or more second time segments to generate an information vector of product sale features  (Fan Par. 5;Par. 7-17;  Par. 18-19-“ In certain embodiments, the plurality of temporal scales comprises 2-10 scales. In certain embodiments, the plurality of temporal scales comprises four scales of scale-1, scale-3, scale-5, and scale-7; for each target step from the future time steps: the scale-1 uses hidden state of the target step; the scale-3 uses hidden states of the target step, one of the future time steps immediately before the target step, and one of the time steps immediately after the target step; the scale-5 uses hidden states of the target step, two of the future time steps immediately before the target step, and two of the time step immediately after the target step; and the scale-7 uses hidden states of the target step, three of the future time steps immediately before the target step, and three of the time step immediately after the target step.”);
Lei and Fan are both directed to product sales predictive modelling. It would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to have improved upon the input data analysis of Lei, as taught by Fan, with a reasonable expectation of success in arriving at the claimed invention. One of ordinary skill in the art would have been motivated to modify the teachings of Lei in order to improve forecast accuracy (Fan Par. 80).


Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure: US Patent Publication No. US 20210117995  A1 to Kansara et al. - Predicted transaction dates are predicted for the current level by a transaction date prediction model. The quantity forecasting model is used to generate predicted quantity information for the current level for the predicted transaction dates.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Chesiree Walton, whose telephone number is (571) 272-5219.  The examiner can normally be reached from Monday to Friday between 8 AM and 5 PM.  If any attempt to reach the examiner by telephone is unsuccessful, the examiner’s supervisor, Patricia Munson, can be reached at (571) 270-5396.  The fax telephone numbers for this group are either (571) 273-8300 or (703) 872-9326 (for official communications including After Final communications labeled “Box AF”).
	Another resource that is available to applicants is the Patent Application Information Retrieval (PAIR) system. Information regarding the status of an application can be obtained from the PAIR system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, please feel free to contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
	Applicants are invited to contact the Office to schedule an in-person interview to discuss and resolve the issues set forth in this Office Action.  Although an interview is not required, the Office believes that an interview can be of use to resolve any issues related to a patent application in an efficient and prompt manner.

Sincerely,
   
/CHESIREE A WALTON/Examiner, Art Unit 3624