DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Specification
The disclosure is objected to because of the following informalities:
In paragraph 0074, lines 3, 5, and 9 recite “13 parameters.” However, TABLE 13 consists of only 12 parameters.
In paragraph 0069, lines 12-14 recite “The cluster data 1516, 1520, and 1520 are thus redacted in block 1522 …” Item 1520 is repeated. Should this read “1516, 1518, and 1520”?
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 19 and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.

Regarding claim 19, the phrase “a redacted set of multiple sub-time series of historical sensor data for the subset of controllable parameters” is ambiguous because it is unclear what “redacted” means in the claim language, thus rendering the limitation indefinite. Paragraphs 69 and 70 of Applicant’s specification describe a number of cluster data being redacted in block 1522, but they do not explain why, or on what basis, the data are redacted. The ordinary meaning of the word “redacted” is withdrawn or removed from a document. Contrary to this ordinary meaning, paragraph 74 states that FIG. 6 and TABLE 2 comprise a set of redacted historical sensor data, namely, the 13 controllable parameters used to create FIG. 6. The data thus appear to be set aside and used for a different purpose rather than discarded.
Claim 20 is rejected for its dependence on claim 19.
Regarding claim 20, the phrase “time entries” in the limitation “at least a predetermined number of time entries” is unclear and indefinite. Applicant’s specification does not use this terminology. Paragraph 43 uses the phrase “on a predetermined time grid (one data point per hour) based on LOWESS,” and item 504 of FIG. 5 also mentions a “Time Grid” for regression. However, this paragraph addresses the need to interpolate historical production data rather than the data clusters.

Claim Rejections - 35 USC § 103
Claims (1, 3), (10, 12) are rejected under 35 U.S.C. 103 as being unpatentable over Persson (US Pat No 7,085,615 B2), Herzog (US Pub No 2017/0249559 A1), Danichev (US Pub No 2018/0357261 A1), and Camara (US Pat No 11,080,613 B1).
 

Regarding claim 1, Persson teaches
A system for predicting real-time production of a chemical product in a plant based on a subset among a set of parameters each monitored by one of a corresponding set of sensors at one of a corresponding set of measurement frequencies, the system comprising:
a memory ([Col 15 Line 60-65] magneto-optical memory storage means, in RAM or volatile memory, in ROM or flash memory);
a communication interface ([Claim 38] The system according to claim 25 wherein communication links between said processor and said production process, for allowing a remote control of said production process);
circuitry in communication with the memory and the communication interface ([Col 17 Line 35-43] in FIG. 8 by a suitable network topology, in a separate server connected to an … network … domain server… A server 14 where the method is implemented will typically also include functionality to exchange data with the process control system and other servers 15 containing relevant data as e.g. process history data storage systems, laboratory data storage systems 3), the circuitry configured to:
acquire, via the communication interface, multiple series of timestamped historical sensor data, each series corresponding to one of the set of parameters taken by a corresponding sensor among the set of sensors ([Col 5 Line 64-66] A number of different sensors 51, 52, 53 are provided to measure important process output variables) 
at a corresponding measurement frequency of the set of measurement frequencies during a time period ([Col 12 Line 14-15] A general optimization problem can be expressed in a discrete formulation with a sampling time of ∆T; [Col 13 Line 25-26] The sampling time ∆T, mentioned above, may in a typical case be e.g. less than 15 minutes; [Col 15 Line 4-5] This is schematically illustrated in FIG. 6. A diagram illustrates a series of measurements 130 of a process output variable at different sampling times);
obtain a series of timestamped and indirectly measured historical production data ([Col 4 Line 1-5] State variables characterizing the result of the process control, but not directly used for controlling purposes, are in the present disclosure called associated process output variables. [Col 11 Line 19-22] In the middle part of FIG. 3, an ideal set-point trajectory s(t) for a controlled process output variable is illustrated. The controlled process output variable is measured, either directly or indirectly. [Col 2 Line 54-61] The objective function comprises relations involving predictions of controlled process output variables as a function of time for the prediction time period using the process model, based on the present and preferably also previous measurements of state variables);
develop a predictive model for production of the chemical product as a function of the selected subset of parameters and the corresponding selected series of sampled historical sensor data ([Col 10 Line 49-58] The optimization is performed minimizing an objective function. The objective function is formulated in accordance with the optimizing aspects and is preferably based on a comparison between the target trajectories of the controlled process output variables and controlled process output variables as predicted by the dynamic process model. The computation is based on present values of state variables. The objective function is minimized by varying the input trajectories for the manipulated variables. The input trajectories giving the minimum of the objective function is thereby stated to be the optimum input trajectories);

However, Persson does not teach
indirectly measured historical production data for the chemical product during the time period having an indirect measurement frequency smaller than the set of measurement frequencies corresponding to the set of parameters;
sample the multiple series of timestamped historical sensor data of the set of parameters to obtain multiple corresponding sampled series of historical sensor data of the set of parameters having a common series of sampled timestamps;
interpolate the series of historical production data based on a local smoothing algorithm to obtain a series of modified production data having a series of timestamps corresponding to the common series of sampled timestamps;
filter the series of modified production data to reduce noise or abnormality in the series of modified production data and obtain a series of filtered production data;
apply at least one dimensionality reduction algorithm on the multiple series of sampled historical sensor data using the series of filtered production data to select the subset of parameters from the set of parameters and corresponding selected series of sampled historical sensor data;
obtain real-time readings during production of the chemical product from a subset of sensors corresponding to the subset of parameters;
predict production of the chemical product based on the predictive model and the real-time readings of the subset of parameters from the subset of sensors.

Nevertheless, Herzog teaches
indirectly measured historical production data for the chemical product during the time period having an indirect measurement frequency smaller than the set of measurement frequencies corresponding to the set of parameters (Examiner interprets a frequency “smaller than” the measurement frequencies as meaning that the data are recorded less frequently. [FIG. 1] item 110 “Historical data store”. [Para 0005] The parameter data may include the actual or current values from the signals or other calculated data whether or not based on the sensor signals. The parameter data is then processed by an empirical model to provide estimates of those values; [Para 0009] One example of an industry problems mentioned above concerns pump-assisted oil and gas extraction. Down hole sensors in wells and on electrical-submersible pumps provide continuous measurements of parameters such as reservoir temperature, reservoir pressure, and pump speed, but none of the key well performance parameters used to determine the volume of oil and gas extracted. Key performance parameters such as volumetric flow rate and watercut (i.e., the ratio of water produced compared to the volume of total liquids produced from an oil well) are measured at irregular intervals during well tests; Examiner regards Herzog’s key performance parameters as historical production data. [FIG. 3] item 308 “Create kernel regression models”. [Para 0034] In one example, several sensor values are obtained very frequently while other sensor values are obtained infrequently. In other words, for a current point in time some sensor values are definitely known, while others are not known. [Para 0035] It is desired by a user to obtain an estimate of the infrequent (unknown) sensor values from one or more sensors …);
However, Herzog does not teach
sample the multiple series of timestamped historical sensor data of the set of parameters to obtain multiple corresponding sampled series of historical sensor data of the set of parameters having a common series of sampled timestamps;
interpolate the series of historical production data based on a local smoothing algorithm to obtain a series of modified production data having a series of timestamps corresponding to the common series of sampled timestamps;
filter the series of modified production data to reduce noise or abnormality in the series of modified production data and obtain a series of filtered production data;
apply at least one dimensionality reduction algorithm on the multiple series of sampled historical sensor data using the series of filtered production data to select the subset of parameters from the set of parameters and corresponding selected series of sampled historical sensor data;
obtain real-time readings during production of the chemical product from a subset of sensors corresponding to the subset of parameters;
predict production of the chemical product based on the predictive model and the real-time readings of the subset of parameters from the subset of sensors.

On the other hand, Danichev teaches
sample the multiple series of timestamped historical sensor data of the set of parameters to obtain multiple corresponding sampled series of historical sensor data of the set of parameters having a common series of sampled timestamps ([Para 0059] Thus, time-aligned values may be generated within the portions of the time series datasets having overlapping time intervals (e.g., the time-series datasets may have respective data values associated with a common time stamp or time stamps within a temporal tolerance of one another));
interpolate the series of historical production data based on a local smoothing algorithm to obtain a series of modified production data having a series of timestamps corresponding to the common series of sampled timestamps ([Para 0058] At 302, the smoothness evaluator 118 may determine a degree of smoothness of the two time-series datasets of a given pairwise combination. The smoothness evaluator 118 may perform an analysis on each time-series dataset that generates a quantified representation of the degree of smoothness of each time-series dataset. In some examples, the smoothness evaluator 118 may compute an autocorrelation of each time-series dataset. Any processes previously described as implemented by the smoothness evaluator 118 may be implemented at 302; [Para 0067] For example, if the first time-series dataset is smooth and the second time-series dataset is noisy, data values of the first time-series dataset near a given time stamp of the second time-series dataset may be interpolated at the given time stamp to generate an interpolated data value that can be paired with the data value of the second time-series dataset associated with the same time stamp);
filter the series of modified production data to reduce noise or abnormality in the series of modified production data and obtain a series of filtered production data ([Para 
apply at least one dimensionality reduction algorithm on the multiple series of sampled historical sensor data using the series of filtered production data to select the subset of parameters from the set of parameters and corresponding selected series of sampled historical sensor data ([Para 0029] One example measure of correlation is the Pearson product-moment correlation coefficient, which is also referred to as the PPMCC, Pearson's r, or the PCC. The PCC is a normalized measure of the linear correlation between two sets of data. [Para 0033] a skilled administrator may tune the correlation threshold based on such factors and/or based on observed performance to achieve an automated deduplication process that sensibly distinguishes between correlated and non-correlated time-series datasets);
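For illustration only, the temporal alignment (Danichev [Para 0059], [Para 0067]) and the correlation screening (Danichev [Para 0029], [Para 0033]) relied upon above can be sketched in Python. This is a minimal sketch under stated assumptions and not Danichev's or Applicant's implementation: plain linear interpolation stands in for the claimed local smoothing algorithm, and the function names, the threshold value, and the sample data are hypothetical.

```python
from bisect import bisect_left
from math import sqrt


def interpolate_at(times, values, t):
    """Linearly interpolate a series (times sorted ascending) at time t.

    Times outside the observed range are clamped to the nearest endpoint value.
    """
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_left(times, t)  # first index with times[i] >= t
    if times[i] == t:
        return values[i]
    t0, t1 = times[i - 1], times[i]
    v0, v1 = values[i - 1], values[i]
    return v0 + (v1 - v0) * (t - t0) / (t1 - t0)


def resample(times, values, grid):
    """Place a series recorded at its own frequency onto a common timestamp grid."""
    return [interpolate_at(times, values, t) for t in grid]


def pearson_r(x, y):
    """Pearson product-moment correlation; returns 0.0 if either series is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / sqrt(sxx * syy)


def select_parameters(aligned_sensors, target, threshold):
    """Keep the parameters whose |r| with the target series meets the threshold."""
    return [name for name, series in aligned_sensors.items()
            if abs(pearson_r(series, target)) >= threshold]


if __name__ == "__main__":
    grid = [0, 1, 2, 3, 4]  # hypothetical common sampled timestamps (e.g., hours)
    sensors = {
        "temp": resample([0, 1, 2, 3, 4], [10, 12, 14, 16, 18], grid),
        "pressure": resample([0, 2, 4], [5, 5, 5], grid),  # lower-frequency sensor
        "noise": resample([0, 1, 2, 3, 4], [1, -1, 1, -1, 1], grid),
    }
    production = resample([0, 4], [100, 140], grid)  # sparse, indirectly measured
    print(select_parameters(sensors, production, threshold=0.8))  # prints ['temp']
```

Here only the hypothetical "temp" series tracks the production trend, so it is the only parameter retained; a constant or alternating series yields a correlation of zero and is screened out.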
However, Danichev does not teach
obtain real-time readings during production of the chemical product from a subset of sensors corresponding to the subset of parameters;
predict production of the chemical product based on the predictive model and the real-time readings of the subset of parameters from the subset of sensors.

Nevertheless, Camara teaches
obtain real-time readings during production of the chemical product from a subset of sensors corresponding to the subset of parameters ([Col 13 Line 65-67] The model can optionally be used online and in real-time for continuous analysis of the process condition. [Col 3, Line 37-41, Line 59-62] "Massively Parallel Processing (MPP) Large Scale Combination of Time Series Data" employs a Massively Parallel Processing procedure … to monitor, substantially in real time, plants with many variables … A process may be subject to malfunctions and/or equipment failure, for example, due to unexpected and undesired disturbances, including sudden changes of feed conditions and surrounding environment, inappropriate manipulation of process variables and equipment, and aging of process machinery. [Col 5 Line 29-32] The process target can be a production variable (such as the total amount of oil produced in an oil platform) or a process index (such as the revenues obtained from the process operation); Examiner views Camara’s capturing of process/production variables as obtaining real-time readings during production); and
predict production of the chemical product based on the predictive model and the real-time readings of the subset of parameters from the subset of sensors ([Col 5 Line 18-20] during a monitoring period 120, measured time sensor data 125 is compared to values predicted using the one or more prediction models. [Col 8, Line 53-55] each variable selected to predict the process target… the prediction of the process target is a linear combination of those coefficients and the selected variables; [Col 11 Line 3-10] In the first stage, each distributed working compute node works with a small subset of time series 510 in the respective group 520 …. to explain or predict the target; Examiner views predicting the process target as a form of predicting the production of the chemical product).
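For clarity, the linear-combination prediction quoted from Camara ([Col 8, Line 53-55]) may be written as shown below. The symbols are introduced here for illustration only: S denotes the selected subset of variables, x_j(t) the real-time reading of selected variable j at time t, y(t) the historical process target, and β_j the corresponding coefficient. The least-squares fit shown second is an assumption, since the quoted passage does not state how the coefficients are obtained.

```latex
\hat{y}(t) = \sum_{j \in S} \beta_j \, x_j(t),
\qquad
\{\beta_j\}_{j \in S} = \arg\min_{\beta} \sum_{t} \Bigl( y(t) - \sum_{j \in S} \beta_j \, x_j(t) \Bigr)^{2}
```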
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Persson to indirectly measure historical production data for the chemical product during the time period having an indirect measurement frequency smaller than the set of measurement frequencies corresponding to the set of parameters, such as that of Herzog. One of ordinary skill would have been motivated to modify Persson because consistently high-quality historical sensor and production data are crucial in preparing a proper training data set for the machine learning algorithm.
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Persson to sample the historical sensor data onto common timestamps, interpolate and filter the historical production data, and apply a dimensionality reduction algorithm to the data, such as that of Danichev. One of ordinary skill would have been motivated to modify Persson because filtering unwanted data and interpolating across gaps in the data would improve data quality, and applying a dimensionality reduction technique helps reduce the complexity of the input data.
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Persson to obtain real-time readings during production of the chemical product and predict production of the chemical product based on the predictive model and the real-time readings, such as that of Camara. One of ordinary skill would have been motivated to modify Persson because a predictive model has the advantage of projecting the production outcome.

Regarding claim 3, Persson teaches


Regarding claim 10, Persson teaches
A method for predicting real-time production of a chemical product in a plant based on a subset among a set of parameters each monitored by one of a corresponding set of sensors at one of a corresponding set of measurement frequencies, the method comprising:
acquiring multiple series of timestamped historical sensor data, each series corresponding to one of the set of parameters taken by a corresponding sensor of the set of sensors ([Col 5 Line 64-66] A number of different sensors 51, 52, 53 are provided to measure important process output variables) 
at a corresponding measurement frequency of the set of measurement frequencies during a time period ([Col 12 Line 14-15] A general optimization problem can be expressed in a discrete formulation with a sampling time of ∆T; [Col 13 Line 25-26] The sampling time ∆T, mentioned above, may in a typical case be e.g. less than 15 minutes; [Col 15 Line 4-5] This is schematically illustrated in FIG. 6. A diagram illustrates a series of measurements 130 of a process output variable at different sampling times);

developing a predictive model for production of the chemical product as a function of the selected subset of parameters and the corresponding selected series of sampled historical sensor data ([Col 10 Line 49-58] The optimization is performed minimizing an objective function. The objective function is formulated in accordance with the optimizing aspects and is preferably based on a comparison between the target trajectories of the controlled process output variables and controlled process output variables as predicted by the dynamic process model. The computation is based on present values of state variables. The objective function is minimized by varying the input trajectories for the manipulated variables. The input trajectories giving the minimum of the objective function is thereby stated to be the optimum input trajectories);
However, Persson does not teach
indirectly measured historical production data for the chemical product during the time period having an indirect measurement frequency smaller than the set of measurement frequencies corresponding to the set of parameters;
sampling the multiple series of timestamped historical sensor data of the set of parameters to obtain multiple corresponding sampled series of historical sensor data of the set of parameters having a common series of sampled timestamps;
interpolating the series of historical production data based on a local smoothing algorithm to obtain a series of modified production data having a series of timestamps corresponding to the common series of sampled timestamps;
filtering the series of modified production data to reduce noise or abnormality in the series of modified production data and obtain a series of filtered production data;
applying at least one dimensionality reduction algorithm on the multiple series of sampled historical sensor data using the series of filtered production data to select the subset of parameters from the set of parameters and corresponding selected series of sampled historical sensor data;
obtaining real-time readings during production of the chemical product from a subset of sensors corresponding to the subset of parameters;
predicting production of the chemical product based on the predictive model and the real-time readings of the subset of parameters from the subset of sensors.
Nevertheless, Herzog teaches
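indirectly measured historical production data for the chemical product during the time period having an indirect measurement frequency smaller than the set of measurement frequencies corresponding to the set of parameters (Examiner interprets a frequency “smaller than” the measurement frequencies as meaning that the data are recorded less frequently. [FIG. 1] item 110 “Historical data store”. [Para 0005] The parameter data may include the actual or current values from the signals or other calculated data whether or not based on the sensor signals. The parameter data is then processed by an empirical model to provide estimates of those values; [Para 0009] One example of an industry problems mentioned above concerns pump-assisted oil and gas extraction. Down hole sensors in wells and on electrical-submersible pumps provide continuous measurements of parameters such as reservoir temperature, reservoir pressure, and pump speed, but none of the key well performance parameters used to determine the volume of oil and gas extracted. Key performance parameters such as volumetric flow rate and watercut (i.e., the ratio of water produced compared to the volume of total liquids produced from an oil well) are measured at irregular intervals during well tests; Examiner regards Herzog’s key performance parameters as historical production data. [FIG. 3] item 308 “Create kernel regression models”. [Para 0034] In one example, several sensor values are obtained very frequently while other sensor values are obtained infrequently. In other words, for a current point in time some sensor values are definitely known, while others are not known. [Para 0035] It is desired by a user to obtain an estimate of the infrequent (unknown) sensor values from one or more sensors …);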
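However, Herzog does not teach
sampling the multiple series of timestamped historical sensor data of the set of parameters to obtain multiple corresponding sampled series of historical sensor data of the set of parameters having a common series of sampled timestamps;
interpolating the series of historical production data based on a local smoothing algorithm to obtain a series of modified production data having a series of timestamps corresponding to the common series of sampled timestamps;
filtering the series of modified production data to reduce noise or abnormality in the series of modified production data and obtain a series of filtered production data;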
applying at least one dimensionality reduction algorithm on the multiple series of sampled historical sensor data using the series of filtered production data to select the subset of parameters from the set of parameters and corresponding selected series of sampled historical sensor data;
obtaining real-time readings during production of the chemical product from a subset of sensors corresponding to the subset of parameters;
predicting production of the chemical product based on the predictive model and the real-time readings of the subset of parameters from the subset of sensors.
On the other hand, Danichev teaches
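sampling the multiple series of timestamped historical sensor data of the set of parameters to obtain multiple corresponding sampled series of historical sensor data of the set of parameters having a common series of sampled timestamps ([Para 0059] Thus, time-aligned values may be generated within the portions of the time series datasets having overlapping time intervals (e.g., the time-series datasets may have respective data values associated with a common time stamp or time stamps within a temporal tolerance of one another));
interpolating the series of historical production data based on a local smoothing algorithm to obtain a series of modified production data having a series of timestamps corresponding to the common series of sampled timestamps ([Para 0058] At 302, the smoothness evaluator 118 may determine a degree of smoothness of the two time-series datasets of a given pairwise combination. The smoothness evaluator 118 may perform an analysis on each time-series dataset that generates a quantified representation of the degree of smoothness of each time-series dataset. In some examples, the smoothness evaluator 118 may compute an autocorrelation of each time-series dataset. Any processes previously described as implemented by the smoothness evaluator 118 may be implemented at 302; [Para 0067] For example, if the first time-series dataset is smooth and the second time-series dataset is noisy, data values of the first time-series dataset near a given time stamp of the second time-series dataset may be interpolated at the given time stamp to generate an interpolated data value that can be paired with the data value of the second time-series dataset associated with the same time stamp);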
filtering the series of modified production data to reduce noise or abnormality in the series of modified production data and obtain a series of filtered production data ([Para 0026] The process selector 120 may select a temporal alignment process based on the degrees of smoothness determined by the smoothness evaluator 118. In some examples, the process selector 118 may compare the degrees of smoothness (e.g. autocorrelations) with a threshold to distinguish between datasets that are smooth and those that are noisy (i.e. not smooth). A time-series dataset that is smooth may have a degree of smoothness exceeding the threshold, and a time-series dataset that is noisy may have a degree of smoothness not exceeding the threshold... the process selector 120 
applying at least one dimensionality reduction algorithm on the multiple series of sampled historical sensor data using the series of filtered production data to select the subset of parameters from the set of parameters and corresponding selected series of sampled historical sensor data ([Para 0029] One example measure of correlation is the Pearson product-moment correlation coefficient, which is also referred to as the PPMCC, Pearson's r, or the PCC. The PCC is a normalized measure of the linear correlation between two sets of data. [Para 0033] a skilled administrator may tune the correlation threshold based on such factors and/or based on observed performance to achieve an automated deduplication process that sensibly distinguishes between correlated and non-correlated time-series datasets);
However, Danichev does not teach
obtaining real-time readings during production of the chemical product from a subset of sensors corresponding to the subset of parameters;
predicting production of the chemical product based on the predictive model and the real-time readings of the subset of parameters from the subset of sensors.
Nevertheless, Camara teaches
obtaining real-time readings during production of the chemical product from a subset of sensors corresponding to the subset of parameters ([Col 13 Line 65-67] The model can optionally be used online and in real-time for continuous analysis of the process condition. [Col 3, Line 37-41, Line 59-62] "Massively Parallel Processing (MPP) Large Scale Combination of Time Series Data" employs a Massively Parallel Processing procedure … to monitor, substantially in real time, plants with many variables … A process may be subject to malfunctions and/or equipment failure, for example, due to unexpected and undesired disturbances, including sudden changes of feed conditions and surrounding environment, inappropriate manipulation of process variables and equipment, and aging of process machinery. [Col 5 Line 29-32] The process target can be a production variable (such as the total amount of oil produced in an oil platform) or a process index (such as the revenues obtained from the process operation); Examiner views Camara’s capturing of process/production variables as obtaining real-time readings during production); and
predicting production of the chemical product based on the predictive model and the real-time readings of the subset of parameters from the subset of sensors ([Col 5 Line 18-20] during a monitoring period 120, measured time sensor data 125 is compared to values predicted using the one or more prediction models. [Col 8, Line 53-55] each variable selected to predict the process target… the prediction of the process target is a linear combination of those coefficients and the selected variables; [Col 11 Line 3-10] In the first stage, each distributed working compute node works with a small subset of time series 510 in the respective group 520 …. to explain or predict the target; Examiner views predicting the process target as a form of predicting the production of the chemical product).
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Persson to indirectly measure historical production data for the chemical product during the time period having an indirect measurement frequency smaller than the set of measurement frequencies corresponding to the set of parameters, such as that of Herzog. One of ordinary skill would have been motivated to modify Persson because consistently high-quality historical sensor and production data are crucial in preparing a proper training data set for the machine learning algorithm.
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Persson to sample the historical sensor data onto common timestamps, interpolate and filter the historical production data, and apply a dimensionality reduction algorithm to the data, such as that of Danichev. One of ordinary skill would have been motivated to modify Persson because filtering unwanted data and interpolating across gaps in the data would improve data quality, and applying a dimensionality reduction technique helps reduce the complexity of the input data.
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Persson to obtain real-time readings during production of the chemical product and predict production of the chemical product based on the predictive model and the real-time readings, such as that of Camara. One of ordinary skill would have been motivated to modify Persson because a predictive model has the advantage of projecting the production outcome.

Regarding claim 12, Persson teaches
The method of claim 10, wherein filtering the series of modified production data to reduce noise or abnormality is based on Kalman filtering ([Col 15, Line 5-9, Line 18-20] A frame 131 defines a certain number of previous measurements, which are going to be used in the state estimation process. The previous measurements are put into the dynamic process model and minimized regarding the measurement noise and the model uncertainty… The actual state estimation can also be performed using other techniques, e.g. Kalman filter techniques).

Claims (2), (11) are rejected under 35 U.S.C. 103 as being unpatentable over Persson (US Pat No 7,085,615 B2), Herzog (US Pub No 2017/0249559 A1), Danichev (US Pub No 2018/0357261 A1), and Camara (US Pat No 11,080,613 B1), as applied to claims (1), (10) above, and further in view of Scolnicov (US Pat No 8,341,106 B1).

Regarding claim 2, Persson does not teach 
wherein the local smoothing algorithm is based on locally weighted scatterplot smoothing. 
Nevertheless, Scolnicov teaches
wherein the local smoothing algorithm is based on locally weighted scatterplot smoothing ([Para 0047] Data preparation engine 304 organizes and formats received data to be further processed … methods commonly known in the art may be applied to "smooth" the data collected from the network. Some of these methods are Locally Weighted Scatterplot Smoothing (LOWESS)).
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Persson to smooth the data based on locally weighted scatterplot smoothing, such as that of Scolnicov. One of ordinary skill would have been motivated to modify Persson because smoothing the data with LOWESS is a robust way of filling in gaps in the data.
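For reference, the locally weighted scatterplot smoothing (LOWESS) technique cited from Scolnicov can be sketched as a local linear fit with tricube weights, evaluated at arbitrary query times so that sparse production data can be placed on a common timestamp grid. This is a simplified, illustrative sketch and not Scolnicov's or Applicant's implementation: it performs a single weighted pass without the robustness iterations of the full LOWESS procedure, and the function names and the `frac` neighbourhood parameter are hypothetical.

```python
def tricube(u):
    """Tricube weight used by LOWESS: (1 - u^3)^3 for 0 <= u < 1, else 0."""
    return (1.0 - u ** 3) ** 3 if u < 1.0 else 0.0


def lowess_at(xs, ys, x0, frac=0.5):
    """Single-pass locally weighted linear fit evaluated at x0.

    Uses the nearest ~frac*n points with tricube weights. The robustifying
    iterations of the full LOWESS procedure are intentionally omitted.
    """
    n = len(xs)
    k = min(n, max(3, round(frac * n)))  # neighbourhood size
    h = sorted(abs(x - x0) for x in xs)[k - 1]  # distance to k-th nearest point
    # Widen the bandwidth slightly so the k-th neighbour keeps a small positive
    # weight and h is never zero.
    h = h * 1.000001 + 1e-12
    w = [tricube(abs(x - x0) / h) for x in xs]
    sw = sum(w)
    mx = sum(wi * x for wi, x in zip(w, xs)) / sw
    my = sum(wi * y for wi, y in zip(w, ys)) / sw
    sxx = sum(wi * (x - mx) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - mx) * (y - my) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0  # weighted least-squares slope
    return my + slope * (x0 - mx)


def lowess_on_grid(xs, ys, grid, frac=0.5):
    """Evaluate the local fit at each grid point (e.g., common sampled timestamps)."""
    return [lowess_at(xs, ys, t, frac) for t in grid]


if __name__ == "__main__":
    hours = [0, 3, 5, 9, 12, 15, 20, 24]  # hypothetical irregular lab-test times
    yields = [100, 104, 103, 110, 112, 111, 118, 121]  # hypothetical production values
    print(lowess_on_grid(hours, yields, grid=list(range(0, 25, 6))))
```

Because each fit is local, a point far from the query time has no influence on the estimate, which is why this family of smoothers is commonly used to fill gaps between irregularly spaced measurements.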

Regarding claim 11, Persson does not teach 
wherein the local smoothing algorithm is based on locally weighted scatterplot smoothing.
Nevertheless, Scolnicov teaches
wherein the local smoothing algorithm is based on locally weighted scatterplot smoothing ([Para 0047] Data preparation engine 304 organizes and formats received data to be further processed … methods commonly known in the art may be applied to "smooth" the data collected from the network. Some of these methods are Locally Weighted Scatterplot Smoothing (LOWESS)).
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Persson to smooth the data based on locally weighted scatterplot smoothing, such as that of Scolnicov. One of ordinary skill would have been motivated to modify Persson because smoothing the data with LOWESS is a robust way of filling in gaps in the data.

Claims (4), (13) are rejected under 35 U.S.C. 103 as being unpatentable over Persson (US Pat No 7,085,615 B2), Herzog (US Pub No 2017/0249559 A1), Danichev (US Pub No 2018/0357261 A1), and Camara (US Pat No 11,080,613 B1), as applied to claims (1), (10) above, and further in view of Rojas (Raul Rojas, The Kalman Filter, Semantic Scholar, Published 2002).

Regarding claim 4, Persson does not teach
wherein the Kalman filtering is of a single dimension.
However, Rojas teaches
wherein the Kalman filtering is of a single dimension ([Page 1] First, we consider the Kalman filter for a one-dimensional system. The main idea is that the Kalman filter is simply a linear weighted average of two sensor values… An example of a filter is the following: Assume that we have a system whose one-dimensional state we can measure at successive steps. [Page 3] 2. The one-dimensional Kalman Filter).
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Persson to use Kalman filtering of a single dimension, such as that of Rojas. One of ordinary skill would have been motivated to modify Persson because, when a Kalman filter is used, treating the system as having a one-dimensional state is a computationally efficient approach.
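The one-dimensional Kalman filter described by Rojas, in which each new estimate is a weighted average of the prediction and the incoming measurement, can be sketched as follows. This is an illustrative sketch assuming a constant-state (random-walk) process model; the function name and the variance parameters are hypothetical and are not taken from Rojas, Persson, or the application.

```python
def kalman_1d(measurements, process_var, meas_var, x0, p0):
    """Scalar Kalman filter for a state assumed constant apart from process noise.

    Returns the filtered estimate after each measurement.
    """
    x, p = x0, p0  # state estimate and its variance
    estimates = []
    for z in measurements:
        # Predict: the state is assumed unchanged; its uncertainty grows by process_var.
        p = p + process_var
        # Update: the gain k weights the measurement against the prediction by their
        # variances, so the new x equals (1 - k) * x + k * z, a weighted average.
        k = p / (p + meas_var)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates


if __name__ == "__main__":
    noisy = [5.3, 4.8, 5.1, 4.9, 5.2, 5.0, 4.7, 5.1]  # hypothetical readings
    print(kalman_1d(noisy, process_var=1e-4, meas_var=0.04, x0=noisy[0], p0=1.0))
```

With equal prior and measurement variances the gain is 0.5, so the first estimate lies exactly halfway between the prior and the first reading, which matches the "weighted average of two sensor values" description quoted from Rojas.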

Regarding claim 13, Persson does not teach
wherein the Kalman filtering is of a single dimension.
However, Rojas teaches
wherein the Kalman filtering is of a single dimension ([Page 1] First, we consider the Kalman filter for a one-dimensional system. The main idea is that the Kalman filter is simply a linear weighted average of two sensor values… An example of a filter is the following: Assume that we have a system whose one-dimensional state we can measure at successive steps. [Page 3] 2. The one-dimensional Kalman Filter).
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Persson and use Kalman filtering of a single dimension, such as that of Rojas. One of ordinary skill would have been motivated to modify Persson, because, when using a Kalman filter, treating the system as having a one-dimensional state is an efficient way of performing the computation.
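To illustrate the point attributed to Rojas, that a one-dimensional Kalman filter is a linear weighted average of a prediction and a sensor value, a minimal sketch follows. It assumes a scalar random-walk state model (x_k = x_{k-1} plus process noise of variance q; z_k = x_k plus measurement noise of variance r); the function name and parameters are hypothetical and are not taken from Rojas or the application. Because the state is scalar, the gain is a single division rather than a matrix inversion.

```python
def kalman_1d(measurements, x0, p0, q, r):
    """Scalar Kalman filter for a random-walk state; returns (estimates, final variance)."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the state is unchanged; its variance grows by the process noise q.
        p = p + q
        # Update: the gain weights the measurement against the prediction by their variances.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return estimates, p
```

With equal prior and measurement variances (p0 = r) and q = 0, the first update is simply the average of the prior estimate and the reading.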

Claims (5-8), (14-17) are rejected under 35 U.S.C. 103 as being unpatentable over Persson (US Pat No 7,085,615 B2), Herzog (US Pub No 2017/0249559 A1), and Danichev (US Pub No 2018/0357261 A1), as applied to claims (1), (10) above, and further in view of Kollia (US Pub No 2017/0372222 A1).

Regarding claim 5, Persson does not teach
wherein the at least one dimensionality reduction algorithm comprises a random forest algorithm (RFA).
	On the other hand, Kollia teaches
wherein the at least one dimensionality reduction algorithm comprises a random forest algorithm (RFA) ([Para 0013] The compute device 100 iteratively repeats this process, until the minority class is isolated in a node by itself. The compute device 100 may separately train a primary classification algorithm (such as a random forest) to classify all of the data samples of the training data. [Para 0022] The primary classification module 206 is configured to perform a primary classification on input data samples. In the illustrative embodiment, the algorithm used for the primary classification may be embodied as any classification algorithm other than one that is particularly directed to identifying members of a minority class, such as a random forest algorithm, support vector machines, etc. [Para 0027] The classification algorithm selected may be embodied as any classification algorithm, such as a support vector machine, an artificial neural network, a random forest, etc; [Para 0033] the compute device 100 may balance the classes in the child nodes based on the number of samples in each child node as well as anticipated or past classification error of such a distribution. For example, if in the second (or later) iteration of distributing the classes from the current node to child nodes, there are training data samples present in the current node that do not belong to any of the classes of the current node, those data samples may be grouped into an error class and placed in their own child node; Examiner views that Kollia teaches identifying child nodes in a minority class using RFA, thus exposing the correlation and dependency between sampled data. Examiner further views that Kollia assigns certain child nodes to an error class to reduce classification error, which is analogous to a dimensionality reduction process).
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Persson and use the random forest algorithm (RFA), such as that of Kollia. One of ordinary skill would have been motivated to modify Persson, because an RFA aids the classification of data by building nodes and producing more accurate and stable predictions.

Regarding claim 6, Persson does not teach
wherein the at least one dimensionality reduction algorithm further comprises a principle component analysis (PCA).
On the other hand, Kollia teaches
wherein the at least one dimensionality reduction algorithm further comprises a principle component analysis (PCA) ([Para 0021] the feature extraction module 204 may transform the data samples into a reduced set of features. The feature extraction module 204 may use any algorithm for extracting features... Dimensionality reduction algorithms, such as principal component analysis, can also be incorporated in the feature extraction module 204. The feature extraction module 204 may be trained on training data. [Para 0031] In block 304, the compute device 100 extracts features from the training data and the test data. The compute device 100 may employ any feature extraction algorithm and/or dimensionality reduction algorithms, such as principal component analysis; Examiner views that Kollia uses PCA to extract correlations and features between sensor parameters).
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Persson and use principal component analysis (PCA), such as that of Kollia. One of ordinary skill would have been motivated to modify Persson, because PCA helps reduce the dimensionality of data, increasing interpretability while minimizing information loss.

Regarding claim 7, Persson does not teach
wherein the at least one dimensionality reduction algorithm comprises a PCA.
On the other hand, Kollia teaches
wherein the at least one dimensionality reduction algorithm comprises a PCA ([Para 0021] the feature extraction module 204 may transform the data samples into a reduced set of features. The feature extraction module 204 may use any algorithm for extracting features... Dimensionality reduction algorithms, such as principal component analysis, can also be incorporated in the feature extraction module 204. The feature extraction module 204 may be trained on training data. [Para 0031] In block 304, the compute device 100 extracts features from the training data and the test data. The compute device 100 may employ any feature extraction algorithm and/or dimensionality reduction algorithms, such as principal component analysis; Examiner views Kollia uses PCA to extract correlations and features between sensor parameters).
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Persson and use principal component analysis (PCA), such as that of Kollia. One of ordinary skill would have been motivated to modify Persson, because PCA helps reduce the dimensionality of data, increasing interpretability while minimizing information loss.
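As a hedged illustration of the PCA relied on from Kollia [Para 0021, 0031], the sketch below computes the first principal component of a table of sensor readings (one row per time sample, one column per parameter) by power iteration on the sample covariance matrix, and reports the share of total variance that component explains. Power iteration is a stand-in chosen to keep the example dependency-free; it assumes the leading eigenvalue is strictly largest and that the uniform starting vector is not orthogonal to its eigenvector. The function name is hypothetical.

```python
import math


def first_principal_component(rows, iters=200):
    """Return (unit direction, explained-variance ratio) of the first principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix (d x d).
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)] for i in range(d)]
    # Power iteration converges to the eigenvector of the largest eigenvalue.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # zero covariance: no dominant direction
            break
        v = [x / norm for x in w]
    eigval = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d)) for i in range(d))
    total_var = sum(cov[i][i] for i in range(d))
    return v, (eigval / total_var if total_var else 0.0)
```

When two sensor parameters move in lockstep (for example, one is always twice the other), a single component captures essentially all of the variance, which is the kind of redundancy a dimensionality reduction step removes.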

Regarding claim 8, Persson does not teach
wherein the RFA and the PCA are performed separately. 
On the other hand, Kollia teaches
wherein the RFA and the PCA are performed separately ([Para 0013] The compute device 100 iteratively repeats this process, until the minority class is isolated in a node by itself. The compute device 100 may separately train a primary classification algorithm (such as a random forest) to classify all of the data samples of the training data; [Para 0027] The minority classification training module 212 also includes a classification algorithm selection module 216, which is configured to select a classification algorithm to be used to classify the training data samples from one node into the corresponding child nodes.  It should be appreciated that a different classification algorithm may be chosen for each node. The classification algorithm selected may be embodied as any classification algorithm, such as a support vector machine, an artificial neural network, a random forest, etc.; Examiner views that Kollia teaches the option of selecting an algorithm, such as RFA or PCA, to classify data for each particular node) and
the subset of parameters are selected based on both the RFA and the PCA (See Para 0013, 0021, 0022, 0027, 0031 from above; Examiner views that Kollia teaches the ability to employ PCA to extract features from training data and test data, and then select a classification algorithm such as RFA to identify data nodes). 
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Persson and perform the RFA and the PCA separately while parameters are selected based on both, such as that of Kollia. One of ordinary skill would have been motivated to modify Persson, because using both algorithms separately would provide independent insights into what is missing in the original data set. Both tools have their own limitations, and one may work slightly better than the other depending on the use case, as the two algorithms use different mathematical techniques to classify data and reduce the number of variables.
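The limitation that the subset of parameters is selected based on both an RFA and a PCA performed separately can be pictured with the minimal sketch below. It assumes the two analyses have already been run independently and have produced per-parameter scores (random-forest feature importances and PCA loadings); those inputs, the parameter names, and the union/intersection rule are hypothetical and are not drawn from Kollia or the application.

```python
def select_parameters(rfa_importance, pca_loading, k, mode="union"):
    """Combine two independently computed parameter rankings into one subset.

    rfa_importance: {parameter: random-forest importance score} (hypothetical input)
    pca_loading:    {parameter: loading on the leading principal component} (hypothetical input)
    """
    top_rfa = set(sorted(rfa_importance, key=rfa_importance.get, reverse=True)[:k])
    # Loadings can be negative; rank by magnitude.
    top_pca = set(sorted(pca_loading, key=lambda p: abs(pca_loading[p]), reverse=True)[:k])
    if mode == "union":
        chosen = top_rfa | top_pca
    elif mode == "intersection":
        chosen = top_rfa & top_pca
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return sorted(chosen)
```

For example, if the RFA ranks parameters T and P highest and the PCA ranks P and F highest, then with k = 2 the union rule returns ['F', 'P', 'T'] and the intersection rule returns ['P'].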

Regarding claim 14, Persson does not teach
wherein the at least one dimensionality reduction algorithm comprises a random forest algorithm (RFA).
	On the other hand, Kollia teaches
wherein the at least one dimensionality reduction algorithm comprises a random forest algorithm (RFA) ([Para 0013] The compute device 100 iteratively repeats this process, until the minority class is isolated in a node by itself. The compute device 100 may separately train a primary classification algorithm (such as a random forest) to classify all of the data samples of the training data. [Para 0022] The primary classification module 206 is configured to perform a primary classification on input data samples. In the illustrative embodiment, the algorithm used for the primary classification may be embodied as any classification algorithm other than one that is particularly directed to identifying members of a minority class, such as a random forest algorithm, support vector machines, etc. [Para 0027] The classification algorithm selected may be embodied as any classification algorithm, such as a support vector machine, an artificial neural network, a random forest, etc; [Para 0033] the compute device 100 may balance the classes in the child nodes based on the number of samples in each child node as well as anticipated or past classification error of such a distribution. For example, if in the second (or later) iteration of distributing the classes from the current node to child nodes, there are training data samples present in the current node that do not belong to any of the classes of the current node, those data samples may be grouped into an error class and placed in their own child node; Examiner views that Kollia teaches identifying child nodes in a minority class using RFA, thus exposing the correlation and dependency between sampled data. Examiner further views that Kollia assigns certain child nodes to an error class to reduce classification error, which is analogous to a dimensionality reduction process).
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Persson and use the random forest algorithm (RFA), such as that of Kollia. One of ordinary skill would have been motivated to modify Persson, because an RFA aids the classification of data by building nodes and producing more accurate and stable predictions.

Regarding claim 15, Persson does not teach
wherein the at least one dimensionality reduction algorithm further comprises a principle component analysis (PCA).
On the other hand, Kollia teaches
wherein the at least one dimensionality reduction algorithm further comprises a principle component analysis (PCA) ([Para 0021] the feature extraction module 204 may transform the data samples into a reduced set of features. The feature extraction module 204 may use any algorithm for extracting features... Dimensionality reduction algorithms, such as principal component analysis, can also be incorporated in the feature extraction module 204. The feature extraction module 204 may be trained on training data. [Para 0031] In block 304, the compute device 100 extracts features from the training data and the test data. The compute device 100 may employ any feature extraction algorithm and/or dimensionality reduction algorithms, such as principal component analysis; Examiner views that Kollia uses PCA to extract correlations and features between sensor parameters).
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Persson and use principal component analysis (PCA), such as that of Kollia. One of ordinary skill would have been motivated to modify Persson, because PCA helps reduce the dimensionality of data, increasing interpretability while minimizing information loss.

Regarding claim 16, Persson does not teach
wherein the at least one dimensionality reduction algorithm comprises a PCA.
On the other hand, Kollia teaches
wherein the at least one dimensionality reduction algorithm comprises a PCA ([Para 0021] the feature extraction module 204 may transform the data samples into a reduced set of features. The feature extraction module 204 may use any algorithm for extracting features... Dimensionality reduction algorithms, such as principal component analysis, can also be incorporated in the feature extraction module 204. The feature extraction module 204 may be trained on training data. [Para 0031] In block 304, the compute device 100 extracts features from the training data and the test data. The compute device 100 may employ any feature extraction algorithm and/or dimensionality reduction algorithms, such as principal component analysis; Examiner views that Kollia uses PCA to extract correlations and features between sensor parameters).
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Persson and use principal component analysis (PCA), such as that of Kollia. One of ordinary skill would have been motivated to modify Persson, because PCA helps reduce the dimensionality of data, increasing interpretability while minimizing information loss.

Regarding claim 17, Persson does not teach
wherein the RFA and the PCA are performed separately. 
On the other hand, Kollia teaches
wherein the RFA and the PCA are performed separately ([Para 0013] The compute device 100 iteratively repeats this process, until the minority class is isolated in a node by itself. The compute device 100 may separately train a primary classification algorithm (such as a random forest) to classify all of the data samples of the training data; [Para 0027] The minority classification training module 212 also includes a classification algorithm selection module 216, which is configured to select a classification algorithm to be used to classify the training data samples from one node into the corresponding child nodes.  It should be appreciated that a different classification algorithm may be chosen for each node. The classification algorithm selected may be embodied as any classification algorithm, such as a support vector machine, an artificial neural network, a random forest, etc.; Examiner views that Kollia teaches the option of selecting an algorithm, such as RFA or PCA, to classify data for each particular node) and
the subset of parameters are selected based on both the RFA and the PCA (See Para 0013, 0021, 0022, 0027, 0031 from above; Examiner views that Kollia teaches the ability to employ PCA to extract features from training data and test data, and then select a classification algorithm such as RFA to identify data nodes). 
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Persson and perform the RFA and the PCA separately while parameters are selected based on both, such as that of Kollia. One of ordinary skill would have been motivated to modify Persson, because using both algorithms separately would provide independent insights into what is missing in the original data set. Both tools have their own limitations, and one may work slightly better than the other depending on the use case, as the two algorithms use different mathematical techniques to classify data and reduce the number of variables.

Claims (9), (18) are rejected under 35 U.S.C. 103 as being unpatentable over Persson (US Pat No 7,085,615 B2), Herzog (US Pub No 2017/0249559 A1), Danichev (US Pub No 2018/0357261 A1) and Camara (US Pat No 11,080,613 B1), as applied to claims (1), (10) above, and further in view of Young (US Pub No 2006/0218107 A1).

Regarding claim 9, Persson does not teach
wherein developing the predictive model of production of the chemical product is based on generalized linear regression.
On the other hand, Young teaches 
wherein developing the predictive model of production of the chemical product is based on generalized linear regression ([Para 0074] The objective of the genetic algorithm was … in linear regression analysis… was used as a fitness indicator statistic.)
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Persson and base the predictive model of production of the chemical product on generalized linear regression, such as that of Young. One of ordinary skill would have been motivated to modify Persson, because generalized linear regression accounts for the residual noise variance and has the advantage of being less sensitive to noise and abnormalities in the input data. 

Regarding claim 18, Persson does not teach
wherein developing the predictive model of production of the chemical product is based on generalized linear regression.
On the other hand, Young teaches 
wherein developing the predictive model of production of the chemical product is based on generalized linear regression ([Para 0074] The objective of the genetic algorithm was … in linear regression analysis… was used as a fitness indicator statistic.)
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Persson and base the predictive model of production of the chemical product on generalized linear regression, such as that of Young. One of ordinary skill would have been motivated to modify Persson, because generalized linear regression accounts for the residual noise variance and has the advantage of being less sensitive to noise and abnormalities in the input data. 

Claims 19, 20 are rejected under 35 U.S.C. 103 as being unpatentable over Persson (US Pat No 7,085,615 B2), Levanoni (CA 2035672 A1), Bauer (US Pub No 2016/0328654 A1), Gopalakrishnan (US Pub No 2016/0165472 A1), and Wang (US Pub No 2013/0066471 A1).

Regarding claim 19, Persson teaches
A method for controlling production of a chemical product in a plant by controlling a subset of controllable parameters among a set of parameters each monitored by one of a corresponding set of sensors, the method comprising:

each series corresponding to one of the set of parameters taken by a corresponding sensor of the set of sensors ([Col 5 Line 64-66] A number of different sensors 51, 52, 53 are provided to measure important process output variables);
obtaining a time series of historical production data for the chemical product corresponding to the multiple time series of historical sensor data for the set of parameters ([Col 15 Line 3-7] A diagram illustrates a series of measurements 130 of a process output variable at different sampling times. A frame 131 defines a certain number of previous measurements; Examiner views that Figs. 2, 3, and 4 show the x-axis as time, thus indicating that Persson teaches the use of multiple sensors to sample various sets of data points over time);
However, Persson does not teach
determining at least two parameters among the set of parameters as clustering parameters;
clustering hierarchically the multiple series of historical sensor data and the corresponding production data according to the at least two clustering parameters to obtain a set of data clusters;
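each data cluster corresponding to a range of values for the clustering parameters and comprising multiple sub-time series of historical sensor data for the set of parameters and corresponding sub time-series of historical production data for the chemical product;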
For each data cluster of the set of data clusters:
extracting from the multiple sub-time series of historical sensor data for the set of parameters in the data cluster a redacted set of multiple sub-time series of historical sensor data for the subset of controllable parameters; and
determining, for the data cluster, global optimal values
for each of the subset of controllable parameters for optimizing production of the chemical product
by performing a simulated annealing algorithm having a input comprising the redacted set of multiple sub-time series of historical sensor data for the subset of controllable parameters and the sub-time series of historical production data for the chemical product;
monitoring real-time values of the clustering parameters
determining a real-time operating condition for the plant corresponding a cluster determined by the real-time values of the clustering parameters; and
controlling a set of adjustable control devices to adjust the subset of controllable parameters according to the global optimal values of the subset of controllable parameters for the real-time operating condition.
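For orientation only, the last three limitations listed above (matching real-time values of the clustering parameters to a cluster, treating that cluster as the real-time operating condition, and applying that cluster's stored optimal values to the controllable parameters) can be sketched as a range lookup. The Cluster record, the half-open ranges, and the function names are hypothetical assumptions made for this illustration; they are not taken from the application or from any cited reference.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical cluster record built offline from historical data."""
    # {clustering parameter: (low, high)}, interpreted as the half-open range [low, high)
    ranges: dict = field(default_factory=dict)
    # {controllable parameter: global optimal value determined for this cluster}
    setpoints: dict = field(default_factory=dict)


def match_cluster(clusters, realtime):
    """Return the first cluster whose ranges contain every real-time clustering value, else None."""
    for cluster in clusters:
        if all(lo <= realtime[name] < hi for name, (lo, hi) in cluster.ranges.items()):
            return cluster
    return None


def setpoints_for(clusters, realtime):
    """Setpoints for the adjustable control devices, or None if no cluster matches."""
    cluster = match_cluster(clusters, realtime)
    return None if cluster is None else dict(cluster.setpoints)
```

Returning None when no cluster matches leaves the fallback behavior (for example, holding the current setpoints) to the caller.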

On the other hand, Levanoni teaches
determining … two parameters among the set of parameters as clustering parameters ([Page 11] Correlation analysis is performed to detect missing primary variables and insignificant model parameters. Process models are developed or updated, two-parameter process … is shown in FIG. 4. [FIG. 7] Classify Measured Variables. Clustering in Time, in Parameter Space; Examiner views Levanoni’s two parameters as necessary to create a clustering model space, which implicitly suggests having at least two parameters);
clustering hierarchically the multiple series of historical sensor data and the corresponding production data according to the at least two clustering parameters ([Page 10 - 11] Cluster analysis is used to identify the set of observations that reflect the current process behavior, i.e., are homogeneous in process performance. Process trends, cycles and forecast are computed using time-series analysis… [FIG. 7] CLASSIFY MEASURED VARIABLES. Clustering in Time, in Parameter Space), global optimal values ([FIG. 8] 2. ESTIMATE OPTIMUM PROCESS CONDITIONS. Unconstrained Optimum - Global Optimization Quadratic Prog. - Constraint Optimization… 4. ARE OPTIMUM CONDITIONS SIGNIFICANT ? No - Go To Data Acquisition);
determining, for the data cluster, global optimal values ([Claim 5] optimization of the manufacturing process is on a local and a global basis) for each of the subset of controllable parameters for optimizing production of the chemical product ([Page 7] Process variables are classified into four basic types: Measurable, Controllable, Ideal and Fundamental. The Measurable variables are parameters obtained from direct measurements. Controllable variables are parameters which directly control the process…);
monitoring real-time values of the clustering parameters ([Page 6]  Process models are treated stochastically, with the values of model parameters being the updated items. [Page 9] The system monitors itself continuously. [Page 17]  The final on-line and real-time for monitoring, control and optimization).
However, Levanoni does not teach 
at least two parameters
to obtain a set of data clusters
each data cluster corresponding to a range of values for the clustering parameters and comprising multiple sub-time series of historical sensor data for the set of parameters and corresponding sub time-series of historical production data for the chemical product
For each data cluster of the set of data clusters
determining a real-time operating condition for the plant corresponding a cluster determined by the real-time values of the clustering parameters
extracting from the multiple sub-time series of historical sensor data for the set of parameters in the data cluster a redacted set of multiple sub-time series of historical sensor data for the subset of controllable parameters; and
by performing a simulated annealing algorithm having a input comprising the redacted set of multiple sub-time series of historical sensor data for the subset of controllable parameters and the sub-time series of historical production data for the chemical product;
controlling a set of adjustable control devices to adjust the subset of controllable parameters according to the global optimal values of the subset of controllable parameters for the real-time operating condition.

Nevertheless, Bauer teaches

each data cluster corresponding to a range of values for the clustering parameters and comprising multiple sub-time series of historical sensor data for the set of parameters and corresponding sub time-series of historical production data for the chemical product ([Para 0061] the data is continuous measurement-data collected from at least one sensor, and wherein the plurality of data-segments are feature-vectors extracted from plurality of sections of the data, and the computer readable medium (CRM) further configured for extracting the plurality of the feature-vectors from the plurality of sections; Examiner interpreted measurement data as time series of historical data. [Para 0101] The common way to deal with models for context aware data is to carefully design context partitioning.  To do that, knowledge about the observed system needs to be gained through domain expertise or by investigation of a significant volume of annotated measurement data, in order to identify which context parameters need to be considered and at what granularity; [Para 0106] A context partitioning module (200) divides the space of context parameters into several discrete subspaces and streams the data corresponding to each context partition into its own normality model instance (210-230); Examiner views context parameters of a partition/feature-cluster analogous to clustering parameters of a data cluster);
For each data cluster of the set of data clusters ([Para 0062] the CRM further configured to execute step of defining at least one additional feature-cluster associated to the data-segments of at least one of the initial subspaces, responsive to a failure of the 
extracting from the multiple sub-time series of historical sensor data for the set of parameters in the data cluster a redacted set of multiple sub-time series of historical sensor data for the subset of controllable parameters ([Para 0003] accordingly an extraction step is often used to remove noise and extract relevant features; [Para 0018] disclose the method as defined above, wherein the data is continuous measurement-data collected from at least one sensor; and wherein the plurality of data-segments are feature-vectors extracted from plurality of sections of the data. [Para 0020] the extracting is performed by a method selected from the group consisting of: principal component analysis (PCA); [Para 0102] The training data is then extracted from the database, at regular intervals; [TABLE 1] Context Extraction from timestamp; Examiner views the process of extraction as removing noise and identifying features. Examiner further interpreted “redacted” data as data being set aside); and 
determining a real-time operating condition for the plant corresponding a cluster determined by the real-time values of the clustering parameters ([FIG. 1] Real-time (based on Sensor data interface) vs. Offline (using Historical Measurement)); and

However, Bauer does not teach
at least two parameters;
by performing a simulated annealing algorithm having a input comprising the redacted set of multiple sub-time series of historical sensor data for the subset of controllable parameters and the sub-time series of historical production data for the chemical product;

Nevertheless, Gopalakrishnan teaches
by performing a simulated annealing algorithm having a input comprising the redacted set of multiple sub-time series of historical sensor data for the subset of controllable parameters and the sub-time series of historical production data for the chemical product ([Para 0074] FIGS. 3A-3E show example graphs of global historical data. [Para 0131] Control parameters for the cluster of cells may be adapted using an embodiment autonomous adaptive simulated annealing algorithm. [Para 0128] Embodiments of this disclosure may divide large networks into subgroups of smaller networks, and then optimize control decisions for the subgroups using a simulated annealing technique. Simulated annealing (SA) is a generic probabilistic meta-heuristic approach for solving global optimization problems that locate a good approximation to the global optimum of a given function in a large search space; Examiner views Gopalakrishnan teaches the application of SA in finding global optimal values for a given function or a set of data. [Para 0078] An example of the simulated annealing process that can be performed in the biased adjustment phase 209 is represented by the graph 500 in FIG. 5. The simulated annealing process may identify a local maximum 502 but may perform a chaotic jump (from Jump 1 to Jump 2) in order to locate a global maximum 504. Here, the maximums 502, 504 are determined maximums of the objective function described above. Examiner views Gopalakrishnan teaches using SA to locate a global maximum for a subset of data while watching out for local peaks. See FIG. 5). 
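For illustration of the simulated annealing behavior attributed to Gopalakrishnan [Para 0078] and FIG. 5 (escaping a local maximum to locate a global maximum), a minimal sketch follows. The one-dimensional objective, with a lower peak near x = -2 and a higher peak near x = 3, the geometric cooling schedule, and all parameter values are hypothetical choices for this example; they are not Gopalakrishnan's algorithm or the applicant's.

```python
import math
import random


def two_peaks(x):
    """Hypothetical objective: local maximum near x = -2 (height 0.6), global maximum near x = 3 (height 1)."""
    return 0.6 * math.exp(-(x + 2.0) ** 2) + math.exp(-(x - 3.0) ** 2)


def simulated_annealing(f, x0, lo, hi, t0=1.0, cooling=0.995, steps=5000, step_sd=1.0, seed=0):
    """Maximize f on [lo, hi] by simulated annealing; return the best point seen and its value."""
    rng = random.Random(seed)
    x, fx = x0, f(x0)
    best_x, best_f = x, fx
    t = t0
    for _ in range(steps):
        # Propose a Gaussian step, clipped to the search interval.
        cand = min(hi, max(lo, x + rng.gauss(0.0, step_sd)))
        fc = f(cand)
        # Metropolis rule: always accept an improvement; accept a worse point
        # with probability exp((fc - fx) / t), which shrinks as t cools.
        if fc >= fx or rng.random() < math.exp((fc - fx) / t):
            x, fx = cand, fc
            if fx > best_f:
                best_x, best_f = x, fx
        t *= cooling  # geometric cooling schedule
    return best_x, best_f
```

Starting at the lower peak (x0 = -2.0) on the interval [-5, 5], the early high-temperature phase accepts downhill moves often enough to cross the valley, so the best point recorded is expected to lie near x = 3 rather than at the local peak.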

However, Gopalakrishnan does not teach 
at least two parameters;
controlling a set of adjustable control devices to adjust the subset of controllable parameters according to the global optimal values of the subset of controllable parameters for the real-time operating condition.

On the other hand, Wang teaches 
at least two parameters ([Para 0012] receiving data regarding at least two drilling operational parameters related to wellbore drilling operations; running a global search engine to optimize at least two controllable drilling parameter values and separately running a local search engine to optimize the at least two controllable drilling parameter values, each optimization based on at least one objective function. [Para 0062] FIGS. 4 and 5 illustrate an example of searching the optimal point with a local search engine… If the driller follows the recommendation, then the operating point, which is the cluster shown on the figures, moves towards the peak point; Examiner views Wang’s cluster method as requiring at least two parameters).
controlling a set of adjustable control devices ([Para 0043] As one more specific example, data may be received regarding the drill bit rotation rate, an exemplary drilling parameter, either from the surface equipment or from downhole equipment, or from both surface and downhole equipment. The surface equipment may either provide a controlled rotation rate (setpoint, gain, etc.) as an input to the drilling equipment or a measured torque and RPM data, from which downhole bit rotary speed may be 
to adjust the subset of controllable parameters according to the global optimal values of the subset of controllable parameters for the real-time operating condition ([Para 0044] identify at least one controllable drilling parameter having significant correlation to an objective function, or one or more objective functions, incorporating two or more drilling performance measurements, such as ROP, MSE, vibration measurements… statistical model may be utilized in substantially real-time utilizing the received data. Exemplary local search engines may include gradient ascent search, PCA (principal component analysis)… The methods also include, at 204, a global search engine to construct the response surface of the selected objective function with respect to controllable drilling parameters in a 3-D surface or a hyperplane in N-dimensional space, by any regression or interpolation methods, and to find an optimal point from the response surface; Examiner views Wang teaches a method of collecting real-time data in finding optimal values of controllable parameters).

It would have been obvious to one of ordinary skill in the art at the time of filing to modify Persson and determine clustering parameters, cluster sensor data hierarchically to determine data clusters with global optimal values, and monitor real-time values of clustering parameters, such as that of Levanoni. One of ordinary skill would have been motivated to modify Persson, because Levanoni’s modeling of global values, such as its graphical representation of a response surface map, supports the improvement and optimization of manufacturing processes.  
, such as that of Bauer. One of ordinary skill would have been motivated to modify Persson, because Bauer’s feature-clusters, feature-vectors, and feature subspaces allow better context partitioning of data into a useful model while pinpointing data-segment anomalies.
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Persson and control a set of adjustable control devices to adjust the subset of controllable parameters according to the global optimal values of the subset of controllable parameters for the real-time operating condition, such as that of Wang. One of ordinary skill would have been motivated to modify Persson, because adjusting controllable parameters would give a new degree of freedom in finding global optimal values.

Regarding claim 20, Persson does not teach 
wherein each data cluster of the set of data clusters comprises at least a predetermined number of time entries.
On the other hand, Bauer teaches
wherein each data cluster of the set of data clusters comprises at least a predetermined number of time entries ([Fig. 3] Item 320. Selecting at least two Initial-Subspaces, responsive to a predetermined similarity in the context-labels of the data-segments. [Para 0120] 1. Creating initial partitions by concatenation of predetermined similar context variables (e.g. day and minute: Monday_3_27) of the data-segments. [Para 0128] 3. Gathering the extracted feature-vectors into initial-subspaces, responsive context-labels (e.g. day and minute: Monday 3 _27); Examiner interprets time context labels or context variables referencing time as predetermined time grid).
It would have been obvious to one of ordinary skill in the art at the time of filing to modify Persson and set at least a predetermined number of time entries for each data cluster, such as that of Bauer. One of ordinary skill would have been motivated to modify Persson, because the time context is crucial in determining the fit criterion of data for context-aware anomaly detection. 

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MAN O KUNG whose telephone number is 303-297-4338.  The examiner can normally be reached on Mon-Fri 9am-7pm (Pacific Time).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Alessandro Amari can be reached on 571-272-2306.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  
	
/MAN O KUNG/
Examiner, Art Unit 2863
/NATALIE HULS/Primary Examiner, Art Unit 2863