DETAILED ACTION
This action is in response to the arguments and amendments filed 3 February 2022 for application 16/548926, filed on 23 August 2019. Currently, claims 1-24 are pending.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed 3 February 2022 have been fully considered but they are not persuasive.


Specifically, Applicant argues:
Without conceding to the position of the Office, the independent claims 1, 8, and 15 have each been amended …Applicant respectfully submits that the cited portion of Li does not disclose or suggest the features of "provid[ing] a plurality of subsequence segments," "each subsequence segment corresponding to a phase of the target process, wherein the target process comprises a plurality of phases, and wherein each phase of the target process corresponds to a different time-series model," and "applying each feature data set of each subsequence segment to provide a plurality of time-series models," "each time-series model [] corresponding to a phase of the target process," as recited in the amended claims 1, 9, and 17.  Applicant further submits that the other cited references have not been shown to remedy the identified deficiencies of Li. 


Examiner’s Response:
The Examiner respectfully disagrees in part and notes that, during examination, a claim must be given its broadest reasonable interpretation consistent with the specification. M.P.E.P. 2173.01(I), M.P.E.P. 2111.01(II). As set forth in this action, Li partially teaches the amended limitations, with Lee teaching those elements of the amended limitations not taught by Li. Specifically, Li teaches “executing, … each subsequence segment corresponding to a phase of the target process, wherein the target process comprises a plurality of phases; … applying, by the one or more processors, each feature data set of each subsequence segment to provide a plurality of time-series models for anomaly detection and forecasting, respectively, each time-series model (i) corresponding to a phase …, and ii) being provided as one of a recurrent neural network (RNN), a convolution neural network (CNN), and a generative adversarial network (GAN)” because he teaches an anomaly detection framework in which distinct RNN or GAN models are formed across distinct subsequences (windows) of the multivariate time series data. Each sliding window and associated step size corresponds to a phase (a particular span of time/temporal extent associated with the industrial/target process) of the multivariate time series, which is also a temporal extent/phase associated with a monitored process, and it is noted that the monitored system/target process under evaluation is itself characterized by a sequence of processes (e.g., P1 through P6) (-viz., [pp. 4-5, section 3.2, p. 7, Section 4.1.1, Figure 1, Algorithm 1] To effectively learn from X, we apply a sliding window with window size s_w and step size s_s to divide the multivariate time series into a set of multivariate sub-sequences X = {x_i, i = 1, 2, ..., m} ⊆ R^{s_w×T}, where m = (M − s_w)/s_s is the number of sub-sequences. Similarly, Z = {z_i, i = 1, 2, ..., m} is a set of multivariate sub-sequences taken from a random space…. For detection, the testing dataset X^test ⊆ R^{N×T} is similarly divided into multivariate sub-sequences X^tes = {x_j^tes, j = 1, 2, ..., n} with a sliding window, where n = (N − s_w)/s_s., Generally, the attacked points include sensors (e.g., water level sensors, flow-rate meter, etc.) and actuators (e.g., valve, pump, etc.). … The water purification process in SWaT is composed of six sub-processes referred to as P1 through P6 [26]. The first process is for raw water supply and storage, and P2 is for pre-treatment where the water quality is assessed. Undesired materials are them removed by ultra-filtration (UF) backwash in P3. The remaining chorine is destroyed in the Dechlorination process (P4). Subsequently, the water from P4 is pumped into the Reverse Osmosis (RO) system (P5) to reduce inorganic impurities. Finally, P6 stores the water ready for distribution.)
Although Li teaches the generation of a sequence of models for the computation of an anomaly score for each testing sub-sequence and teaches that the target process itself consists of a sequence of subprocesses, he does not clearly disclose that the windowing is related to a corresponding phase/sub-process of the target process (e.g., that each of sub-processes P1 … P6 has a distinct model), and does not clearly indicate whether the training of each model is based on the data associated with a single subprocess/phase of the target process. However, these missing elements are found in Lee.
Specifically, Lee teaches “wherein the target process comprises a plurality of phases, and wherein each phase of the target process corresponds to a different time series model; applying, by the one or more processors, each feature data set of each subsequence segment to provide a plurality of time-series models for anomaly detection and forecasting, respectively, each time-series model (i) corresponding to a phase of the target process, and ii) being provided as one of a recurrent neural network (RNN)” because he teaches that, in a ML-based anomaly detection framework, multivariate time series data for a target process is used to construct models not only according to subsequence intervals in that time series but also based upon the state/phase of the target process itself in which a distinct best model (including RNN models) is learned (and subsequently identified/selected according to the RL protocol) so that a specific example of a set of successive target process phases is the sequence of levels of degradation (quantified primarily by MQE) that the target process exhibits as it progresses from normal behavior to complete failure with different models (e.g., Figure 4, Table 2) invoked at different phases/levels of degradation  but with, more generally, the different phases corresponding to different hit-points in the U-matrix (Figure 6) (-viz., [0052, 0056, 0062, Figure 4, Figure 6] The adaptive modeling aims to tackle the problem of selecting appropriate prediction models under different degradation statuses. The objective of the adaptive model selection is to obtain a mapping from each state to the probability of all possible prediction models that are taken into consideration in the modeling framework. The mapping provides a look-up table for model selection under different states., The action is defined as the choice of different prediction models. The prediction models include various data driven prediction algorithms. 
As one example, two types of prediction models (ARMA and RNN) are used. For each type of the prediction models, the structures and parameters are different. ARMA models can have different orders, such as ARMA (2, 1), ARMA (4, 3) and ARMA (12, 11) and so on, with different amounts of historical data used for training. RNN models can have various structures which are different in the number of input neurons, the number of hidden neurons, and the number of training samples. Each type of the two prediction models with different structures and parameters are considered as the available actions in the reinforcement learning framework., Within the framework defined above, the iterative process of reinforcement learning can be run for a certain predefined number of steps. The results will be a “look-up” table (see FIG. 4) in which the rows are different states and the columns are different prediction models., As the MQE increased, the extent of the degradation became more severe. Data from the first 500 cycles of the normal operation condition were used to train the SOM. …In the first 1450 cycles, the bearing was in good condition, and the MQEs were near Zero. From cycle 1450 to cycle 1650, the initial defects appeared and the MQE started increasing. The MQE continued increasing until approximately cycle 1750, this was an indication that the defects had become more serious. Subsequently, until around cycle 2050, the MQE dropped, this was due to the propagation of the roller defect becoming counterbalanced by the vibration. Shortly thereafter the MQE increased sharply until the bearing failed.)
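For illustration only, the sliding-window division quoted from Li in this response can be sketched as follows. This is a minimal sketch under stated assumptions, not Li's implementation: the function name sliding_windows, the sample data, and the handling of the final window are hypothetical. The quoted passage gives the number of sub-sequences as m = (M − s_w)/s_s; the sketch keeps every full window, which yields floor((M − s_w)/s_s) + 1, so the boundary convention should be treated as an assumption.

```python
def sliding_windows(series, window_size, step_size):
    """Split a multivariate series (a list of rows, one row per time step)
    into sub-sequences of `window_size` rows, advancing `step_size` rows."""
    if window_size <= 0 or step_size <= 0:
        raise ValueError("window_size and step_size must be positive")
    windows = []
    start = 0
    # Assumption: keep only full windows; a trailing partial window is dropped.
    while start + window_size <= len(series):
        windows.append(series[start:start + window_size])
        start += step_size
    return windows


# Hypothetical data: M = 10 time steps, T = 2 streams.
series = [[t, 2 * t] for t in range(10)]
windows = sliding_windows(series, window_size=4, step_size=2)
print(len(windows))      # number of sub-sequences produced
print(windows[1][0])     # first row of the second sub-sequence
```

Note that the windows produced this way are defined purely by position in the series, which is the point of distinction drawn above: nothing in the windowing itself ties a sub-sequence to a sub-process such as P1 through P6.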

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 3-9, 11-17, and 19-24 are rejected under 35 U.S.C. 103 as being unpatentable over Li et al. (“MAD-GAN: Multivariate Anomaly Detection for Time Series Data with Generative Adversarial Networks”, https://arxiv.org/abs/1901.04997, arXiv:1901.04997v1 [cs.LG], 15 Jan 2019, pp. 1-17), hereinafter referred to as Li, in view of Lee et al. (US2010/0023307, 28 Jan. 2010), hereinafter referred to as Lee.

In regards to claim 1, Li teaches A computer-implemented method for automated machine learning in industrial process control, the method being executed by one or more processors and comprising: receiving, by the one or more processors, two or more time-series data sequences representative of a target process executed within a physical environment; ([p. 5, section 3.2, p. 7, Section 4.1.1, Figure 1, Algorithm 1] Let us now formulate the anomaly detection problem using GAN. Given a training dataset X ⊆ R^{M×T} with T streams and M measurements for each stream, and a testing dataset X^test ⊆ R^{N×T} with T streams and N measurements for each stream, the task is to assign binary (0 for normal and 1 for anomalous) labels to the measurements of testing dataset., The water purification process in SWaT is composed of six sub-processes referred to as P1 through P6 [26]. The first process is for raw water supply and storage, and P2 is for pre-treatment where the water quality is assessed., wherein a computer-based architecture (Figure 1) implements an automated anomaly detection machine learning algorithm (e.g., Algorithm 1) which processes a multivariate time series (T distinct time series streams) that includes parameters and variables corresponding to an industrial process (e.g., water filtration) subject to control (e.g., identification of anomalies in that process and their rectification).)
executing, by the one or more processors, automated time-series process segmentation to provide a plurality of subsequence segments for each of the two or more time-series data sequences, each subsequence segment corresponding to a phase of the target process, wherein the target process comprises a plurality of phases, …; ([pp. 4-5, section 3.2, p. 7, Section 4.1.1, Figure 1] To effectively learn from X, we apply a sliding window with window size s_w and step size s_s to divide the multivariate time series into a set of multivariate sub-sequences X = {x_i, i = 1, 2, ..., m} ⊆ R^{s_w×T}, where m = (M − s_w)/s_s is the number of sub-sequences. Similarly, Z = {z_i, i = 1, 2, ..., m} is a set of multivariate sub-sequences taken from a random space…. For detection, the testing dataset X^test ⊆ R^{N×T} is similarly divided into multivariate sub-sequences X^tes = {x_j^tes, j = 1, 2, ..., n} with a sliding window, where n = (N − s_w)/s_s., Generally, the attacked points include sensors (e.g., water level sensors, flow-rate meter, etc.) and actuators (e.g., valve, pump, etc.). … The water purification process in SWaT is composed of six sub-processes referred to as P1 through P6 [26]. The first process is for raw water supply and storage, and P2 is for pre-treatment where the water quality is assessed. Undesired materials are them removed by ultra-filtration (UF) backwash in P3. The remaining chorine is destroyed in the Dechlorination process (P4). Subsequently, the water from P4 is pumped into the Reverse Osmosis (RO) system (P5) to reduce inorganic impurities. Finally, P6 stores the water ready for distribution., wherein an automated time-series segmentation process in the anomaly detection framework forms subsequences using a window of a given size/resolution and a step size to characterize the coarseness, whereby the window slides over each time series, with this sub-sequence generation performed for each of the time series corresponding to a window and with each sliding window and associated step size corresponding to a phase (particular span of time/temporal extent associated with the industrial/target process) of the multivariate time series (i.e., the subsequence segment attributes, such as subsequence number in a sequence of subsequences and subsequence span and step, are being interpreted as corresponding to a phase of that process, with the optimal subsequence segment attributes for that process optimized by considering multiple different subsequence segment attributes in a parametric optimization process), and wherein the monitored system/target process under evaluation also is characterized by a sequence of processes (e.g., P1 through P6).)
processing, by the one or more processors, the plurality of subsequence segments using at least one time-series transformation to provide a feature data set for each subsequence segment; ([p. 10, Section 5.3, pp. 12-13, Section 5.3.3], As described earlier, the sub-sequences are fed into the MAD-GAN model. Note that to reduce the computational load, we reduce the original dimension by PCA, choosing the PC dimension in based on the PC variance rate., Dimension Reduction. As mentioned, to minimize the computation load of the LSTM-RNNs, we used PCA to project the raw data into a lower dimensional principal space instead of directly feeding the high dimensional data to the MAD-GAN model…. To show that the anomaly detection performance was not influenced by removing the less important variables, in Table 3, we list the anomaly detection evaluation metrics of MAD-GAN at different PC resolution (from PC=1 to PC=10) as well as all the original variable space for the SWaT dataset with sub-sequence length s_w = 30….As such, we projected the SWaT (WADI) data to the first 5 (8), and then applied the MAD-GAN to detect anomalies for the projected data., wherein, before being sent into the ML anomaly detection models (MAD-GAN – Figure 1), a PCA transform is applied to that data such that this is interpreted as occurring after the subsequence determination since it is the step performed (just) before entry of that transformed data into the GAN model, with the subsequence generation clearly taking place also before this entry as shown in Figure 1 and with the end result of the segmentation and PCA application being the formation of a set of subsequences having the reduced dimension.)
applying, by the one or more processors, each feature data set of each subsequence segment to provide a plurality of time-series models for anomaly detection and forecasting, respectively, each time-series model (i) corresponding to a phase …, and ii) being provided as one of a recurrent neural network (RNN), a convolution neural network (CNN), and a generative adversarial network (GAN); ([p. 3, Section 3.1, Figure 1, Algorithm 1], First, to handle the time-series data, we construct the GAN’s generator and discriminator as two Long-Short-Term Recurrent Neural Networks (LSTM-RNN), as shown in the left middle part of Fig. 1. Following a typical GAN framework, the generator (G) generates fake time series with sequences from a random latent space as its inputs, and passes the generated sequence samples to the discriminator (D), which will try to distinguish the generated (i.e. “fake”) data sequences from the actual (i.e. “real”) normal training data sequences., wherein the models used to perform the anomaly detection/prediction include a GAN which also encompasses an RNN (LSTM) as shown in Figure 1 (it is noted that the claim limitation only requires one of the neural network models in the list) in which the plurality of models are formed across different subsequences (windowed) of the multivariate time series (a separate anomaly detection score is determined for each phase/window of the multivariate time series).)
determining, by the one or more processors, anomaly scores based on the plurality of time-series models; … ([p. 6, Section 3.3, Figure 1, Algorithm 1], Based on the above descriptions, the GAN-trained discriminator and generator will output a set of anomaly detection losses L = {L_{j,s}, j = 1, 2, ..., n; s = 1, 2, ..., s_w} ⊆ R^{n×s_w} for each test data sub-sequence. We compute a combined discrimination-cum-reconstruction anomaly score called the DR-Score (DRS) by mapping the anomaly detection loss of sub-sequences back to the original time series:…, wherein the MAD-GAN architecture framework generates anomaly scores based on the GAN and LSTM-RNN models (both the generator and discriminator components) by processing the multivariate time series of features.)
However, Li does not explicitly teach … wherein each phase of the target process corresponds to a different time series model;… (i) corresponding to a phase of the target process,…and selectively providing, by the one or more processors, an alert to one or more users, each alert indicating at least one anomaly and a respective probability. Although Li teaches the generation of indications of anomalies, he does not indicate that those indications and an associated probability are sent to users. In addition, Li does not disclose that the windowing is related to a corresponding phase of the target process and does not clearly indicate whether the training of each model is based on the data associated with a single subprocess/phase of the target process, even though he teaches the computation of an anomaly score for each testing sub-sequence and that the target process itself consists of a sequence of subprocesses.
However, Lee, in the analogous environment of designing ML-based multi-variate time series anomaly detection frameworks, teaches wherein the target process comprises a plurality of phases, and wherein each phase of the target process corresponds to a different time series model; applying, by the one or more processors, each feature data set of each subsequence segment to provide a plurality of time-series models for anomaly detection and forecasting, respectively, each time-series model (i) corresponding to a phase of the target process, and ii) being provided as one of a recurrent neural network (RNN), …;  ([0052, 0056, 0062, Figure 4, Figure 6] The adaptive modeling aims to tackle the problem of selecting appropriate prediction models under different degradation statuses. The objective of the adaptive model selection is to obtain a mapping from each state to the probability of all possible prediction models that are taken into consideration in the modeling framework. The mapping provides a look-up table for model selection under different states., The action is defined as the choice of different prediction models. The prediction models include various data driven prediction algorithms. As one example, two types of prediction models (ARMA and RNN) are used. For each type of the prediction models, the structures and parameters are different. ARMA models can have different orders, such as ARMA (2, 1), ARMA (4, 3) and ARMA (12, 11) and so on, with different amounts of historical data used for training. RNN models can have various structures which are different in the number of input neurons, the number of hidden neurons, and the number of training samples. Each type of the two prediction models with different structures and parameters are considered as the available actions in the reinforcement learning framework., Within the framework defined above, the iterative process of reinforcement learning can be run for a certain predefined number of steps. 
The results will be a “look-up” table (see FIG. 4) in which the rows are different states and the columns are different prediction models., As the MQE increased, the extent of the degradation became more severe. Data from the first 500 cycles of the normal operation condition were used to train the SOM. …In the first 1450 cycles, the bearing was in good condition, and the MQEs were near Zero. From cycle 1450 to cycle 1650, the initial defects appeared and the MQE started increasing. The MQE continued increasing until approximately cycle 1750, this was an indication that the defects had become more serious. Subsequently, until around cycle 2050, the MQE dropped, this was due to the propagation of the roller defect becoming counterbalanced by the vibration. Shortly thereafter the MQE increased sharply until the bearing failed., wherein, in a ML-based anomaly detection framework, multivariate time series data for a target process is used to construct models not only according to subsequence intervals in that time series but also based upon the state/phase of the target process itself in which a distinct best model (including RNN models) is learned (and subsequently identified/selected according to the RL protocol) and wherein a specific example of a set of successive target process phases is the sequence of levels of degradation (quantified primarily by MQE) that the target process exhibits as it progresses from normal behavior to complete failure with different models (e.g., Figure 4, Table 2) invoked at different phases/levels of degradation  but with, more generally, the different phases corresponding to different hit-points in the U-matrix (Figure 6).)  and selectively providing, by the one or more processors, an alert to one or more users, each alert indicating at least one anomaly and a respective probability. ([0006, 0078, 0101, 0105, 0110] Many system components can undergo a long degradation process before catastrophic failures occur. 
If a certain operating condition is continuously examined, the degradation status of the component will change over time. Performance indices (e.g., “1” meaning normal, and “0” meaning unacceptable) may be stable in the range of 0.9 to 1.0 at the beginning., After the distributions of both the normal baseline and the predicted feature space are approximated through the use of a boosting GMM, the confidence value (CV), which indicates the performance of the machine (1 for normal, 0 for abnormal), is calculated… If the two distributions overlap extensively, the confidence value will be near 1, which means the performance of the machine does not deviate from the baseline significantly. Otherwise, if the two distributions rarely overlap, the confidence value will be near 0, which means the performance of the machine deviates from the baseline significantly and the machine is probably acting abnormally., Roller bearing failure modes generally include roller failure, inner-race failure, outer-race failure, and a combination of these failures. The presence of different failure modes may cause different patterns of contact forces as the bearing rotates, which cause sinusoidal vibrations. If the confidence values predicted drop to a very low level, a very interesting task is trying to determine what kind of failure the bearing has developed. The SOM method described herein was employed for diagnosis for bearings., After training the SOM, a health map was obtained, which showed eight areas indicating the normal status, roller defect, inner-race defect, outer-race defect, outer-race & roller defect, outer-race & inner-race defect, inner-race & roller-defect and outer-race & inner-race & roller defect, respectively. 
With new data coming in, their extracted features were fed into the trained SOM, and their “hit points” on the health map represented the failure mode of the bearing., When implemented on a microprocessor based system, a microprocessor executes the above-mentioned processes (e.g., extracting features, decomposing data, selecting a prediction model, generating a predicted feature space, generating a confidence value, providing a status of mechanical system based at least in part on the generated data, etc.), interfacing with memory (e.g., local and/or remote via wired and/or wireless communications) such as for retrieving and storing the processes, results, and data (e.g., measurement data, mechanical system data, prediction models, reinforcement learning model, etc.), interfacing with a display for providing status, selection choices, data, and results, and interfacing with user interface(s) for receiving input (e.g., selection, navigation, etc.)., wherein an ML-based anomaly detection framework determines the degradation status of components in that system (through the analysis of time series data) and sends that information to a display for communicating that information to users such that this information includes indications (alerts) indicative of particular system components at fault such as confidence value and status (both of which indicate a likelihood/probability of failure according to a number between 0 and 1 and which are also derived using statistical considerations) as well as an indication of process-specific anomalies in the health map (formed by the SOM in a diagnosis function to represent the status/confidence value information – Figure 1).)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Li to incorporate the teachings of Lee for each phase of the target process to correspond to a different time series model, to apply each feature data set of each subsequence segment to provide a plurality of time-series models for anomaly detection and forecasting in which each time-series model corresponds to a phase of the target process and is being provided as an RNN, and to selectively provide an alert to users in which each alert indicates at least one anomaly and a respective probability. The modification would have been obvious because one of ordinary skill would have been motivated to mitigate failure costs in a facility by using an ML-based adaptive anomaly detection framework by adaptively selecting the best performing prognostic model from a set of models according to the state of the system of the facility in which the adaptive prognostic models identify and communicate to users a confidence-based level of degradation along with diagnostic information that specifies the nature/source of the anomaly (Lee, [0006, 0007, 0017]).
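For illustration only, the state-to-model “look-up” table that Lee describes (rows as degradation states, columns as prediction models) can be sketched as follows. This is a minimal sketch and not Lee's implementation: the state names, MQE thresholds, model labels, and probabilities are all hypothetical, and the sketch derives the state from a single MQE value, whereas Lee defines states from the mean value and standard deviation of the MQE.

```python
# Hypothetical look-up table: rows are degradation states, columns are
# candidate prediction models, entries are selection probabilities.
LOOKUP_TABLE = {
    "normal":         {"ARMA(2,1)": 0.7, "ARMA(4,3)": 0.2, "RNN": 0.1},
    "initial_defect": {"ARMA(2,1)": 0.2, "ARMA(4,3)": 0.5, "RNN": 0.3},
    "severe_defect":  {"ARMA(2,1)": 0.1, "ARMA(4,3)": 0.2, "RNN": 0.7},
}


def state_from_mqe(mqe, low=0.5, high=2.0):
    """Map an MQE value to a degradation state (thresholds are assumptions)."""
    if mqe < low:
        return "normal"
    if mqe < high:
        return "initial_defect"
    return "severe_defect"


def select_model(mqe, table=LOOKUP_TABLE):
    """Return the most probable prediction model for the current state."""
    row = table[state_from_mqe(mqe)]
    return max(row, key=row.get)


print(select_model(0.1))  # model chosen while the MQE is near zero
print(select_model(5.0))  # model chosen after the MQE has risen sharply
```

The sketch shows only the selection step; in Lee the table entries are learned through reinforcement learning rather than fixed by hand.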

In regards to claim 3, the rejection of claim 1 is incorporated and Li further teaches wherein the at least one time-series transformation comprises one or more of a temporal transformation, a shape transformation, a statistical transformation, an autoencoder transformation, and a decomposition transformation, and a pass-through transformation. ([p. 10, Section 5.3, pp. 12-13, Section 5.3.3], As described earlier, the sub-sequences are fed into the MAD-GAN model. Note that to reduce the computational load, we reduce the original dimension by PCA, choosing the PC dimension in based on the PC variance rate., Dimension Reduction. As mentioned, to minimize the computation load of the LSTM-RNNs, we used PCA to project the raw data into a lower dimensional principal space instead of directly feeding the high dimensional data to the MAD-GAN model…. To show that the anomaly detection performance was not influenced by removing the less important variables, in Table 3, we list the anomaly detection evaluation metrics of MAD-GAN at different PC resolution (from PC=1 to PC=10) as well as all the original variable space for the SWaT dataset with sub-sequence length s_w = 30….As such, we projected the SWaT (WADI) data to the first 5 (8), and then applied the MAD-GAN to detect anomalies for the projected data., wherein the PCA transformation applied to the time series (subsequence) data before being sent into the ML anomaly detection models is a decomposition (principal component) transformation, wherein it is noted that PCA is well known to be a statistical transformation by virtue of being based on the covariance of a dataset and is also a shape transformation, in a BRI sense, by virtue of reducing the dimension of the dataset, and wherein it is noted that the claim requires only one type of transformation to be performed.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Li to incorporate the teachings of Lee for the same reasons as pointed out for claim 1.
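For illustration only, the selection of a principal-component dimension “based on the PC variance rate”, as quoted from Li in the rejection of claim 3, can be sketched as follows. This is a minimal sketch under stated assumptions: the function name choose_pc_dimension, the sample variances, and the use of a cumulative explained-variance threshold are hypothetical, since the quoted passages do not state the exact criterion Li applied.

```python
def choose_pc_dimension(pc_variances, min_variance_rate):
    """Return the smallest number of principal components whose cumulative
    share of the total variance reaches `min_variance_rate` (0 to 1)."""
    total = sum(pc_variances)
    if total <= 0:
        raise ValueError("total variance must be positive")
    cumulative = 0.0
    # Components are considered in order of decreasing variance.
    for k, variance in enumerate(sorted(pc_variances, reverse=True), start=1):
        cumulative += variance
        if cumulative / total >= min_variance_rate:
            return k
    return len(pc_variances)


# Hypothetical per-component variances for a four-stream series.
variances = [5.0, 3.0, 1.0, 1.0]
print(choose_pc_dimension(variances, 0.75))  # components needed for a 75% rate
```

The reduced dimension chosen this way would then fix how many projected streams each sub-sequence carries into the detection model.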

In regards to claim 4, the rejection of claim 1 is incorporated and Li does not further teach the method further comprising processing each feature data set through a reinforcement learning framework, in which rewards are applied based on minimal time deviation from a process set point within the target process. Li does not use reinforcement learning in his ML framework.
However, Lee, in the analogous environment of designing ML-based multi-variate time series anomaly detection frameworks, teaches  further comprising processing each feature data set through a reinforcement learning framework, in which rewards are applied based on minimal time deviation from a process set point within the target process.  ([Figure 1, Figure 4, 0032,  0052, 0057, 0060,  0092]   SOM provides a performance index to evaluate the degradation condition when only normal measurement is available. For each input feature vector, a BMU can be found in the SOM trained only with the measurement in the normal operating state. The minimum quantization error (MQE) is defined as the distance between the input feature vector and the weight vector of the BMU. The MQE actually indicates how far away the input feature vector deviates from the normal operating state…. Hence, the degradation trend can be measured by the trend of the MQE., As the iteration process proceeds, the reinforcement learning algorithm learns through the interaction with the environment to maximize the reward in a long run., The different states are defined by different degradation statuses identified by SOM as described herein. The MQE, described herein, is used as the indicator of the degradation status. The mean value and standard deviation of the MQE are used to define different states for the reinforcement learning framework., The reward is based on prediction accuracy. A prediction model, which has high prediction accuracy, will be assigned a high/positive reward; otherwise, a low/negative reward will be given. Mean squared error (MSE), mean absolute deviation (MAD), and mean absolute percentage error (MAPE) can be used as the reward function., From cycle 1450 to cycle 1650, the initial defects appeared and the MQE started increasing. The MQE continued increasing until approximately cycle 1750, this was an indication that the defects had become more serious. 
Subsequently, until around cycle 2050, the MQE dropped, …Shortly thereafter the MQE increased sharply until the bearing failed. It was verified that during the MQE increase that started after cycle 1500, the amount of debris that adhered to the magnetic plug increased. The debris was allowed to continue to increase until it accumulated to a certain level, which caused an electrical switch to stop the running of the test., wherein each feature is processed in the RL framework to determine the (model-dependent) status degradation/MQE which is an indication of the deviation of the system from a normal state (process set point within the target process) such that the system learns how to optimize the accuracy of degradation prediction through reinforcement learning by using the computed status degradation/MQE (and associated statistics) and its progression over time to determine the rewards to be applied to each MQE determination (e.g., MSE of MQE or of the mapping of model-dependent status degradation measure with uncertainty to MQE) such that the RL process is designed to maximize the long-term reward (future and immediate rewards) which corresponds to maximizing the accuracy of the estimate of the deviation of the system from baseline both at any given moment and over time in order to reduce the amount of time before a complete component failure is recognized (minimum time deviation) by enabling an early prediction/identification of the failure/anomaly through the temporal trends in the MQE (in other words, the rewards are based on minimizing the time before the discernment of an anomaly according to the deviation of the features in the data set from normal operating/baseline conditions).)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Li to incorporate the teachings of Lee to process each feature data set through a reinforcement learning framework, in which rewards are applied based on minimal time deviation from a process set point within the target process.  The modification would have been obvious because one of ordinary skill would have been motivated to mitigate failure costs in a facility by using an ML-based adaptive anomaly detection framework with ML prognostic models in which reinforcement learning is used to improve the prognostic robustness of those models by learning an optimal policy that maximizes the early discernment of degradation/deviation through the maximization of rewards in the long run (Lee, [0006, 0007, 0017, 0051]).
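For illustration only, the MQE and accuracy-based reward quoted from Lee above can be sketched as follows. This is a minimal sketch, not Lee's implementation: the function names, the flattened `weights` array (one row per SOM unit, assumed already trained on normal-state data), and the choice of negative MSE as the reward are assumptions made for the example.

```python
import numpy as np

def best_matching_unit(weights, x):
    """Return (index, distance) of the unit whose weight vector is closest to x.

    weights: hypothetical (n_units, n_features) array of SOM weight vectors.
    x:       input feature vector of length n_features.
    """
    dists = np.linalg.norm(np.asarray(weights, float) - np.asarray(x, float), axis=1)
    idx = int(np.argmin(dists))
    return idx, float(dists[idx])

def mqe(weights, x):
    """Minimum quantization error: Euclidean distance from x to its BMU."""
    return best_matching_unit(weights, x)[1]

def accuracy_reward(predicted, actual):
    """Reward based on prediction accuracy (assumed here: negative MSE),
    so a more accurate prediction model receives a higher reward."""
    predicted = np.asarray(predicted, float)
    actual = np.asarray(actual, float)
    return -float(np.mean((predicted - actual) ** 2))
```

A larger MQE indicates a feature vector farther from the normal operating state; tracking MQE over successive cycles gives the degradation trend described in the cited passages.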

In regards to claim 5, the rejection of claim 4 is incorporated and Li does not further teach automatically adjusting a controller based on reinforcement learning to reduce deviation from the process set point.  Li does not use reinforcement learning in his ML framework.
However, Lee, in the analogous environment of designing ML-based multi-variate time series anomaly detection frameworks, teaches further comprising automatically adjusting a controller based on reinforcement learning to reduce deviation from the process set point.  ([Figure 1, Figure 4, 0032, 0052, 0057, 0060, 0061, 0092]   SOM provides a performance index to evaluate the degradation condition when only normal measurement is available. For each input feature vector, a BMU can be found in the SOM trained only with the measurement in the normal operating state. The minimum quantization error (MQE) is defined as the distance between the input feature vector and the weight vector of the BMU. The MQE actually indicates how far away the input feature vector deviates from the normal operating state…. Hence, the degradation trend can be measured by the trend of the MQE., As the iteration process proceeds, the reinforcement learning algorithm learns through the interaction with the environment to maximize the reward in a long run., The different states are defined by different degradation statuses identified by SOM as described herein. The MQE, described herein, is used as the indicator of the degradation status. The mean value and standard deviation of the MQE are used to define different states for the reinforcement learning framework., The reward is based on prediction accuracy. A prediction model, which has high prediction accuracy, will be assigned a high/positive reward; otherwise, a low/negative reward will be given. Mean squared error (MSE), mean absolute deviation (MAD), and mean absolute percentage error (MAPE) can be used as the reward function., The policy, which defines the behavior of an agent, is the probability of choosing different prediction models in different states. The policy can also be seen as a mapping from the perceived environmental state to the actions to be taken. The optimal policy will be learned during the reinforcement learning. 
From cycle 1450 to cycle 1650, the initial defects appeared and the MQE started increasing. The MQE continued increasing until approximately cycle 1750, this was an indication that the defects had become more serious. Subsequently, until around cycle 2050, the MQE dropped, …Shortly thereafter the MQE increased sharply until the bearing failed. It was verified that during the MQE increase that started after cycle 1500, the amount of debris that adhered to the magnetic plug increased. The debris was allowed to continue to increase until it accumulated to a certain level, which caused an electrical switch to stop the running of the test., wherein the agent in the RL framework is a controller that selects an action in the form of selecting (based on the features of the data set) the most effective (accurate) model for computing the (model-dependent) MQE which is an indication of the deviation of the system from a normal state (process set point within the target process), wherein the controller learns how to optimize the accuracy of MQE over time and over diverse scenarios through the RL paradigm such that the agent’s/controller’s action selection optimizes the accuracy of the estimate of the deviation of the system both at any given moment and over time in order to reduce the amount of time before a complete component failure occurs (minimum time deviation) by enabling an early prediction/identification of the failure/anomaly (relative to baseline or normal operation) through the temporal trends in the MQE.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Li to incorporate the teachings of Lee to automatically adjust a controller based on reinforcement learning to reduce deviation from the process set point.  The modification would have been obvious because one of ordinary skill would have been motivated to mitigate failure costs in a facility by using an ML-based adaptive anomaly detection framework with ML prognostic models in which reinforcement learning is used, through RL agent/controller action selection, to improve the prognostic robustness of those models by learning an optimal policy that maximizes the early discernment of degradation/deviation through the maximization of rewards in the long run  (Lee, [0006, 0007, 0017, 0051]).

In regards to claim 6, the rejection of claim 1 is incorporated and Li does not further teach executing root cause analysis to determine a cause of an anomaly using one or more of classification, correlation, and RL-based action tracking for fault cause inference.  Although Li teaches the generation of indications of anomalies, he does not disclose a root cause analysis for those anomalies.
However, Lee, in the analogous environment of designing ML-based multi-variate time series anomaly detection frameworks, teaches  comprising executing root cause analysis to determine a cause of an anomaly using one or more of classification, correlation, and RL-based action tracking for fault cause inference. ([0030, 0032, 0085, 0087, 0101, 0105] The inner product can be used as an analytical measure for the match of X with (). The Euclidean distance may be a better and more convenient measure criterion for the match of X with (0. The minimum distance defines the BMU., For each input feature vector, a BMU can be found in the SOM trained only with the measurement in the normal operating state. The minimum quantization error (MQE) is defined as the distance between the input feature vector and the weight vector of the BMU.. Therefore, the testing features can be labeled by finding the BMU in the trained map as “hit points.” The failure modes can be identified by the location of the hit points on the map., The first method is to determine which features were highly correlated with the output. The values of correlation coefficient r were calculated and ranked in descending order. The features with the corresponding higher r values were selected as the input to the SOM., Roller bearing failure modes generally include roller failure, inner-race failure, outer-race failure, and a combination of these failures. The presence of different failure modes may cause different patterns of contact forces as the bearing rotates, which cause sinusoidal vibrations. If the confidence values predicted drop to a very low level, a very interesting task is trying to determine what kind of failure the bearing has developed. 
The SOM method described herein was employed for diagnosis for bearings., After training the SOM, a health map was obtained, which showed eight areas indicating the normal status, roller defect, inner-race defect, outer-race defect, outer-race & roller defect, outer-race & inner-race defect, inner-race & roller-defect and outer-race & inner-race & roller defect, respectively. With new data coming in, their extracted features were fed into the trained SOM, and their “hit points” on the health map represented the failure mode of the bearing., wherein an ML-based anomaly detection framework diagnoses the cause of component status changes (e.g., degradation anomaly) through implementation of a SOM which, after having mapped feature vectors for normal operation to particular elements of the SOM, determines the best mapping from a new feature vector (having the anomaly) to a particular “hit point” in that SOM which corresponds to particular functional components (for which the MQE is the corresponding metric of deviation for that particular component) and wherein this diagnostic process involves both classification (i.e., the labeling of the anomaly according to the hit points) and correlation (used both to determine the hit point with features/weights most similar to the anomalous feature vector, such as in the form of the Euclidean distance metric, and to determine the particular features most correlated with/relevant to the particular hit point in the SOM), and wherein it is noted that only one of the three RCA techniques is required by the claim.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Li to execute root cause analysis to determine a cause of an anomaly using one or more of classification, correlation, and RL-based action tracking for fault cause inference.  The modification would have been obvious because one of ordinary skill would have been motivated to mitigate failure costs in a facility by using an ML-based adaptive anomaly detection framework with prognostic models that identifies and communicates to users a confidence-based level of degradation along with diagnostic information that specifies the nature/source/cause of the anomaly (Lee, [0006, 0007, 0017]).
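For illustration only, the two diagnostic steps relied upon from Lee (ranking features by correlation coefficient r, and labeling a new feature vector by the "hit point" of its BMU) can be sketched as follows. This is a sketch under stated assumptions: the function names, the flattened `weights` array, and the per-unit `unit_labels` list are hypothetical, and the SOM is assumed to be already trained and labeled.

```python
import numpy as np

def rank_features_by_correlation(X, y):
    """Return feature indices ordered by Pearson r with the output y, descending.

    X: (n_samples, n_features) feature matrix; y: (n_samples,) output.
    Features are assumed non-constant (a constant column has undefined r).
    """
    X = np.asarray(X, float)
    y = np.asarray(y, float)
    r = np.array([np.corrcoef(X[:, j], y)[0, 1] for j in range(X.shape[1])])
    return [int(j) for j in np.argsort(-r)]

def classify_by_hit_point(weights, unit_labels, x):
    """Label x with the failure mode attached to its best matching unit.

    weights:     hypothetical (n_units, n_features) SOM weight vectors.
    unit_labels: hypothetical failure-mode label for each unit of the health map.
    """
    dists = np.linalg.norm(np.asarray(weights, float) - np.asarray(x, float), axis=1)
    return unit_labels[int(np.argmin(dists))]
```

The first function corresponds to the correlation step and the second to the classification step; as noted in the rejection, only one of the recited RCA techniques is required by the claim.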

In regards to claim 7, the rejection of claim 6 is incorporated and Li does not further teach wherein the alert comprises a cause determined for the at least one anomaly.  Although Li teaches the generation of indications of anomalies, he does not disclose a root cause analysis for those anomalies.  
However, Lee, in the analogous environment of designing ML-based multi-variate time series anomaly detection frameworks, teaches  wherein the alert comprises a cause determined for the at least one anomaly. ([0101, 0105, 0110, Claim 22, Figure 1] The SOM method described herein was employed for diagnosis for bearings. The results were a “health map' which showed different failure modes of the bearing., After training the SOM, a health map was obtained, which showed eight areas indicating the normal status, roller defect, inner-race defect, outer-race defect, outer-race & roller defect, outer-race & inner-race defect, inner-race & roller-defect and outer-race & inner-race & roller defect, respectively. With new data coming in, their extracted features were fed into the trained SOM, and their “hit points' on the health map represented the failure mode of the bearing., When implemented on a microprocessor based system, a microprocessor executes the above-mentioned processes (e.g., extracting features, decomposing data, selecting a prediction model, generating a predicted feature space, generating a confidence value, providing a status of mechanical system based at least in part on the generated data, etc.), interfacing with memory (e.g., local and/or remote via wired and/or wireless communications) such as for retrieving and storing the processes, results, and data (e.g., measurement data, mechanical system data, prediction models, reinforcement learning model, etc.), interfacing with a display for providing status, selection choices, data, and results, and interfacing with user interface(s) for receiving input (e.g., selection, navigation, etc.)., Claim 22. 
A method as claimed in claim 21 wherein providing the mechanical system diagnosis further comprises inputting features into a trained self-organizing map to generate and display a health map., wherein an ML-based anomaly detection framework determines the degradation status of components in that system (through the analysis of time series data) and sends that information (alerts) to a display for communicating that information to users such that this information includes not only degradation status and confidence values indicative of the state of the system, but also the particular cause of the degradation (e.g., component or process-specific) as represented by and discerned from the health map (formed by the SOM as a diagnosis function to represent the status/confidence value information, see Figure 1).)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Li to transmit an alert to users that comprises a cause determined for the at least one anomaly.  The modification would have been obvious because one of ordinary skill would have been motivated to mitigate failure costs in a facility by using an ML-based adaptive anomaly detection framework with prognostic models that identifies and communicates to users a confidence-based level of degradation along with diagnostic information that specifies the nature/source/cause of the anomaly (Lee, [0006, 0007, 0017]).

In regards to claim 8, the rejection of claim 1 is incorporated and Li further teaches wherein the two or more time-series data sequences are generated by one or more Internet-of-Things (IoT) devices located within the physical environment.  ([p. 1, Section 1, p. 7, Section 4.1.1, p. 8, Section 4.2] Today’s Cyber-Physical Systems (CPSs) such as smart buildings, factories, power plants, and data centres are large, complex, and affixed with networked sensors and actuators that generate substantial amounts of multivariate time series data that can be used to continuously monitor the CPS’ working conditions to detect anomalies in time [1] so that the operators can take actions to investigate and resolve the underlying issues. The ubiquitous use of networked sensors and actuators in CPSs and other systems (e.g. autonomous vehicles) will become even more prevalent with the emergence of the Internet of Things (IoT), leading to multiple systems and devices communicating and possibly operating a large variety of tasks autonomously over networks., Generally, the attacked points include sensors (e.g., water level sensors, flow-rate meter, etc.) and actuators (e.g., valve, pump, etc.)., At the same time, due to the false low water level state in the raw water tank, the water supply from P1 to P2 was cut off while P2 continued to supply water to the consumer tanks. Thus, water levels of tanks in P2 decreased. Once the water level in the elevated tanks (P2) reached a low level, the supply to consumer tanks (P2) was cut off. Consequently, by tampering the readings of water level sensor in P1 to a low level, there would be an overflow in the tanks of P1 and no water flow in P2., wherein the ML anomaly detection framework is applied to networked (IoT) sensors that are monitoring the state of an industrial process.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Li to incorporate the teachings of Lee for the same reasons as pointed out for claim 1.

Claim 9 is also rejected because it is just a CRM implementation of the same subject matter of claim 1 which can be found in Li and Lee. It is noted that claim 9 also recites a processor with memory and executable instructions which are found in Li (e.g., [p. 15, Section 6, Figure 1, Algorithm 1] We proposed a novel MAD-GAN (Multivariate Anomaly Detection with GAN) framework to train LSTM-RNNs on the multivariate time-series data and then utilize both the discriminator and the generator to detect anomalies using a novel Discrimination and Reconstruction Anomaly Score (DR-Score). We tested MAD-GAN on two complex cyber-attack CPS datasets from the Secure Water Treatment Testbed (SWaT) and Water Distribution System (WADI), and showed superior performance over existing unsupervised detection methods, including a GAN-based approach.).

Claim 11/9 is also rejected because it is just a CRM implementation of the same subject matter of claim 3/1 which can be found in Li and Lee.

Claim 12/9 is also rejected because it is just a CRM implementation of the same subject matter of claim 4/1 which can be found in Li and Lee.

Claim 13/12 is also rejected because it is just a CRM implementation of the same subject matter of claim 5/4 which can be found in Li and Lee.

Claim 14/9 is also rejected because it is just a CRM implementation of the same subject matter of claim 6/1 which can be found in Li and Lee.

Claim 15/14 is also rejected because it is just a CRM implementation of the same subject matter of claim 7/6 which can be found in Li and Lee.

Claim 16/9 is also rejected because it is just a CRM implementation of the same subject matter of claim 8/1 which can be found in Li and Lee.

Claim 17 is also rejected because it is just a CRM implementation of the same subject matter of claim 1 which can be found in Li and Lee. It is noted that claim 17 also recites a processor with memory and executable instructions which are found in Li (e.g., [p. 15, Section 6, Figure 1, Algorithm 1] We proposed a novel MAD-GAN (Multivariate Anomaly Detection with GAN) framework to train LSTM-RNNs on the multivariate time-series data and then utilize both the discriminator and the generator to detect anomalies using a novel Discrimination and Reconstruction Anomaly Score (DR-Score). We tested MAD-GAN on two complex cyber-attack CPS datasets from the Secure Water Treatment Testbed (SWaT) and Water Distribution System (WADI), and showed superior performance over existing unsupervised detection methods, including a GAN-based approach.).

Claim 19/17 is also rejected because it is just a CRM implementation of the same subject matter of claim 3/1 which can be found in Li and Lee.

Claim 20/17 is also rejected because it is just a CRM implementation of the same subject matter of claim 4/1 which can be found in Li and Lee.

Claim 21/20 is also rejected because it is just a CRM implementation of the same subject matter of claim 5/4 which can be found in Li and Lee.

Claim 22/17 is also rejected because it is just a CRM implementation of the same subject matter of claim 6/1 which can be found in Li and Lee.

Claim 23/22 is also rejected because it is just a CRM implementation of the same subject matter of claim 7/6 which can be found in Li and Lee.

Claim 24/17 is also rejected because it is just a CRM implementation of the same subject matter of claim 8/1 which can be found in Li and Lee.

Claims 2, 10, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Li, in view of Lee, and in further view of Lee et al. (“Time Series Segmentation through Automatic Feature Learning”, https://arxiv.org/abs/1801.05394, arXiv:1801.05394v2 [cs.LG], 26 Jan 2018, pp. 1-13), hereinafter referred to as Lee2.

In regards to claim 2, the rejection of claim 1 is incorporated and Li and Lee do not further teach wherein executing automated time-series process segmentation comprises processing each of the two or more time-series data sequences using a deep learning (DL) model provided as one of a bidirectional long short-term memory (LSTM) sequence classifier using supervised learning, and an autoencoder using unsupervised learning.  Although Li attempts to optimize the segmentation process of the time series, he does not teach the use of BiLSTM or autoencoder methods for performing that function. Lee does not disclose a segmentation operation.
However, Lee2, in the analogous art of performing segmentation for time series analysis, teaches wherein executing automated time-series process segmentation comprises processing each of the two or more time-series data sequences using a deep learning (DL) model provided as one of a bidirectional long short-term memory (LSTM) sequence classifier using supervised learning, and an autoencoder using unsupervised learning ([p. 4, Section 3.1, p. 4, Section 3.2, p. 6, Section 3.3, p. 7, Section 4.2, Figure 2, Algorithm 1] For a given time series, consisting of $N_c$ channels (such as different sensors in an IoT system) across $T$ timestamps, the input data matrix $\mathrm{IDM} \in \mathbb{R}^{N_c \times T}$ is a real matrix where $\mathrm{IDM}(i, j)$ is the measurement recorded by the i-th channel at the j-th timestamp. To fully explore the temporal characteristics of the data, we follow common practice and partition it into a series of segments according to a user-specified time window size, $N_w$. For the t-th ($t = 1, 2, \ldots, T/N_w$) window, we stack all the recordings within it to form a column vector which is denoted by $s_t \in \mathbb{R}^{N_c N_w \times 1}$., The autoencoder model is usually trained by the back-propagation techniques [8] in an unsupervised manner, aiming at minimizing the error of the reconstructed results from the original inputs., Based on the computed distance of $\{\mathrm{Dist}_t\}_{t=1}^{T/N_w}$ in Eq. 10, we construct a distance curve and select all the peaks (local maximal) in the curve as breakpoints detected by our approach (see details in Figure 2). We summarize our overall process for automatic breakpoint detection in Algorithm 1. It is worth noting that our approach can be broadly applied for general changepoint detection even outside of its application to breakpoint detection as considered in this paper., The UCI human activity recognition data set [3] contains activity mode recordings carried out with a group of 30 volunteers within an age bracket of 19-48 years. 
Each person performed six activities wearing a smartphone (Samsung Galaxy S2) on their waist. Using its embedded accelerometer and gyroscope, it captures 3-axial linear acceleration and 3-axial angular velocity at a constant rate of 50Hz. The experiments are video-recorded to generate labels manually., wherein an autoencoder-based deep learning framework determines (Algorithm 1) the change-points of a time series for the purpose of segmenting that time series (i.e., determination of subsequences given an initial set of multivariate time series samples over a set of windows as shown in Figure 1) according to those change-points, with this segmentation process automatically learning (with unsupervised learning, i.e., without data labeled with human-annotated change-points) and extracting the features that best represent the time series data (such as for multi-sensor recognition of activity) and wherein it is noted that only one of the two segmentation techniques is required by the claim.)
It would have been prima facie obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to have modified Li and Lee to incorporate the teachings of Lee2 to perform automated time-series process segmentation by processing each of the two or more time-series data sequences using a deep learning (DL) model provided as one of a bidirectional long short-term memory (LSTM) sequence classifier using supervised learning, and an autoencoder using unsupervised learning.  The modification would have been obvious because one of ordinary skill would have been motivated to achieve improved and scalable segmentation in a generalizable deep learning autoencoder framework that learns the change-points of the time series and that learns and extracts the most useful features of the time series for a broad set of applications such as for recognition-related problems involving multiple sensors (Lee2, [pp. 1-2, Section 1, p. 10, Section 4.4, p. 11, Section 5, Table 1]).
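For illustration only, the windowing and peak-selection steps quoted from Lee2 can be sketched as follows. This is not Lee2's Algorithm 1: the autoencoder is omitted and the raw stacked window vectors stand in for the learned features (an assumption made to keep the sketch self-contained), and all function names are hypothetical.

```python
import numpy as np

def window_vectors(idm, nw):
    """Partition an (Nc, T) data matrix into T // nw windows and stack the
    recordings of each window into one vector of length Nc * nw."""
    idm = np.asarray(idm, float)
    n_windows = idm.shape[1] // nw
    return np.stack([idm[:, k * nw:(k + 1) * nw].reshape(-1)
                     for k in range(n_windows)])

def distance_curve(features):
    """Euclidean distance between the feature vectors of consecutive windows.
    In Lee2 the features come from a trained autoencoder; raw vectors are
    used here as a stand-in."""
    return np.linalg.norm(np.diff(features, axis=0), axis=1)

def local_maxima(curve):
    """Indices of strict interior peaks of the distance curve, which serve
    as candidate breakpoints."""
    return [i for i in range(1, len(curve) - 1)
            if curve[i] > curve[i - 1] and curve[i] > curve[i + 1]]
```

A peak at index i of the distance curve marks a candidate breakpoint between window i and window i + 1, i.e., at timestamp (i + 1) * nw of the original series.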

Claim 10/9 is also rejected because it is just a CRM implementation of the same subject matter of claim 2/1 which can be found in Li, Lee, and Lee2.

Claim 18/17 is also rejected because it is just a CRM implementation of the same subject matter of claim 2/1 which can be found in Li, Lee, and Lee2.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. 

Tootooni et al. (“A Spectral Graph Theoretic Approach for Monitoring Multivariate Time Series Data From Complex Dynamical Processes”, IEEE Transactions on Automation Science and Engineering, Vol. 15, No. 1, January 2018, pp. 127-144) teach adaptive process monitoring techniques with modelling of time series subsequences corresponding to different states/phases of the system being monitored.


THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 

Any inquiry concerning this communication or earlier communications from the examiner should be directed to ROBERT LEWIS KULP whose telephone number is (571)272-7983. The examiner can normally be reached M, Th, F 8-5:30; Tu 8-3.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached on 571-270-7092. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/ROBERT LEWIS KULP/Examiner, Art Unit 2124
/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124