Detailed Action
This action is in response to the claims filed November 18, 2021, in application 15/876,025, filed January 19, 2018. Claims 1, 2, 5, 10, 13, 14, 16, and 20 are amended. Claims 1-20 are pending.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103

In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Singh ("Anomaly Detection for Temporal Data using Long Short-Term Memory (LSTM)") in view of Muddu (US20170063887 A1) and further in view of Heimann (US10685293 B1).

Regarding claim 1, 
Singh teaches a method comprising: detecting, in a detector component of the server, an anomaly in the time series data comprising an outlier on an edge of the time series data by comparing a predicted value of the event to an actual value of the event using a selected forecasting model; ("In this thesis we developed an anomaly detection method for temporal data using LSTM networks." (p.47 para.1); "In [33] LSTMs are employed to detect collective anomalies in network security domain. An LSTM RNN with a single recurrent layer is used as a prediction model." (p.19 para.3); “One approach for temporal anomaly detection has been to build prediction models and use the prediction errors (the difference between the predicted values and the actual values) to compute an anomaly score [4].” (p.18 para.2); “The anomaly detection algorithm used in the project consists of two main steps. First, a summary prediction model is built to learn normal time series patterns and predict future time series. Then anomaly detection is performed by computing anomaly scores from the prediction errors.” (p.20 para.1); “We use LSTM RNN as the time series prediction model.” (p.20 para.2)).
declaring the event to be an anomaly at a particular time if a difference between the predicted value and actual value exceed a defined threshold based on residual values for other devices of the network; ("A prediction error greater than a set limit indicates a point anomaly." (p.19 para. 2); "The difference between the true value and predicted value is also referred to as the error or residual." (p.8 para.3); "One approach for temporal anomaly detection has been to build prediction models and use the prediction errors (the difference between the predicted values and the actual values) to compute an anomaly score [4]." (p.18 para. 2)).
analyzing, in an analyzer component of the server, all events for all devices of the network within a defined time proximity of the particular time of the anomaly to filter usual events and rank each event relative to the anomaly; (“On new data, the log probability densities (PDs) of errors are calculated and used as anomaly scores: with lower values indicating a greater likelihood of the observation being an anomaly. A validation set containing both normal data and anomalies is used to set a threshold on log PD values that can separate anomalies from normal observations and incur as few false positives as possible. A separate test set is used to evaluate the model." (p.21 para.2); "Evaluation: The results of anomaly detection on set T are shown in figure 4.6. With a threshold of −23 the model detects all three anomalies. These results are similar to the results in [40], which finds the three most unusual sequences in this data. Furthermore, the rankings of the discords match the order of log PD values with a lower value indicating a more unusual sequence." (p.33 para.3); “Figure. 4.1: Validation set results on machine temperature dataset. The top plot shows the predictions done on VA and the corresponding prediction errors. The bottom plot shows the log PD values of the prediction errors and the threshold set at −11. The X-axis shows time steps and the Y-axis has the corresponding metric value. There are two true anomalies highlighted by red markers. Shaded area in the top graph denotes detections made by the LSTM algorithm.” (p.30 para.1)).
displaying to a user, through a graphical user interface of a client computer of the network, a labeled chart of the time series data showing the anomaly in a graphical context relative to all the events, (“Figure. 4.1: Validation set results on machine temperature dataset. The top plot shows the predictions done on VA and the corresponding prediction errors. The bottom plot shows the log PD values of the prediction errors and the threshold set at −11. The X-axis shows time steps and the Y-axis has the corresponding metric value. There are two true anomalies highlighted by red markers. Shaded area in the top graph denotes detections made by the LSTM algorithm.” (p.30 para.1)).
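For illustration only, the prediction-error technique described in the Singh passages quoted above (the residual as the difference between the actual and predicted values; a point anomaly when the error exceeds a set limit; log probability densities of the errors used as anomaly scores, compared against a threshold and ranked with lower values indicating a more unusual observation) can be sketched as follows. This is a minimal sketch under stated assumptions, not Singh's implementation: it assumes the predicted values are already available from some forecasting model (Singh's LSTM RNN is not reproduced), and it assumes a univariate Gaussian fitted to errors observed on normal data as the error distribution. The function names, the limit, and the threshold values are hypothetical.

```python
import math
from statistics import mean, stdev


def residuals(actual, predicted):
    """Prediction errors (residuals): actual value minus predicted value."""
    return [a - p for a, p in zip(actual, predicted)]


def point_anomalies(actual, predicted, limit):
    """Indices whose absolute prediction error exceeds a set limit."""
    return [i for i, e in enumerate(residuals(actual, predicted)) if abs(e) > limit]


def gaussian_log_pd(x, mu, sigma):
    """Log probability density of x under N(mu, sigma^2) (assumed error model)."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (x - mu) ** 2 / (2 * sigma ** 2)


def rank_by_log_pd(errors, normal_errors, threshold):
    """Score each error by its log PD under a Gaussian fitted to errors from
    normal data; return (index, score) pairs below the threshold, ordered
    from most unusual (lowest log PD) to least unusual."""
    mu, sigma = mean(normal_errors), stdev(normal_errors)
    scores = [(i, gaussian_log_pd(e, mu, sigma)) for i, e in enumerate(errors)]
    flagged = [(i, s) for i, s in scores if s < threshold]
    return sorted(flagged, key=lambda pair: pair[1])


actual = [10.0, 10.2, 9.9, 15.0, 10.1, 12.0]
predicted = [10.1, 10.0, 10.0, 10.1, 10.0, 10.0]
normal = [-0.2, 0.1, 0.0, 0.15, -0.05]  # hypothetical errors from normal data
print(point_anomalies(actual, predicted, limit=1.0))
print(rank_by_log_pd(residuals(actual, predicted), normal, threshold=-10.0))
```

In this example, the observations at indices 3 and 5 exceed the limit. Index 3 is ranked first because its larger error yields the lower log PD score, which corresponds to Singh's statement that a lower log PD value indicates a more unusual sequence.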
Singh does not explicitly teach wherein the chart comprises an interactive chart with a label providing an interface accessing information about each event, the information including description, data source, and time of event; and displaying an event description display area listing textual information for each event in the time series data to provide a historical context of events allowing personnel to assess actual network conditions surrounding the events and the anomaly.
However, Muddu teaches wherein the chart comprises an interactive chart (Muddu Fig. 39A) with a label providing an interface accessing information about each event, (Muddu Fig. 39A; Fig. 39B: element 3902; [0452] “By clicking on the “Views” tab 3902, as shown in FIG. 39B, a GUI user can toggle the GUI between a “Threats” view 3906, “Anomalies” view 3907, “Users” view 3908, “Devices” view 3909, and “Applications” view 3910. As described in further detail below, the “Threats” view 3906 provides a listing of all active threats and the “Anomalies” view 3907 provides a listing of all anomalies.”).
the information including description, data source, and time of event; (Muddu Fig. 39B; Fig. 45A; Fig. 45B; Fig. 45C; [0480] “FIG. 45A provides an example view that the GUI generates when a GUI user selects the Threats view 3906 in FIG. 39B. The Threats Table view 4500 provides a Threats Trend timeline 4510 and a Threats listing 4520. The Threats Trend 4510 illustrates the number of threats over a period of time. This can be provided as a line chart, as shown in FIG. 45A. As alternatives, the same information can be re-formatted as a column chart, as shown in FIG. 45B, or as a breakdown column chart as shown in FIG. 45C.”) and 
displaying an event description display area listing textual information for each event in the time series data (Muddu Fig. 39A; Fig. 46A; Fig. 46B; Fig. 46C) to provide a historical context of events allowing personnel to assess actual network conditions surrounding the events and the anomaly. (Muddu Fig. 39A; Fig. 39B; Fig. 46A; Fig. 46B; Fig. 46C; [0454] “The home screen view 3900 can additionally include summary charts and illustrations, such as, as shown in FIG. 39A, a “Threats by Threat Type” box 3912, a “Latest Threats” box 3913, and an “Events Trend” graphic 3914. The “Threats by Threat Type” box 3912 compares by number each different type of threat that has been identified. The listing in the “Latest Threats” box 3913 identifies the most recent threats by date. The “Events Trend” graphic 3914 is a timeline showing the volume of events along a timeline.”; [0485] “FIG. 46A provides an example view that the GUI generates when a GUI user selects the Anomalies view 3907 in FIG. 39B. The Anomalies table 4600 provides an Anomalies Trend timeline 4610 and an Anomalies listing 4620. The Anomalies Trend 4610 illustrates the number of anomalies over a period of time. This can be provided as a line chart, as shown in FIG. 46A. As alternatives, the same information can be re-formatted as a column chart, or as a breakdown column chart (not shown), analogous to the Threat Trend as shown in FIGS. 45A-45C.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Singh with an interactive chart displaying textual information related to each event, as taught in Muddu. The motivation to do so is that the user's decisions on the displayed anomalies (e.g., confirming them or marking them as false positives) can be fed back to update and improve the model. (Muddu [0151] “in addition to automatically taking action based on the discovered anomalies and threats, the decisions by the user (e.g., that the anomalies and threats are correctly diagnosed, or that the discovered anomalies and threats are false positives) can then be provided as feedback data in order to update and improve the models.”).
Singh/Muddu, however, does not explicitly teach a method of identifying significant events for finding a root cause of an anomaly in a network having a server computer, comprising: collecting time series data for events for each device of the network; 
Heimann, which is analogous art from the same field of endeavor, teaches a method of identifying significant events for finding a root cause of an anomaly in a network having a server computer, ("FIG. 1 is a network 10 according to an embodiment of the invention. Network 10 may be any network, including a public network such as the Internet, a private network such as a home or enterprise network, or a combination thereof. One or more computing devices 20 (e.g., 20a, 20b, 20c) may connect to one another and to other computers through network 10. For example, computing devices 20 may include user devices such as personal computers, smartphones, tablets, etc.; servers; switches; routers; and/or any other device capable of network communications. At least one platform server 100 may also connect to network 10. Platform server 100 may comprise one or more computers configured to provide the CALI platform described herein." (col.2 ln.41-53); "FIG. 4 is a platform process 400 according to an embodiment of the invention. Platform 300, as executed by the at least one platform server 100, may perform process 400 to identify cybersecurity threats and/or other events of interest." (col.6 ln.16-19)); comprising: collecting time series data for events for each device of the network; (“To analyze cybersecurity threats, an analysis module of a processor may receive log data from at least one network node.” (Abstract); "In 402, platform 300 may receive log data (e.g., from devices 20 through network 10)" (col.6 ln.20-21)).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the invention of Singh/Muddu in view of Heimann to collect data for the devices of the network at a server in order to detect anomalies. The combination would have been obvious because a person of ordinary skill in the art would have known to apply a known technique (i.e., anomaly detection using an LSTM RNN as the time series prediction model), as disclosed in Singh, to a network having one or more computing devices and a server (i.e., fig. 1), as disclosed in Heimann.
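For illustration only, the claimed steps of collecting time series data for events for each device from log data of the kind Heimann's platform 300 receives (col.6 ln.20-21), and of selecting all events within a defined time proximity of an anomaly, can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of Heimann or of the application: the record layout (device, epoch-second timestamp, event label), the fixed counting window, and the names `LogRecord`, `per_device_series`, and `events_near` are all hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class LogRecord(NamedTuple):
    """Hypothetical log entry: reporting device, epoch seconds, event label."""
    device: str
    timestamp: int
    event: str


def per_device_series(records, window):
    """Count events per device in fixed windows of `window` seconds.

    Returns {device: {window_start: event_count}}.
    """
    series = defaultdict(lambda: defaultdict(int))
    for r in records:
        start = r.timestamp - r.timestamp % window  # align to window boundary
        series[r.device][start] += 1
    return {dev: dict(buckets) for dev, buckets in series.items()}


def events_near(records, anomaly_time, proximity):
    """All events, on any device, within `proximity` seconds of the anomaly,
    ordered by timestamp."""
    return sorted(
        (r for r in records if abs(r.timestamp - anomaly_time) <= proximity),
        key=lambda r: r.timestamp,
    )


records = [
    LogRecord("router1", 0, "login"),
    LogRecord("router1", 30, "login"),
    LogRecord("router1", 70, "config_change"),
    LogRecord("server2", 65, "disk_full"),
]
print(per_device_series(records, window=60))
print([r.event for r in events_near(records, anomaly_time=68, proximity=5)])
```

Windowed counts are only one way to turn event logs into a per-device time series; any per-device metric sampled over time would serve the same role as input to a forecasting model such as the one sketched from Singh.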

Regarding claim 13, 
Singh teaches a system comprising: a detector component of the server detecting an anomaly in the time series data comprising an outlier on an edge of the time series data by comparing a predicted value of the event to an actual value of the event using a selected forecasting model, ("In this thesis we developed an anomaly detection method for temporal data using LSTM networks." (p.47 para.1); "In [33] LSTMs are employed to detect collective anomalies in network security domain. An LSTM RNN with a single recurrent layer is used as a prediction model." (p.19 para.3); “One approach for temporal anomaly detection has been to build prediction models and use the prediction errors (the difference between the predicted values and the actual values) to compute an anomaly score [4].” (p.18 para.2); “The anomaly detection algorithm used in the project consists of two main steps. First, a summary prediction model is built to learn normal time series patterns and predict future time series. Then anomaly detection is performed by computing anomaly scores from the prediction errors.” (p.20 para.1); “We use LSTM RNN as the time series prediction model.” (p.20 para.2)) and
declaring the event to be an anomaly at a particular time if a difference between the predicted value and actual value exceed a defined threshold based on residual values for other devices of the network; ("A prediction error greater than a set limit indicates a point anomaly." (p.19 para. 2); "The difference between the true value and predicted value is also referred to as the error or residual." (p.8 para.3); "One approach for temporal anomaly detection has been to build prediction models and use the prediction errors (the difference between the predicted values and the actual values) to compute an anomaly score [4]." (p.18 para. 2)).
an analyzer component of the server analyzing all events for all devices of the network within a defined time proximity of the particular time of the anomaly to filter usual events and rank each event relative to the anomaly; (“On new data, the log probability densities (PDs) of errors are calculated and used as anomaly scores: with lower values indicating a greater likelihood of the observation being an anomaly. A validation set containing both normal data and anomalies is used to set a threshold on log PD values that can separate anomalies from normal observations and incur as few false positives as possible. A separate test set is used to evaluate the model." (p.21 para.2); "Evaluation: The results of anomaly detection on set T are shown in figure 4.6. With a threshold of −23 the model detects all three anomalies. These results are similar to the results in [40], which finds the three most unusual sequences in this data. Furthermore, the rankings of the discords match the order of log PD values with a lower value indicating a more unusual sequence." (p.33 para.3); “Figure. 4.1: Validation set results on machine temperature dataset. The top plot shows the predictions done on VA and the corresponding prediction errors. The bottom plot shows the log PD values of the prediction errors and the threshold set at −11. The X-axis shows time steps and the Y-axis has the corresponding metric value. There are two true anomalies highlighted by red markers. Shaded area in the top graph denotes detections made by the LSTM algorithm.” (p.30 para.1)) and
a graphical user interface functionally coupled to a client computer of the network displaying a labeled chart of the time series data showing the anomaly in a graphical context relative to all the events, (“Figure. 4.1: Validation set results on machine temperature dataset. The top plot shows the predictions done on VA and the corresponding prediction errors. The bottom plot shows the log PD values of the prediction errors and the threshold set at −11. The X-axis shows time steps and the Y-axis has the corresponding metric value. There are two true anomalies highlighted by red markers. Shaded area in the top graph denotes detections made by the LSTM algorithm.” (p.30 para.1)).
Singh does not explicitly teach wherein the chart comprises an interactive chart with a label providing an interface accessing information about each event, the information including description, data source, and time of event, the graphical user interface further displaying an event description display area listing textual information for each event in the time series data to provide a historical context of events allowing personnel to assess actual network conditions surrounding the events and the anomaly.
However, Muddu teaches wherein the chart comprises an interactive chart (Muddu Fig. 39A) with a label providing an interface accessing information about each event, (Muddu Fig. 39A; Fig. 39B: element 3902; [0452] “By clicking on the “Views” tab 3902, as shown in FIG. 39B, a GUI user can toggle the GUI between a “Threats” view 3906, “Anomalies” view 3907, “Users” view 3908, “Devices” view 3909, and “Applications” view 3910. As described in further detail below, the “Threats” view 3906 provides a listing of all active threats and the “Anomalies” view 3907 provides a listing of all anomalies.”).
the information including description, data source, and time of event, (Muddu Fig. 39B; Fig. 45A; Fig. 45B; Fig. 45C; [0480] “FIG. 45A provides an example view that the GUI generates when a GUI user selects the Threats view 3906 in FIG. 39B. The Threats Table view 4500 provides a Threats Trend timeline 4510 and a Threats listing 4520. The Threats Trend 4510 illustrates the number of threats over a period of time. This can be provided as a line chart, as shown in FIG. 45A. As alternatives, the same information can be re-formatted as a column chart, as shown in FIG. 45B, or as a breakdown column chart as shown in FIG. 45C.”) and 
the graphical user interface further displaying an event description display area listing textual information for each event in the time series data (Muddu Fig. 39A; Fig. 46A; Fig. 46B; Fig. 46C) to provide a historical context of events allowing personnel to assess actual network conditions surrounding the events and the anomaly. (Muddu Fig. 39A; Fig. 39B; Fig. 46A; Fig. 46B; Fig. 46C; [0454] “The home screen view 3900 can additionally include summary charts and illustrations, such as, as shown in FIG. 39A, a “Threats by Threat Type” box 3912, a “Latest Threats” box 3913, and an “Events Trend” graphic 3914. The “Threats by Threat Type” box 3912 compares by number each different type of threat that has been identified. The listing in the “Latest Threats” box 3913 identifies the most recent threats by date. The “Events Trend” graphic 3914 is a timeline showing the volume of events along a timeline.”; [0485] “FIG. 46A provides an example view that the GUI generates when a GUI user selects the Anomalies view 3907 in FIG. 39B. The Anomalies table 4600 provides an Anomalies Trend timeline 4610 and an Anomalies listing 4620. The Anomalies Trend 4610 illustrates the number of anomalies over a period of time. This can be provided as a line chart, as shown in FIG. 46A. As alternatives, the same information can be re-formatted as a column chart, or as a breakdown column chart (not shown), analogous to the Threat Trend as shown in FIGS. 45A-45C.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Singh with an interactive chart displaying textual information related to each event, as taught in Muddu. The motivation to do so is that the user's decisions on the displayed anomalies (e.g., confirming them or marking them as false positives) can be fed back to update and improve the model. (Muddu [0151] “in addition to automatically taking action based on the discovered anomalies and threats, the decisions by the user (e.g., that the anomalies and threats are correctly diagnosed, or that the discovered anomalies and threats are false positives) can then be provided as feedback data in order to update and improve the models.”).
Singh/Muddu, however, does not explicitly teach a system of identifying significant events for finding a root cause of an anomaly in a network having a server computer, comprising: a data collector collecting time series data for events for each device of the network;
Heimann, which is analogous art from the same field of endeavor, teaches a system of identifying significant events for finding a root cause of an anomaly in a network having a server computer, ("FIG. 1 is a network 10 according to an embodiment of the invention. Network 10 may be any network, including a public network such as the Internet, a private network such as a home or enterprise network, or a combination thereof. One or more computing devices 20 (e.g., 20a, 20b, 20c) may connect to one another and to other computers through network 10. For example, computing devices 20 may include user devices such as personal computers, smartphones, tablets, etc.; servers; switches; routers; and/or any other device capable of network communications. At least one platform server 100 may also connect to network 10. Platform server 100 may comprise one or more computers configured to provide the CALI platform described herein." (col.2 ln.41-53); "FIG. 4 is a platform process 400 according to an embodiment of the invention. Platform 300, as executed by the at least one platform server 100, may perform process 400 to identify cybersecurity threats and/or other events of interest." (col.6 ln.16-19)); comprising: a data collector collecting time series data for events for each device of the network; (“To analyze cybersecurity threats, an analysis module of a processor may receive log data from at least one network node.” (Abstract); "In 402, platform 300 may receive log data (e.g., from devices 20 through network 10)" (col.6 ln.20-21)).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the invention of Singh/Muddu in view of Heimann to collect data for the devices of the network at a server in order to detect anomalies. The combination would have been obvious because a person of ordinary skill in the art would have known to apply a known technique (i.e., anomaly detection using an LSTM RNN as the time series prediction model), as disclosed in Singh, to a network having one or more computing devices and a server (i.e., fig. 1), as disclosed in Heimann.

Regarding claim 20, 
Singh teaches a method comprising: detecting, in a detector component of the server, an anomaly in the time series data comprising an outlier on an edge of the time series data by comparing a predicted value of the event to an actual value of the event using a selected forecasting model; ("In this thesis we developed an anomaly detection method for temporal data using LSTM networks." (p.47 para.1); "In [33] LSTMs are employed to detect collective anomalies in network security domain. An LSTM RNN with a single recurrent layer is used as a prediction model." (p.19 para.3); “One approach for temporal anomaly detection has been to build prediction models and use the prediction errors (the difference between the predicted values and the actual values) to compute an anomaly score [4].” (p.18 para.2); “The anomaly detection algorithm used in the project consists of two main steps. First, a summary prediction model is built to learn normal time series patterns and predict future time series. Then anomaly detection is performed by computing anomaly scores from the prediction errors.” (p.20 para.1); “We use LSTM RNN as the time series prediction model.” (p.20 para.2)).
declaring the event to be an anomaly at a particular time if a difference between the predicted value and actual value exceed a defined threshold based on residual values for other devices of the network; ("A prediction error greater than a set limit indicates a point anomaly." (p.19 para. 2); "The difference between the true value and predicted value is also referred to as the error or residual." (p.8 para.3); "One approach for temporal anomaly detection has been to build prediction models and use the prediction errors (the difference between the predicted values and the actual values) to compute an anomaly score [4]." (p.18 para. 2)).
analyzing, in an analyzer component of the server, all events for all devices of the network within a defined time proximity of the particular time of the anomaly to filter usual events and rank each event relative to the anomaly; (“On new data, the log probability densities (PDs) of errors are calculated and used as anomaly scores: with lower values indicating a greater likelihood of the observation being an anomaly. A validation set containing both normal data and anomalies is used to set a threshold on log PD values that can separate anomalies from normal observations and incur as few false positives as possible. A separate test set is used to evaluate the model." (p.21 para.2); "Evaluation: The results of anomaly detection on set T are shown in figure 4.6. With a threshold of −23 the model detects all three anomalies. These results are similar to the results in [40], which finds the three most unusual sequences in this data. Furthermore, the rankings of the discords match the order of log PD values with a lower value indicating a more unusual sequence." (p.33 para.3); “Figure. 4.1: Validation set results on machine temperature dataset. The top plot shows the predictions done on VA and the corresponding prediction errors. The bottom plot shows the log PD values of the prediction errors and the threshold set at −11. The X-axis shows time steps and the Y-axis has the corresponding metric value. There are two true anomalies highlighted by red markers. Shaded area in the top graph denotes detections made by the LSTM algorithm.” (p.30 para.1)) and
displaying to a user, through a graphical user interface of a client computer of the network, a labeled chart of the time series data showing the anomaly in a graphical context relative to all the events, (“Figure. 4.1: Validation set results on machine temperature dataset. The top plot shows the predictions done on VA and the corresponding prediction errors. The bottom plot shows the log PD values of the prediction errors and the threshold set at −11. The X-axis shows time steps and the Y-axis has the corresponding metric value. There are two true anomalies highlighted by red markers. Shaded area in the top graph denotes detections made by the LSTM algorithm.” (p.30 para.1)). 
Singh does not explicitly teach wherein the chart comprises an interactive chart with a label providing an interface accessing information about each event, the information including description, data source, and time of event; and displaying an event description display area listing textual information for each event in the time series data to provide a historical context of events allowing personnel to assess actual network conditions surrounding the events and the anomaly.
However, Muddu teaches wherein the chart comprises an interactive chart (Muddu Fig. 39A) with a label providing an interface accessing information about each event, (Muddu Fig. 39A; Fig. 39B: element 3902; [0452] “By clicking on the “Views” tab 3902, as shown in FIG. 39B, a GUI user can toggle the GUI between a “Threats” view 3906, “Anomalies” view 3907, “Users” view 3908, “Devices” view 3909, and “Applications” view 3910. As described in further detail below, the “Threats” view 3906 provides a listing of all active threats and the “Anomalies” view 3907 provides a listing of all anomalies.”).
the information including description, data source, and time of event; (Muddu Fig. 39B; Fig. 45A; Fig. 45B; Fig. 45C; [0480] “FIG. 45A provides an example view that the GUI generates when a GUI user selects the Threats view 3906 in FIG. 39B. The Threats Table view 4500 provides a Threats Trend timeline 4510 and a Threats listing 4520. The Threats Trend 4510 illustrates the number of threats over a period of time. This can be provided as a line chart, as shown in FIG. 45A. As alternatives, the same information can be re-formatted as a column chart, as shown in FIG. 45B, or as a breakdown column chart as shown in FIG. 45C.”) and 
displaying an event description display area listing textual information for each event in the time series data (Muddu Fig. 39A; Fig. 46A; Fig. 46B; Fig. 46C) to provide a historical context of events allowing personnel to assess actual network conditions surrounding the events and the anomaly. (Muddu Fig. 39A; Fig. 39B; Fig. 46A; Fig. 46B; Fig. 46C; [0454] “The home screen view 3900 can additionally include summary charts and illustrations, such as, as shown in FIG. 39A, a “Threats by Threat Type” box 3912, a “Latest Threats” box 3913, and an “Events Trend” graphic 3914. The “Threats by Threat Type” box 3912 compares by number each different type of threat that has been identified. The listing in the “Latest Threats” box 3913 identifies the most recent threats by date. The “Events Trend” graphic 3914 is a timeline showing the volume of events along a timeline.”; [0485] “FIG. 46A provides an example view that the GUI generates when a GUI user selects the Anomalies view 3907 in FIG. 39B. The Anomalies table 4600 provides an Anomalies Trend timeline 4610 and an Anomalies listing 4620. The Anomalies Trend 4610 illustrates the number of anomalies over a period of time. This can be provided as a line chart, as shown in FIG. 46A. As alternatives, the same information can be re-formatted as a column chart, or as a breakdown column chart (not shown), analogous to the Threat Trend as shown in FIGS. 45A-45C.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Singh with an interactive chart displaying textual information related to each event, as taught in Muddu. The motivation to do so is that the user's decisions on the displayed anomalies (e.g., confirming them or marking them as false positives) can be fed back to update and improve the model. (Muddu [0151] “in addition to automatically taking action based on the discovered anomalies and threats, the decisions by the user (e.g., that the anomalies and threats are correctly diagnosed, or that the discovered anomalies and threats are false positives) can then be provided as feedback data in order to update and improve the models.”).
Singh/Muddu, however, does not explicitly teach a computer program product, comprising a non-transitory computer-readable medium having a computer-readable program code embodied therein, the computer-readable program code adapted to be executed by one or more processors to perform a method of identifying significant events for finding a root cause of an anomaly in a network having a server computer, the method comprising:
Heimann, which is analogous art from the same field of endeavor, teaches a computer program product, comprising a non-transitory computer-readable medium having a computer-readable program code embodied therein, the computer-readable program code adapted to be executed by one or more processors to perform a method of identifying significant events for finding a root cause of an anomaly in a network having a server computer, ("Computer-readable medium 210 may include various instructions for implementing an operating system 214 (e.g., Mac OS®, Windows®, Linux). The operating system 214 may comprise any or all of: multi-user, multiprocessing, multitasking, multithreading, real-time, and the like." (fig. 2; col.3 ln.16-20); "FIG. 1 is a network 10 according to an embodiment of the invention. Network 10 may be any network, including a public network such as the Internet, a private network such as a home or enterprise network, or a combination thereof. One or more computing devices 20 (e.g., 20a, 20b, 20c) may connect to one another and to other computers through network 10. For example, computing devices 20 may include user devices such as personal computers, smartphones, tablets, etc.; servers; switches; routers; and/or any other device capable of network communications. At least one platform server 100 may also connect to network 10. Platform server 100 may comprise one or more computers configured to provide the CALI platform described herein." (col.2 ln.41-53); "FIG. 4 is a platform process 400 according to an embodiment of the invention. Platform 300, as executed by the at least one platform server 100, may perform process 400 to identify cybersecurity threats and/or other events of interest." (col.6 ln.16-19)). 
Heimann further teaches the method comprising: collecting time series data for events for each device of the network; (“To analyze cybersecurity threats, an analysis module of a processor may receive log data from at least one network node.” (Abstract); "In 402, platform 300 may receive log data (e.g., from devices 20 through network 10)" (col.6 ln.20-21)).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify the invention of Singh/ Muddu in view of Heimann to collect time series data for the devices of the network at a server in order to detect anomalies. The combination would have been obvious because a person of ordinary skill in the art would know to apply a known technique (i.e., anomaly detection using an LSTM RNN as the time series prediction model) as disclosed in Singh to the network of one or more computing devices and a server (i.e., fig. 1, fig. 2) as disclosed in Heimann. 


Claims 2, 3, 12, 14 are rejected under 35 U.S.C. 103 as being unpatentable over Singh ("Anomaly Detection for Temporal Data using Long Short-Term Memory (LSTM)") in view of Muddu (US20170063887 A1) in view of Heimann (US10685293 B1) in view of Husain (US20190122119 A1).

Regarding claim 2, Singh/ Muddu/ Heimann teach claim 1. 
Heimann further teaches the method of claim 1 wherein the time series data comprises near real-time data as transaction log information written to a central data store, ("Computer-readable medium 210 may include various instructions for implementing an operating system 214 (e.g., Mac OS®, Windows®, Linux). The operating system 214 may comprise any or all of: multi-user, multiprocessing, multitasking, multithreading, real-time, and the like." (fig. 2; col.3 ln.16-20); "Training for UAAD may be performed offline using a data set composed of known-benign user agents, random string characters, and known-attack like user agents. The training data may include a mix of open source data and proprietary data gathered from research on customer data. Once the model is trained, platform 300 may use the model to classify new user agents in real-time without the need to retrain. The model may also be trained again as new relevant training data becomes available." (fig.3; col.14 ln.32-40)). and
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Singh to use near real-time data. The motivation to do so is that “Once the model is trained, platform 300 may use the model to classify new user agents in real-time without the need to retrain.” (Heimann, col. 14, ln. 36-38).
Singh further teaches wherein the events comprise performance metrics of the device and network transactions to and from the device, ("Figure 4.1: Validation set results on machine temperature dataset. The top plot shows the predictions done on VA and the corresponding prediction errors. The bottom plot shows the log PD values of the prediction errors and the threshold set at −11. The X-axis shows time steps and the Y-axis has the corresponding metric value. There are two true anomalies highlighted by red markers. Shaded area in the top graph denotes detections made by the LSTM algorithm." (p.30 para.1)). and
However, Singh/ Muddu/ Heimann do not explicitly teach wherein the selected forecasting model is selected through a competition among plurality of different forecasting models and comprises a model having fewest errors on test data, but Husain teaches this limitation (Fig. 1B; [0023] “the models illustrated in FIG. 1B represent neural networks that output a predicted a value of B given an input value of A.”; “The fitness function 140 may be an objective function that can be used to compare the models of the input set 120. In some examples, the fitness function 140 is based on a frequency and/or magnitude of errors produced by testing a model on the input data set 102”; [0054] “an overall fittest model of the last executed epoch may be selected and output as representing a neural network that best models the input data set 102.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Singh/ Muddu/ Heimann to select the forecasting model by comparing candidate models based on their errors, as taught in Husain. The motivation to do so is that selecting the fittest model can improve the performance of the trained model (Husain, [0056] “Training trainable models generated by breeding the fittest models of an epoch may improve fitness of the trained models without requiring training of every model of an epoch.”). 
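For illustration only, the following minimal Python sketch shows one way a "competition" among forecasting models can be scored: each candidate makes one-step-ahead predictions on the same test data, and the candidate with the lowest mean absolute error is kept. The forecaster interface, the candidate names, and the use of mean absolute error as the error measure are assumptions made for this sketch; they are not taken from Husain or from the application.

```python
from typing import Callable, Dict, Sequence

# Hypothetical interface: a forecaster maps the observed history to a one-step-ahead prediction.
Forecaster = Callable[[Sequence[float]], float]


def one_step_mae(model: Forecaster, series: Sequence[float], warmup: int) -> float:
    """Mean absolute one-step-ahead error of `model` over series[warmup:]."""
    errors = [abs(model(series[:t]) - series[t]) for t in range(warmup, len(series))]
    return sum(errors) / len(errors)


def select_forecaster(candidates: Dict[str, Forecaster],
                      test_series: Sequence[float], warmup: int = 3) -> str:
    """Score every candidate on the same test data and keep the one with the lowest error."""
    scores = {name: one_step_mae(model, test_series, warmup)
              for name, model in candidates.items()}
    return min(scores, key=scores.get)


# Toy stand-ins for competing forecasting models (e.g., an LSTM versus a simpler baseline).
candidates: Dict[str, Forecaster] = {
    "last_value": lambda history: history[-1],
    "moving_avg_3": lambda history: sum(history[-3:]) / 3,
}
test_series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
best = select_forecaster(candidates, test_series)  # "last_value" wins on this trending series
```

On this linearly increasing series the last-value forecaster is off by 1.0 at each step and the three-point moving average by 2.0, so the last-value model is selected.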

Regarding claim 3, Singh/ Muddu / Heimann/ Husain teach claim 2.
Singh further teaches the method of claim 2 wherein the analyzing further comprises: extracting relevant features from the log information; ("A validation set containing both normal data and anomalies is used to set a threshold on log PD values that can separate anomalies from normal observations and incur as few false positives as possible. A separate test set is used to evaluate the model." (p.21 para.2)).
assigning a value to each feature of the relevant features; ("On new data, the log probability densities (PDs) of errors are calculated and used as anomaly scores: with lower values indicating a greater likelihood of the observation being an anomaly." (p.21 para.2)).
counting a number of occurrences for each feature value pair in their relative order. ("On new data, the log probability densities (PDs) of errors are calculated and used as anomaly scores: with lower values indicating a greater likelihood of the observation being an anomaly." (p.21 para.2)).
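For illustration only, the following Python sketch shows one possible reading of the claim 3 steps: features are extracted from each log line as key=value tokens, each feature is assigned the value found in the log, and occurrences of each feature-value pair are counted while preserving the order in which pairs first appear. The key=value log format and the function names are hypothetical assumptions; neither the application nor Singh is represented as using this parsing scheme.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def extract_feature_values(line: str) -> List[Tuple[str, str]]:
    """Hypothetical extractor: treat each 'key=value' token of a log line as a feature/value pair."""
    pairs = []
    for token in line.split():
        if "=" in token:
            feature, value = token.split("=", 1)
            pairs.append((feature, value))
    return pairs


def count_feature_values(lines: Iterable[str]) -> Counter:
    """Count each (feature, value) pair; keys keep the order in which pairs first appear."""
    counts: Counter = Counter()
    for line in lines:
        counts.update(extract_feature_values(line))
    return counts


log = ["host=a status=ok", "host=b status=fail", "host=a status=ok"]
counts = count_feature_values(log)
```

Because `Counter` is a `dict` subclass, its keys retain insertion order, so iterating over `counts` yields the pairs in their relative order of first occurrence.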

Regarding claim 12, Singh/ Muddu/ Heimann/ Husain teach claim 2.
Singh further teaches the method of claim 2 wherein the log information is collected by one of: an agent process embedded in each device of the network, or automatic status transmitting mechanisms native to each device. ("This dataset is taken from [39] and is available at Numenta’s GitHub repository3. The dataset contains temperature sensor readings of an internal component of a large industrial machine." (p.24 para.3)).

Regarding claim 14, Singh/ Muddu/ Heimann teach claim 13. 
Heimann further teaches the system of claim 13 wherein the time series data comprises near real-time data as transaction log information written to a central data store, ("Computer-readable medium 210 may include various instructions for implementing an operating system 214 (e.g., Mac OS®, Windows®, Linux). The operating system 214 may comprise any or all of: multi-user, multiprocessing, multitasking, multithreading, real-time, and the like." (fig. 2; col.3 ln.16-20); "Training for UAAD may be performed offline using a data set composed of known-benign user agents, random string characters, and known-attack like user agents. The training data may include a mix of open source data and proprietary data gathered from research on customer data. Once the model is trained, platform 300 may use the model to classify new user agents in real-time without the need to retrain. The model may also be trained again as new relevant training data becomes available." (fig.3; col.14 ln.32-40)). and
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Singh to use near real-time data. The motivation to do so is that “Once the model is trained, platform 300 may use the model to classify new user agents in real-time without the need to retrain.” (Heimann, col. 14, ln. 36-38).
Singh further teaches wherein the events comprise performance metrics of the device and network transactions to and from the device, ("Figure 4.1: Validation set results on machine temperature dataset. The top plot shows the predictions done on VA and the corresponding prediction errors. The bottom plot shows the log PD values of the prediction errors and the threshold set at −11. The X-axis shows time steps and the Y-axis has the corresponding metric value. There are two true anomalies highlighted by red markers. Shaded area in the top graph denotes detections made by the LSTM algorithm." (p.30 para.1)). and
However, Singh/ Muddu/ Heimann do not explicitly teach wherein the selected forecasting model is selected through a competition among plurality of different forecasting models and comprises a model having fewest errors on test data, but Husain teaches this limitation (Fig. 1B; [0023] “the models illustrated in FIG. 1B represent neural networks that output a predicted a value of B given an input value of A.”; “The fitness function 140 may be an objective function that can be used to compare the models of the input set 120. In some examples, the fitness function 140 is based on a frequency and/or magnitude of errors produced by testing a model on the input data set 102”; [0054] “an overall fittest model of the last executed epoch may be selected and output as representing a neural network that best models the input data set 102.”).
It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Singh/ Muddu/ Heimann to select the forecasting model by comparing candidate models based on their errors, as taught in Husain. The motivation to do so is that selecting the fittest model can improve the performance of the trained model (Husain, [0056] “Training trainable models generated by breeding the fittest models of an epoch may improve fitness of the trained models without requiring training of every model of an epoch.”). 


Claims 4, 5, 11, 15, 16 are rejected under 35 U.S.C. 103 as being unpatentable over Singh ("Anomaly Detection for Temporal Data using Long Short-Term Memory (LSTM)") in view of Muddu (US20170063887 A1) in view of Heimann (US10685293 B1) in view of Husain (US20190122119 A1) in view of Gopalakrishnan (US10911318 B2) in view of Veeramachaneni ("AI2: Training a big data machine to defend").

Regarding claim 4, Singh/ Muddu / Heimann/ Husain teach claim 3.
Singh further teaches the method of claim 3 wherein the analyzing comprises a Recurrent Neural Network (RNN) process … (“We use LSTM RNN as the time series prediction model.” (p.20 para.2)) taking as input a time series of log events and providing as output a probability of a next event to occur or not occur to enable analysis of the next event as normal or not normal. ("The model takes as input the most recent p values and outputs q future values. We refer to parameters p, q as lookback and lookahead respectively." (p.20 para.2); "On new data, the log probability densities (PDs) of errors are calculated and used as anomaly scores: with lower values indicating a greater likelihood of the observation being an anomaly. A validation set containing both normal data and anomalies is used to set a threshold on log PD values that can separate anomalies from normal observations and incur as few false positives as possible." (p.21 para.2); "Since we use RNNs as the prediction model, we review recent work on using RNNs for temporal anomaly detection in this section. Stacked LSTM RNNs are used for anomaly detection in time series in [31]. The model takes only one time step as input and maintains LSTM state across the entire input sequence. The model is trained on normal data and made to predict multiple time steps. Thus each observation has multiple predictions made at different times in the past." (p.18 para.4 – p.19 para.1)).
Singh/ Muddu / Heimann/ Husain, however, do not explicitly teach the method of claim 3 wherein the analyzing comprises … Markov chain process taking as input a time series of log events and providing as output a probability of a next event to occur or not occur to enable analysis of the next event as normal or not normal.
Gopalakrishnan, on the other hand, teaches the method of claim 3 wherein the analyzing comprises … Markov chain process (“In an embodiment, the primary predictor comprises a Hidden Markov Model.” (col.11 ln.45-46)) taking as input a time series of log events and providing as output a probability of a next event to occur or not occur to enable analysis of the next event as normal or not normal.  (“In an embodiment, determining whether an anomaly is detected in the network time series data comprises determining the anomaly according to a Hidden Markov Model.” (col.11 ln.46-49); “In an embodiment, determining whether an anomaly is detected includes determining a likelihood that the primary predictor will accurately predict a next observed data value within a specified range of acceptable values.” (col.12 ln.31-34)).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Singh/ Muddu/ Heimann/ Husain to use a known technique (i.e., “determining whether an anomaly is detected in the network time series data comprises determining the anomaly according to a Hidden Markov Model”, Gopalakrishnan) to take as input a time series of log events and provide as output a probability of a next event. 
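For illustration only, the following Python sketch shows how a Markov chain process can take a time series of log events as input and output the probability of a next event. It uses a simple first-order Markov chain with maximum-likelihood transition estimates rather than the Hidden Markov Model named in Gopalakrishnan, and the probability cutoff used to label a transition as not normal (`min_prob`) is a hypothetical parameter.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Sequence


def fit_transitions(events: Sequence[Hashable]) -> Dict[Hashable, Counter]:
    """Count first-order transitions current -> next over a sequence of log event types."""
    transitions: Dict[Hashable, Counter] = defaultdict(Counter)
    for current, nxt in zip(events, events[1:]):
        transitions[current][nxt] += 1
    return transitions


def next_event_probability(transitions: Dict[Hashable, Counter],
                           current: Hashable, nxt: Hashable) -> float:
    """Maximum-likelihood estimate of P(next event | current event); 0.0 for unseen states."""
    followers = transitions.get(current, Counter())
    total = sum(followers.values())
    return followers[nxt] / total if total else 0.0


def is_normal(transitions: Dict[Hashable, Counter], current: Hashable,
              nxt: Hashable, min_prob: float = 0.2) -> bool:
    """Hypothetical rule: treat transitions rarer than `min_prob` as not normal."""
    return next_event_probability(transitions, current, nxt) >= min_prob


history = ["login", "read", "logout", "login", "read", "logout", "login", "write", "logout"]
model = fit_transitions(history)
```

In this toy history, "login" is followed by "read" twice and by "write" once, so P(read | login) is 2/3, P(write | login) is 1/3, and a never-observed transition such as "login" to "delete" has probability 0 and is flagged as not normal.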
Veeramachaneni further teaches the benefit of the combination of a Recurrent Neural Network (RNN) process and Markov chain process (“Multi-algorithm ensembles are combinations of predictions from different machine learning models. This strategy improves robustness by compensating for the individual biases of models in the ensemble. In this case, we average outlier probabilities obtained separately by each of the methods.” (p.9, left col, para.3)).
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a Recurrent Neural Network (RNN) process and Markov chain process by applying a known technique (i.e., “Multi-algorithm ensembles are combinations of predictions from different machine learning models.”, Veeramachaneni). The combination would have been obvious because a person of ordinary skill in the art would combine a Recurrent Neural Network (RNN) process and Markov chain process to improve the performance (i.e., “This strategy improves robustness by compensating for the individual biases of models in the ensemble”, Veeramachaneni) as disclosed in Veeramachaneni.

Regarding claim 5, Singh/ Muddu / Heimann/ Husain/ Gopalakrishnan/ Veeramachaneni teach claim 4.
Singh further teaches the method of claim 4 further comprising: determining, for each of the RNN process …, distances between actual events and predicted events; (“Note: From here on we will use the terms RNN and LSTM RNN interchangeably to refer to an RNN with LSTM units.” (Singh, p.16 para. 2); "We use LSTM RNN as the time series prediction model." (p.20 para.2); "One approach for temporal anomaly detection has been to build prediction models and use the prediction errors (the difference between the predicted values and the actual values) to compute an anomaly score [4]." (Singh, p.18 para. 2)). 
Gopalakrishnan further teaches the method of claim 4 further comprising: determining, for each of … Markov chain process, distances between actual events and predicted events; (“At block 108, an alternative predictor is used if an anomaly is detected. In an embodiment, an anomaly is detected using a HMM. … an anomaly is determined by comparing previous predictions determined according to the primary predictor with observed values and if the difference between the two exceeds a predetermined value, determining that an anomaly has occurred.” (Gopalakrishnan, col. 4, ln. 44-56)). and
Singh further teaches calculating a respective score for each log event of the time series of log events based on the distances to help determine a rarity of the next event. ("The difference between the true value and predicted value is also referred to as the error or residual." (p.8 para.3); "On new data, the log probability densities (PDs) of errors are calculated and used as anomaly scores: with lower values indicating a greater likelihood of the observation being an anomaly." (p.21 para.2); “Figure 4.1: Validation set results on machine temperature dataset. The top plot shows the predictions done on VA and the corresponding prediction errors. The bottom plot shows the log PD values of the prediction errors and the threshold set at −11. The X-axis shows time steps and the Y-axis has the corresponding metric value. There are two true anomalies highlighted by red markers. Shaded area in the top graph denotes detections made by the LSTM algorithm.” (p.30 para.1)).
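For illustration only, the following Python sketch follows the scoring approach quoted from Singh: the distance (error) between actual and predicted values is computed, a normal distribution is fitted to errors observed on normal data, and the log probability density of each new error serves as its anomaly score, with lower values indicating rarer events. The sketch simplifies to a univariate normal distribution (an assumption; the quoted passages do not fix the dimensionality), and the default threshold of −11 is taken from the quoted Figure 4.1 caption, where it is specific to the machine temperature dataset.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence, Tuple


def prediction_errors(actual: Sequence[float], predicted: Sequence[float]) -> List[float]:
    """Distance between actual and predicted values (signed residual)."""
    return [a - p for a, p in zip(actual, predicted)]


def fit_error_distribution(errors: Sequence[float]) -> Tuple[float, float]:
    """Fit a univariate normal to errors observed on normal (training) data."""
    return mean(errors), pstdev(errors)


def log_pd(error: float, mu: float, sigma: float) -> float:
    """Log probability density of `error` under N(mu, sigma^2); lower means rarer."""
    return -0.5 * math.log(2 * math.pi * sigma ** 2) - (error - mu) ** 2 / (2 * sigma ** 2)


def score_events(actual: Sequence[float], predicted: Sequence[float], mu: float,
                 sigma: float, threshold: float = -11.0) -> List[Tuple[float, bool]]:
    """Return (anomaly score, flagged) per event; flagged when the score falls below threshold."""
    scores = [log_pd(e, mu, sigma) for e in prediction_errors(actual, predicted)]
    return [(s, s < threshold) for s in scores]


mu, sigma = fit_error_distribution([-1.0, 0.0, 1.0, 0.0])
results = score_events(actual=[5.0, 15.0], predicted=[5.0, 5.0], mu=mu, sigma=sigma)
```

Here the first event is predicted exactly (error 0, score about −0.57) and is not flagged, while the second is off by 10 (score about −100.6) and falls below the threshold.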

Regarding claim 11, Singh/ Muddu/ Heimann/ Husain/ Gopalakrishnan/ Veeramachaneni teach claim 4.
Singh further teaches the method of claim 4 wherein the RNN comprises a long short-term memory (LSTM) RNN network. ("We use LSTM RNN as the time series prediction model." (p.20 para.2)).

Regarding claim 15, Singh/ Muddu/ Heimann/ Husain teach claim 14.
Singh further teaches the system of claim 14 wherein the analyzer comprises a Recurrent Neural Network (RNN) process … (“We use LSTM RNN as the time series prediction model.” (p.20 para.2)) taking as input a time series of log events and providing as output a probability of a next event to occur or not occur to enable analysis of the next event as normal or not normal, ("The model takes as input the most recent p values and outputs q future values. We refer to parameters p, q as lookback and lookahead respectively." (p.20 para.2); "On new data, the log probability densities (PDs) of errors are calculated and used as anomaly scores: with lower values indicating a greater likelihood of the observation being an anomaly. A validation set containing both normal data and anomalies is used to set a threshold on log PD values that can separate anomalies from normal observations and incur as few false positives as possible." (p.21 para.2); "Since we use RNNs as the prediction model, we review recent work on using RNNs for temporal anomaly detection in this section. Stacked LSTM RNNs are used for anomaly detection in time series in [31]. The model takes only one time step as input and maintains LSTM state across the entire input sequence. The model is trained on normal data and made to predict multiple time steps. Thus each observation has multiple predictions made at different times in the past." (p.18 para.4 – p.19 para.1)).
Singh/ Muddu / Heimann/ Husain, however, do not explicitly teach the system of claim 14 wherein the analyzer comprises … Markov chain process taking as input a time series of log events and providing as output a probability of a next event to occur or not occur to enable analysis of the next event as normal or not normal,
Gopalakrishnan, on the other hand, teaches the system of claim 14 wherein the analyzer comprises … Markov chain process (“In an embodiment, the primary predictor comprises a Hidden Markov Model.” (col.11 ln.45-46)) taking as input a time series of log events and providing as output a probability of a next event to occur or not occur to enable analysis of the next event as normal or not normal, (“In an embodiment, determining whether an anomaly is detected in the network time series data comprises determining the anomaly according to a Hidden Markov Model.” (col.11 ln.46-49); “In an embodiment, determining whether an anomaly is detected includes determining a likelihood that the primary predictor will accurately predict a next observed data value within a specified range of acceptable values.” (col.12 ln.31-34)).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Singh/ Muddu/ Heimann/ Husain to use a known technique (i.e., “determining whether an anomaly is detected in the network time series data comprises determining the anomaly according to a Hidden Markov Model”, Gopalakrishnan) to take as input a time series of log events and provide as output a probability of a next event. 
Veeramachaneni teaches the benefit of the combination of a Recurrent Neural Network (RNN) process and Markov chain process (“Multi-algorithm ensembles are combinations of predictions from different machine learning models. This strategy improves robustness by compensating for the individual biases of models in the ensemble. In this case, we average outlier probabilities obtained separately by each of the methods.” (p.9, left col, para.3)). 
Therefore, it would have been obvious to one of ordinary skill in the art, before the effective filing date of the claimed invention, to combine a Recurrent Neural Network (RNN) process and Markov chain process by applying a known technique (i.e., “Multi-algorithm ensembles are combinations of predictions from different machine learning models.”, Veeramachaneni). The combination would have been obvious because a person of ordinary skill in the art would combine a Recurrent Neural Network (RNN) process and Markov chain process to improve the performance (i.e., “This strategy improves robustness by compensating for the individual biases of models in the ensemble”, Veeramachaneni) as disclosed in Veeramachaneni.
Singh further teaches wherein the analyzer further extracts relevant features from the log information, ("A validation set containing both normal data and anomalies is used to set a threshold on log PD values that can separate anomalies from normal observations and incur as few false positives as possible. A separate test set is used to evaluate the model." (p.21 para.2)).
Singh further teaches wherein the analyzer assigns a value to each feature of the relevant features, ("On new data, the log probability densities (PDs) of errors are calculated and used as anomaly scores: with lower values indicating a greater likelihood of the observation being an anomaly." (p.21 para.2)). and
Singh further teaches wherein the analyzer counts a number of occurrences for each feature value pair in their relative order. ("On new data, the log probability densities (PDs) of errors are calculated and used as anomaly scores: with lower values indicating a greater likelihood of the observation being an anomaly." (p.21 para.2)).

Regarding claim 16, Singh/ Muddu/ Heimann/ Husain/ Gopalakrishnan/ Veeramachaneni teach claim 15.
Singh further teaches the system of claim 15 wherein the analyzer further determines, for each of the RNN process …, distances between actual events and predicted events, (“Note: From here on we will use the terms RNN and LSTM RNN interchangeably to refer to an RNN with LSTM units.” (Singh, p.16 para. 2); "We use LSTM RNN as the time series prediction model." (p.20 para.2); "One approach for temporal anomaly detection has been to build prediction models and use the prediction errors (the difference between the predicted values and the actual values) to compute an anomaly score [4]." (Singh, p.18 para. 2)). 
Gopalakrishnan further teaches the system of claim 15 wherein the analyzer further determines, for each of … Markov chain process, distances between actual events and predicted events, (“At block 108, an alternative predictor is used if an anomaly is detected. In an embodiment, an anomaly is detected using a HMM. … an anomaly is determined by comparing previous predictions determined according to the primary predictor with observed values and if the difference between the two exceeds a predetermined value, determining that an anomaly has occurred.” (Gopalakrishnan, col. 4, ln. 44-56)). and
Singh further teaches calculates a respective score for each log event of the time series of log events based on the distances to help determine a rarity of the next event. ("The difference between the true value and predicted value is also referred to as the error or residual." (p.8 para.3); "On new data, the log probability densities (PDs) of errors are calculated and used as anomaly scores: with lower values indicating a greater likelihood of the observation being an anomaly." (p.21 para.2); “Figure 4.1: Validation set results on machine temperature dataset. The top plot shows the predictions done on VA and the corresponding prediction errors. The bottom plot shows the log PD values of the prediction errors and the threshold set at −11. The X-axis shows time steps and the Y-axis has the corresponding metric value. There are two true anomalies highlighted by red markers. Shaded area in the top graph denotes detections made by the LSTM algorithm.” (p.30 para.1)).


Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Singh ("Anomaly Detection for Temporal Data using Long Short-Term Memory (LSTM)") in view of Muddu (US20170063887 A1) in view of Heimann (US10685293 B1) in view of Husain (US20190122119 A1) in view of Gopalakrishnan (US10911318 B2) in view of Veeramachaneni ("AI2: Training a big data machine to defend") in view of McMahon (US 9697469 B2).

Regarding claim 6, Singh/ Muddu / Heimann/ Husain/ Gopalakrishnan/ Veeramachaneni teach claim 5.
Singh/ Muddu / Heimann/ Husain/ Gopalakrishnan/ Veeramachaneni, however, do not explicitly teach the method of claim 5 further comprising combining the RNN process and the Markov chain process by assigning respective coefficient weights to each of the distances for the RNN process and the Markov chain process.
McMahon, on the other hand, teaches the method of claim 5 further comprising combining the RNN process and the Markov chain process by assigning respective coefficient weights to each of the distances for the RNN process and the Markov chain process. (“Learning techniques are selected for each of the plurality of datasets by a modeling engine, step 106. Choices include, but are not limited to support vector machines (SVMs), tree-based techniques, artificial neural networks, random forests and other supervised or unsupervised learning algorithms.” (fig. 1; col.6 ln.37-42); “Overall weighting of each model within each dataset may be determined. Each model set (SVMs with predictive power 420, 422, 424, 426, and 428) are transmitted to a prediction server/engine along with the weights of each model within each dataset and the number of examples in each feature set to form overall ensemble 440. Voting weights 430, 432, 434, 436, and 438 can be assigned to SVMs with predictive power 420, 422, 424, 426, and 428, respectively. The voting weights may be scaled to amount of data input into the model building (the number of examples used in a model). Relative weights of each of the sets of models may be determined based on the number of examples provided from the training data for each of the datasets.” (fig.4; col.13 ln.55 – col.14 ln.1)).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to apply a known technique (i.e., assigning weights to individual models and forming an overall ensemble) as disclosed in McMahon to combine the RNN process and the Markov chain process. The combination would have been obvious because a person of ordinary skill in the art, following McMahon's weighting of each model within an overall ensemble, would combine the RNN process and the Markov chain process by assigning respective coefficient weights to each of the distances for the RNN process and the Markov chain process.


Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over Singh ("Anomaly Detection for Temporal Data using Long Short-Term Memory (LSTM)") in view of Muddu (US20170063887 A1) in view of Heimann (US10685293 B1) in view of Husain (US20190122119 A1) in view of Gopalakrishnan (US10911318 B2) in view of Veeramachaneni ("AI2: Training a big data machine to defend") in view of McMahon (US 9697469 B2) in view of King (US 20180036591 A1) in view of Das (“Incorporating Expert Feedback into Active Anomaly Discovery”).

Regarding claim 7, Singh/ Muddu/ Heimann/ Husain/ Gopalakrishnan/ Veeramachaneni/ McMahon teach claim 6.
Heimann further teaches the method of claim 6 further comprising receiving user feedback of the respective score for each log event (“element 308” on fig.3; "Computerized Adaptive Detection (CAD) may comprise the combination of these two powerful components. Janus, based on feedback from the user (i.e., Oracle), may generate a loss function and automatically choose observations within the Analysis region of TailJumps to have the Oracle label with level of interest. This interaction may terminate when the loss function is minimized subject to constraints. Janus may minimize the loss function to find the optimal cutoff. FIG. 3 is an overview of the platform 300, as executed by at least one platform server 100, according to an embodiment of the invention. Unsupervised learning core 302 may use network-based behavioral analytics and/or user-based entity behavioral analytics to score observations and produce score events. Outlier detection algorithm(s) 304 may produce a filtered set of outliers. For example, filtration and output are illustrated in detail below in FIGS. 17-20 and accompanying description. AI engine 306 may produce machine curated results using the filtered set of outliers as input. For example, yellow dots indicate steps in a self-learning process, and blue dots indicate steps in an active learning process, explained in greater detail with respect to FIG. 21 and accompanying description. Example outputs of these processes are shown in FIG. 25 and accompanying description. Human curation interface 308 may allow users to feed knowledge back to AI engine 306 in the form of labels." (col.5 ln.57 – col.6 ln.15)).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to apply a known technique (i.e., “Human curation interface 308 may allow users to feed knowledge back to AI engine 306 in the form of labels”, Heimann) to receive user feedback of the respective score for each log event. 
However, Singh/ Muddu/ Heimann/ Husain/ Gopalakrishnan/ Veeramachaneni/ McMahon do not explicitly teach wherein the coefficient weights are determined based on the user feedback using a simple machine learning model, and wherein the score comprises a numeric ranking within a defined range.
King, on the other hand, teaches wherein the coefficient weights are determined based on the user feedback using a simple machine learning model, (“some embodiments may train other types of models, for instance, a Hidden Markov model or a recurrent neural net.” [¶0117]; “Embodiments may ask users to rate their workout instructor and associate those ratings with the user's profiles. This data may be used as a training set. Embodiments may then determine weights or coefficients of a model, for instance a neural net or decision tree, by iteratively adjusting the coefficients, determining how closely the model describes the training set, and then adjusting the weights or coefficients in a direction in which the correspondence is expected to increase. The resulting model may then receive as an input a current user's profile and, based on the trained weights and coefficients, a score or selecting of candidate workout instructors may be determined to identify the best fit for the user based on the training data.” [¶0118]).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to apply a known technique (i.e., “ask users to rate”, King) to determine the coefficient weights based on the user feedback. The combination would have been obvious because a person of ordinary skill in the art would determine the weights or coefficients of a model from user ratings as disclosed in King.
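For illustration only, the following Python sketch shows a simple machine learning model whose coefficients are determined from user feedback by iterative adjustment, in the manner King describes in general terms: logistic regression fitted by batch gradient descent, where each training row holds an event's RNN distance and Markov chain distance and each label records whether a user confirmed the event as a true anomaly. The choice of logistic regression, the learning rate, the epoch count, and the toy feedback data are assumptions for this sketch, not details of King or of the application.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Probability that an event is a true anomaly given its per-model distances."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def fit_weights(features: Sequence[Sequence[float]], labels: Sequence[int],
                lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Logistic regression by batch gradient descent: repeatedly adjust the coefficients
    in the direction that improves agreement with the user-supplied labels."""
    n = len(features[0])
    weights, bias = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for x, y in zip(features, labels):
            err = predict(weights, bias, x) - y  # gradient of log-loss w.r.t. the logit
            for i in range(n):
                grad_w[i] += err * x[i]
            grad_b += err
        m = len(labels)
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / m
    return weights, bias


# Each row: (RNN distance, Markov chain distance) for one event; label 1 = user confirmed anomaly.
feedback_x = [(0.9, 0.5), (0.8, 0.4), (0.1, 0.5), (0.2, 0.4)]
feedback_y = [1, 1, 0, 0]
weights, bias = fit_weights(feedback_x, feedback_y)
```

In this toy feedback set only the RNN distance separates confirmed anomalies from dismissed ones, so the learned coefficient on that distance is positive and the fitted model reproduces the user labels.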
Das, on the other hand, teaches wherein the score comprises a numeric ranking within a defined range. (“The anomaly score assigned to point xi is the mean negative log density” (p. 854, right col. para.1); "Internally, our model maintains a list of data instances ranked by the anomaly score produced by the anomaly detector." (p. 854, right col. para.4); "We want the scores of all labeled anomalies to be higher than qτ and the scores of all labeled nominals to be below qτ." (p. 854, right col. para.8)).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to apply a known technique (i.e., “ranked by the anomaly score produced by the anomaly detector”, Das) so that the score comprises a numeric ranking within a defined range. The combination would have been obvious because a person of ordinary skill in the art would rank the anomaly scores as disclosed in Das.
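For illustration only, the following Python sketch shows one way a raw anomaly score can be expressed as a numeric ranking within a defined range: scores are min-max scaled into a range such as 0 to 100, and events are ordered from most to least anomalous. The 0 to 100 range and the min-max scaling are assumptions for this sketch; the sketch also assumes that higher raw scores indicate more anomalous events (as with Das's mean negative log density), so a score where lower means rarer, such as Singh's log PD, would be negated first.

```python
from typing import List, Sequence, Tuple


def to_defined_range(scores: Sequence[float], lo: float = 0.0,
                     hi: float = 100.0) -> Tuple[List[float], List[int]]:
    """Min-max scale raw anomaly scores into [lo, hi] and return event indices ranked highest first."""
    smin, smax = min(scores), max(scores)
    span = (smax - smin) or 1.0  # avoid division by zero when all scores are equal
    scaled = [lo + (s - smin) * (hi - lo) / span for s in scores]
    ranking = sorted(range(len(scores)), key=lambda i: scaled[i], reverse=True)
    return scaled, ranking


scaled, ranking = to_defined_range([2.0, 10.0, 6.0])
```

For raw scores 2, 10, and 6 the scaled values are 0, 100, and 50, and the ranking lists the second event first, then the third, then the first.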


Claims 8-10 and 17-19 are rejected under 35 U.S.C. 103 as being unpatentable over Singh ("Anomaly Detection for Temporal Data using Long Short-Term Memory (LSTM)") in view of Muddu (US20170063887 A1) in view of Heimann (US10685293 B1) in view of Husain (US20190122119 A1) in view of Gopalakrishnan (US10911318 B2) in view of Veeramachaneni ("AI²: Training a big data machine to defend") in view of McMahon (US 9697469 B2) in view of King (US 20180036591 A1) in view of Das (“Incorporating Expert Feedback into Active Anomaly Discovery”) and further in view of Zimek (“Ensembles for Unsupervised Outlier Detection: Challenges and Research Questions”).

Regarding claim 8, Singh/ Muddu/ Heimann/ Husain/ Gopalakrishnan/ Veeramachaneni/ McMahon/ King/ Das teach claim 7.
	Singh/ Muddu/ Heimann/ Husain/ Gopalakrishnan/ Veeramachaneni/ McMahon/ King/ Das, however, do not explicitly teach the method of claim 7 further comprising calculating an event score for each event by summing a weighted RNN score for an event with a weighted Markov chain score for the event.
	Zimek, on the other hand, teaches the method of claim 7 further comprising calculating an event score for each event by summing a weighted RNN score for an event with a weighted Markov chain score for the event. ("Combining outlier scores or rankings learned on different subsets of attributes, the so-called “feature bagging” was the first paper to explicitly discuss building ensembles for outlier detection [45]" (p.15, left col. para.3); "Combining outlier scores of different algorithms (i.e., combinations of different models of what constitutes an outlier) has been explored in several studies [54; 40; 63]." (p.15, right col. para.5)).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to apply a known technique (i.e., “Combining outlier scores of different algorithms”, Zimek). The combination would have been obvious because a person of ordinary skill in the art would combine scores from two different models.

Regarding claim 9, Singh/ Muddu/ Heimann/ Husain/ Gopalakrishnan/ Veeramachaneni/ McMahon/ King/ Das/ Zimek teach claim 8.
Singh further teaches the method of claim 8 further comprising labeling the chart with an indexed label identifying each of the events and the anomaly in a contrasting visual manner. (“Figure. 4.1: Validation set results on machine temperature dataset. The top plot shows the predictions done on VA and the corresponding prediction errors. The bottom plot shows the log PD values of the prediction errors and the threshold set at −11. The X-axis shows time steps and the Y-axis has the corresponding metric value. There are two true anomalies highlighted by red markers. Shaded area in the top graph denotes detections made by the LSTM algorithm.” (p.30 para.1)).

Regarding claim 10, Singh/ Muddu/ Heimann/ Husain/ Gopalakrishnan/ Veeramachaneni/ McMahon/ King/ Das/ Zimek teach claim 9.
Singh further teaches the method of claim 9 wherein the indexed label comprises an alphanumeric character superimposed proximate the events and anomaly. (“Figure. 4.1: Validation set results on machine temperature dataset. The top plot shows the predictions done on VA and the corresponding prediction errors. The bottom plot shows the log PD values of the prediction errors and the threshold set at −11. The X-axis shows time steps and the Y-axis has the corresponding metric value. There are two true anomalies highlighted by red markers. Shaded area in the top graph denotes detections made by the LSTM algorithm.” (p.30 para.1); “Figure 4.3: Results of HTM on Machine Temperature Data. (a) shows the results on entire dataset. (b) shows the results corresponding to figure 4.1. (c) shows the results corresponding to figure 4.2. The X-axis shows time and the Y-axis shows the temperature. True anomalies are denoted by red markers. Green/red diamonds represent true/false positives. The pink shaded portions are anomaly windows. The purple shaded area in (a) is the testing window.” (p.32 para.1)). 

Regarding claim 17, Singh/ Muddu/ Heimann/ Husain/ Gopalakrishnan/ Veeramachaneni teach claim 16.
Singh/ Muddu/ Heimann/ Husain/ Gopalakrishnan/ Veeramachaneni, however, do not explicitly teach the system of claim 16 wherein the analyzer combines the RNN process and the Markov chain process by assigning respective coefficient weights to each of the distances for the RNN process and the Markov chain process, and receives user feedback of the respective score for each log event, wherein the coefficient weights are determined based on the user feedback using a simple machine learning model, and wherein the score comprises a numeric ranking within a defined range, and calculates an event score for each event by summing a weighted RNN score for an event with a weighted Markov chain score for the event.
McMahon, on the other hand, teaches the system of claim 16 wherein the analyzer combines the RNN process and the Markov chain process by assigning respective coefficient weights to each of the distances for the RNN process and the Markov chain process, (“Learning techniques are selected for each of the plurality of datasets by a modeling engine, step 106. Choices include, but are not limited to support vector machines (SVMs), tree-based techniques, artificial neural networks, random forests and other supervised or unsupervised learning algorithms.” (fig. 1; col.6 ln.37-42); “Overall weighting of each model within each dataset may be determined. Each model set (SVMs with predictive power 420, 422, 424, 426, and 428) are transmitted to a prediction server/engine along with the weights of each model within each dataset and the number of examples in each feature set to form overall ensemble 440. Voting weights 430, 432, 434, 436, and 438 can be assigned to SVMs with predictive power 420, 422, 424, 426, and 428, respectively. The voting weights may be scaled to amount of data input into the model building (the number of examples used in a model). Relative weights of each of the sets of models may be determined based on the number of examples provided from the training data for each of the datasets.” (fig.4; col.13 ln.55 – col.14 ln.1)).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to apply a known technique (i.e., assigning weights and forming overall ensemble) as disclosed in McMahon to combine the RNN process and the Markov chain process. The combination would have been obvious because a person of ordinary skill in the art would combine the RNN process and the Markov chain process by assigning respective coefficient weights to each of the distances for the RNN process and the Markov chain process.
Heimann further teaches receives user feedback of the respective score for each log event, (Fig. 3, element 308; "Computerized Adaptive Detection (CAD) may comprise the combination of these two powerful components. Janus, based on feedback from the user (i.e., Oracle), may generate a loss function and automatically choose observations within the Analysis region of TailJumps to have the Oracle label with level of interest. This interaction may terminate when the loss function is minimized subject to constraints. Janus may minimize the loss function to find the optimal cutoff. FIG. 3 is an overview of the platform 300, as executed by at least one platform server 100, according to an embodiment of the invention. Unsupervised learning core 302 may use network-based behavioral analytics and/or user-based entity behavioral analytics to score observations and produce score events. Outlier detection algorithm(s) 304 may produce a filtered set of outliers. For example, filtration and output are illustrated in detail below in FIGS. 17-20 and accompanying description. AI engine 306 may produce machine curated results using the filtered set of outliers as input. For example, yellow dots indicate steps in a self-learning process, and blue dots indicate steps in an active learning process, explained in greater detail with respect to FIG. 21 and accompanying description. Example outputs of these processes are shown in FIG. 25 and accompanying description. Human curation interface 308 may allow users to feed knowledge back to AI engine 306 in the form of labels." (col.5 ln.57 – col.6 ln.15)).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to apply a known technique (i.e., “Human curation interface 308 may allow users to feed knowledge back to AI engine 306 in the form of labels”, Heimann). 
King, on the other hand, teaches wherein the coefficient weights are determined based on the user feedback using a simple machine learning model, (“some embodiments may train other types of models, for instance, a Hidden Markov model or a recurrent neural net.” [¶0117]; “Embodiments may ask users to rate their workout instructor and associate those ratings with the user's profiles. This data may be used as a training set. Embodiments may then determine weights or coefficients of a model, for instance a neural net or decision tree, by iteratively adjusting the coefficients, determining how closely the model describes the training set, and then adjusting the weights or coefficients in a direction in which the correspondence is expected to increase. The resulting model may then receive as an input a current user's profile and, based on the trained weights and coefficients, a score or selecting of candidate workout instructors may be determined to identify the best fit for the user based on the training data.” [¶0118]).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to apply a known technique (i.e., “ask users to rate”, King). The combination would have been obvious because a person of ordinary skill in the art would determine weights or coefficients as disclosed in King.
Das, on the other hand, teaches wherein the score comprises a numeric ranking within a defined range, (“The anomaly score assigned to point x_i is the mean negative log density” (p. 854, right col. para.1); "Internally, our model maintains a list of data instances ranked by the anomaly score produced by the anomaly detector." (p. 854, right col. para.4); "We want the scores of all labeled anomalies to be higher than qτ and the scores of all labeled nominals to be below qτ." (p. 854, right col. para.8)).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to apply a known technique (i.e., “ranked by the anomaly score”, Das). The combination would have been obvious because a person of ordinary skill in the art would rank the anomaly scores.
Zimek, on the other hand, teaches calculates an event score for each event by summing a weighted RNN score for an event with a weighted Markov chain score for the event. ("Combining outlier scores or rankings learned on different subsets of attributes, the so-called “feature bagging” was the first paper to explicitly discuss building ensembles for outlier detection [45]" (p.15, left col. para.3); "Combining outlier scores of different algorithms (i.e., combinations of different models of what constitutes an outlier) has been explored in several studies [54; 40; 63]." (p.15, right col. para.5)).
Therefore, it would have been obvious to a person of ordinary skill in the art, before the effective filing date of the claimed invention, to apply a known technique (i.e., “Combining outlier scores of different algorithms”, Zimek). The combination would have been obvious because a person of ordinary skill in the art would combine scores from two different models.

Regarding claim 18, Singh/ Muddu/ Heimann/ Husain/ Gopalakrishnan/ Veeramachaneni/ McMahon/ King/ Das/ Zimek teach claim 17.
Singh further teaches the system of claim 17 wherein the chart is labeled with an indexed label identifying each of the events and the anomaly in a contrasting visual manner, (“Figure. 4.1: Validation set results on machine temperature dataset. The top plot shows the predictions done on VA and the corresponding prediction errors. The bottom plot shows the log PD values of the prediction errors and the threshold set at −11. The X-axis shows time steps and the Y-axis has the corresponding metric value. There are two true anomalies highlighted by red markers. Shaded area in the top graph denotes detections made by the LSTM algorithm.” (p.30 para.1));
the indexed label comprising an alphanumeric character superimposed proximate the events and anomaly, (“Figure. 4.1: Validation set results on machine temperature dataset. The top plot shows the predictions done on VA and the corresponding prediction errors. The bottom plot shows the log PD values of the prediction errors and the threshold set at −11. The X-axis shows time steps and the Y-axis has the corresponding metric value. There are two true anomalies highlighted by red markers. Shaded area in the top graph denotes detections made by the LSTM algorithm.” (p.30 para.1); “Figure 4.3: Results of HTM on Machine Temperature Data. (a) shows the results on entire dataset. (b) shows the results corresponding to figure 4.1. (c) shows the results corresponding to figure 4.2. The X-axis shows time and the Y-axis shows the temperature. True anomalies are denoted by red markers. Green/red diamonds represent true/false positives. The pink shaded portions are anomaly windows. The purple shaded area in (a) is the testing window.” (p.32 para.1)). and 
wherein the chart comprises an interactive chart wherein each indexed label provides an interface providing to information about each event, the information including description, data source, and time of event. (“Figure. 4.1: Validation set results on machine temperature dataset. The top plot shows the predictions done on VA and the corresponding prediction errors. The bottom plot shows the log PD values of the prediction errors and the threshold set at −11. The X-axis shows time steps and the Y-axis has the corresponding metric value. There are two true anomalies highlighted by red markers. Shaded area in the top graph denotes detections made by the LSTM algorithm.” (p.30 para.1)).

Regarding claim 19, Singh/ Muddu/ Heimann/ Husain/ Gopalakrishnan/ Veeramachaneni/ McMahon/ King/ Das/ Zimek teach claim 18.
Singh further teaches the system of claim 18 wherein the RNN comprises a long short-term memory (LSTM) RNN network, ("We use LSTM RNN as the time series prediction model." (p.20 para.2)). and 
wherein the data collector comprises one of an agent process embedded in each device of the network, or automatic status transmitting mechanisms native to each device. ("This dataset is taken from [39] and is available at Numenta’s GitHub repository. The dataset contains temperature sensor readings of an internal component of a large industrial machine." (p.24 para.3)).



Response to Arguments
Applicant’s arguments with respect to the drawing objection and the specification objection have been fully considered and are persuasive. The drawing objection and the specification objection of the previous Office action are withdrawn.

Regarding Claims 5 and 16 under 35 U.S.C. §112(b):
Applicant’s arguments regarding the 35 U.S.C. § 112(b) rejection of amended claims 5 and 16 have been considered and are persuasive. Therefore, the rejection has been withdrawn.

Regarding the rejection of claims 1-20 under 35 U.S.C. §103:

Applicant’s arguments with respect to claims 1, 2, 5, 10, 13, 14, 16, and 20 have been considered but are moot because the new ground of rejection does not rely on any reference applied in the prior rejection of record for any teaching or matter specifically challenged in the argument. 
Applicant’s arguments with respect to the rejections of the dependent claims have been fully considered but they are not persuasive as they rely upon the allowability of the independent claims. 



Conclusion
Applicant's amendment necessitated the new grounds of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Deuk Lee whose telephone number is 571-272-8440.  The examiner can normally be reached on Monday-Friday 8:30am-5:30pm CDT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached on 571-272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/DL/
Examiner, Art Unit 2122

/KAKALI CHAKI/
Supervisory Patent Examiner, Art Unit 2122