DETAILED ACTION
Claims 1 and 14 have been amended.
Objection to the specification has been withdrawn based on the filed corrections. 
Claims 1-20 are pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant’s arguments filed on 06/29/2022 have been fully considered. 
With respect to the arguments of claim 8, examiner respectfully disagrees. 
Applicant has stated that SHEN fails to teach “program instructions to fit a trained deep neural network with a predictive generator head; program instructions to predict future system and network activity events using the trained deep neural network fitted with the predictive generator head”. Examiner respectfully disagrees. As cited in the rejection of claim 8, SHEN Col. 4 lines 3-13 teaches fitting a deep neural network with a predictive generator head. SHEN teaches training the neural network on event sequences, which in turn enables the neural network to understand both long-term and short-term interactions between events. This training enables the neural network to forecast potential attacks, as cited in the rejection of claim 8 (SHEN Col. 2 lines 55-65). Furthermore, para. 0079 of the specification of the present application states that a generator head “uses the deep neural network to predict future system and network activity events”, and SHEN teaches a neural network that is trained on events to predict future network activity. Therefore, the cited passages of SHEN teach the limitation.

With respect to the arguments regarding claims 5 and 18, examiner respectfully disagrees.
Applicant has stated that neither MARTIN nor MUDDU teaches dropping a portion of the system and network activity, and that MARTIN and MUDDU do not teach predicting the portion of the network activity that was dropped. Examiner respectfully disagrees. MARTIN teaches “dropping”: as stated in the rejection of claim 5, MARTIN para. 0077 teaches selecting only a portion of historical data from the corpus to train a neural network. Furthermore, MUDDU also teaches selecting a portion of historical data by selecting events that took place within a certain time window. Examiner interprets the selection of a time window, or of a particular portion of historical data from the corpus, in MUDDU and MARTIN as dropping the activity outside the window. MUDDU further teaches training a model on these historical events and then determining whether the prediction can be considered trustworthy enough based on the set of historical symbols. MUDDU describes using the PST model to predict what the next symbol may be, and MUDDU determines the trustworthiness of the model based on the historical symbols. Therefore, MARTIN-MUDDU teach the limitations of claim 5.
Additional arguments are moot in view of the new grounds of rejection necessitated by the amendments. 

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.




Claims 1-3 are rejected under 35 U.S.C. 103 as being unpatentable over MARTIN (US-20180004948-A1) in view of SBANDI (US-20210021636-A1), and further in view of FLOR, hereinafter MARTIN-SBANDI-FLOR.
Regarding claim 1, MARTIN teaches “A method comprising: collecting, by a computing device, system and network activity events in bulk; forming, by the computing device, a corpus using the collected system and network activity events; ([MARTIN ,Paragraph 0077] “In this implementation, the system can train the replicator neural network (or other outlier detection model) on a corpus of all historical vectors generated for assets on the network, such as for all time or within an extended time window (e.g., two years) for all asset types or for a particular asset type of the first asset (e.g., a personal computer, a tablet, a smartphone, a server, a switch, a printer, etc.). For example, the corpus of historical vectors can include both: a first set of historical vectors labeled as either malicious (e.g., labeled with identifiers of known network security threats) or benign following investigation by security analysts; and a second set of unlabeled historical vectors. The corpus of historical vectors can also include a set of vectors generated outside of the network, such as: vectors generated and/or labeled by an ISAC upon detection or characterization of a new security threat or cyber attack; and labeled and/or unlabeled vectors generated at other networks; etc. …… as a corpus of vectors generated for assets within the network increases over time, the system can transition the corpus of historical vectors to include predominantly or exclusively intra-network vectors. However, the system can aggregate labeled and/or unlabeled historical vectors from any other internal or external source. 
The system can then train the replicator neural network (or other outlier detection model) on this corpus of historical vectors.”) correlating, by the computing device, discrete events of the system and network activity events into offenses; ([MARTIN, Paragraph 00541] “In particular, the system can compare the new vector directly to a set of historical vectors representing confirmed cyber attacks on the network and/or external networks (hereinafter “malicious vectors”) in Block S270 and output an alert to investigate the asset or the network generally for a particular cyber attack in Block S272 if the new vector matches a particular malicious vector—in the set of known malicious vectors—tagged with the particular cyber attack.”) ([MARTIN, Paragraph 0064] “In particular, the system can match the set of flash exploit, multiple attempted remote access, uncommon connection, and repeated failed login behaviors represented in the first vector directly to a particular malicious vector—in a set of known malicious vectors—representing a particular known cyber attack in Block S270 and then issue an alert prompting investigation into the first computer or into the network generally for presence of the particular cyber attack in Block S272.”) adding, by the computing device, additional features to the corpus representing the offenses and disposition decisions regarding the offenses; training, by the computing device, a deep neural network using the corpus; ([MARTIN, Paragraph 0082] “Furthermore, the system can add the new vector to the corpus of historical vectors and/or to the subset of labeled historical vectors in Block S280, as shown in FIGS. 3, 4, and 5. For example, the system can access a result of an investigation into the asset responsive to the alert generated in Block S240, such as from an investigation database. 
The system can then label the new vector according to the result, such as by labeling the new vector as malicious (e.g., representing a security threat) or by labeling the new vector as benign according to the result of the investigation. The system can then insert the new vector into the corpus of historical vectors and retrain the replicator neural network (or other outlier detection model) on this extended corpus of historical vectors,”).
However, MARTIN does not teach “and refining, by the computing device, the deep neural network for a monitored computing environment using transfer learning, wherein refining the deep neural network comprises: freezing a backbone of the deep neural network; and training top layers of the deep neural network”.
In analogous teaching, SBANDI teaches “and refining, by the computing device, the deep neural network for a monitored computing environment using transfer learning” ([SBANDI, Paragraph 0041] “During operation, neural network classifications can be confirmed or denied (e.g., by an expert user, expert system, reference database, etc.) to continue to improve neural network behavior. The example neural network is then in a state of transfer learning, as parameters for classification that determine neural network behavior are updated based on ongoing interactions. In certain examples, the neural network can provide direct feedback to another process. In certain examples, the neural network outputs data that is buffered (e.g., via the cloud, etc.) and validated before it is provided to another process.”).
Thus, given the teaching of SBANDI, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of using transfer learning to tune a neural network as taught by SBANDI into the teaching of a corpus of collected network and system activity to train a neural network to detect offenses as taught by MARTIN. One of ordinary skill in the art would have been motivated to do so because SBANDI recognizes the need to improve cybersecurity in real-time ([SBANDI, Paragraph 0002] “Undetected cyberattacks are even more concerning. As the digital economy continues to develop, cybersecurity has become a formidable task in the IoT era.”) ([SBANDI, Paragraph 0005] “This disclosure addresses one or more of the shortcomings in the industry, thus improving the automated, real-time, multi-dimensional cybersecurity threat modeling in an organization.”).
However, MARTIN-SBANDI does not teach “wherein refining the deep neural network comprises: freezing a backbone of the deep neural network; and training top layers of the deep neural network.”
In analogous teaching, FLOR teaches “wherein refining the deep neural network comprises: freezing a backbone of the deep neural network; and training top layers of the deep neural network.” ([FLOR, Col. 10 lines 30-40] “An HTM model becomes more efficient at learning as it is exposed to more input data. It is able to adapt in place if the input data change and thus is more resistant to noisy input data than a neural network trained using supervised learning. Since each HTM layer learns a respective model of its input, it is possible to freeze online learning for some or all of the HTM layers to match characteristics of the input data (e.g., freeze online learning for the lower layers in cases where the initial input is constant while allowing the upper layers to continue to adapt to new sequences of input received from the lower layers).”) ([FLOR, Col. 9 lines 14-19] “An HTM is considered to be a type of neural network, because its structure models the structure of the brain neocortex. Each layer of an HTM is considered to represent one layer of neurons in a neocortical region, and is a structure composed of interconnected columns of nodes (cells hereinafter).”) ([FLOR, Col. 12 lines 19-22] “In some embodiments, the input datastream may be input repeatedly to the HTM, since, as previously described with reference to FIG. 3, learning efficiency of an HTM improves with experience.”). ([FLOR, Col. 4 lines 46-51] “An HTM is capable of online learning, i.e., learning by continually modifying its stored sequences of patterns based on each new input it receives. Generating an HTM model via online learning can be implemented by exposure of the HTM to sequences of inputs that include historical datastreams, live datastreams, or a combination”)
Thus, given the teaching of FLOR, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of freezing the backbone and training the top layers of a neural network as taught by FLOR into the teaching of a corpus of collected network and system activity to train a neural network to detect offenses as taught by MARTIN-SBANDI. One of ordinary skill in the art would have been motivated to do so because FLOR recognizes the benefits of using an HTM model, as it is more efficient at learning ([FLOR, ] “An HTM model becomes more efficient at learning as it is exposed to more input data. It is able to adapt in place if the input data change and thus is more resistant to noisy input data than a neural network trained using supervised learning.”).

Regarding claim 2, MARTIN-SBANDI-FLOR teach all limitations of claim 1. MARTIN further teaches “wherein the system and network activity events are collected from the monitored computing environment.” ([MARTIN, Paragraph 0067] “Block S210 of the second method S200 recites receiving a first signal specifying a first behavior of a first asset on a network at a first time. Generally, in Block S210, the system interfaces with one or more sensors on the network and/or intrusion detection systems executing on machines on the network to collect signals representing behaviors (e.g., events, actions) uncommon to these assets and/or representing behaviors risky to the asset and/or to the network.”).

Regarding claim 3, MARTIN-SBANDI-FLOR teach all limitations of claim 1. MARTIN further teaches “further comprising prioritizing, by the computing device, the offenses and adding metadata to the corpus regarding the prioritized offenses.” ([MARTIN, Paragraph 0077] “The corpus of historical vectors can also include a set of vectors generated outside of the network, such as: vectors generated and/or labeled by an ISAC upon detection or characterization of a new security threat or cyber attack; and labeled and/or unlabeled vectors generated at other networks; etc. For example, when the network is first on-boarded onto the system, the system can compare a new vector to labeled and unlabeled historical vectors generated at other networks—such as associated with entities operating in a similar capacity or in a similar market sector as the entity operating the network—in order to enable selective triggering of alerts despite limited historical data for the network; as a corpus of vectors generated for assets within the network increases over time, the system can transition the corpus of historical vectors to include predominantly or exclusively intra-network vectors.”) ([MARTIN, Paragraph 0082] “The system can then label the new vector according to the result, such as by labeling the new vector as malicious (e.g., representing a security threat) or by labeling the new vector as benign according to the result of the investigation. The system can then insert the new vector into the corpus of historical vectors”) ([MARTIN, Paragraph 0015] “an “attribute” refers to a value descriptive of a signal, such as values contained in signal metadata.
For example, an external detection mechanism or the system can store: a signal type, vulnerability type, or attempted exploitation mechanism; a timestamp corresponding to generation of a signal; an asset identification tag (e.g., IP address and user ID, host name, MAC address) corresponding to an asset at which the behavior that triggered the signal originated ……. Each signal can thus include metadata defining various parameters of one aspect or “stage” of a possible cyber attack on the network, and the system can compare signal metadata across multiple signals to relate these signals in Block S130.”).

Claim 4 is rejected under 35 U.S.C. 103 as being unpatentable over MARTIN-SBANDI-FLOR, in view of BAKER (US-11037059-B2), hereinafter MARTIN-SBANDI-FLOR-BAKER.
Regarding claim 4, MARTIN-SBANDI-FLOR teach all limitations of claim 1. However, MARTIN-SBANDI-FLOR does not teach “wherein the training the deep neural network comprises using self-supervision”.
 In analogous teaching, BAKER teaches “wherein the training the deep neural network comprises using self-supervision” ([BAKER, Col. 2 lines 29-33] “FIG. 1 is a flowchart of a process in which a computer system, such as the computer system 300 illustrated in FIG. 3, uses self-supervised back propagation in situations in which labeled training data is not available, such as during operation or during unsupervised training.”) ([BAKER, Col. 2 lines 41-47] “At step 100, the computer system 300 obtains or trains a machine learning system. In one aspect, the obtained or trained machine learning system is a neural network, such as the example neural network shown in FIG. 4 or the neural network 150 shown in FIG. 2. A neural network comprises a set of nodes and directed arcs, typically arranged into layers”).
Thus, given the teaching of BAKER, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of using self-supervision to help train a neural network as taught by BAKER into the teaching of a corpus of collected network and system activity to train a neural network to detect offenses as taught by MARTIN-SBANDI-FLOR. One of ordinary skill in the art would have been motivated to do so because BAKER recognizes the need to improve machine learning systems, in particular neural networks. ([BAKER, Col. 10 lines 57-63] “Based on the above description, it is clear that aspects of the present invention can be used to improve many different types of machine learning systems, particularly neural networks. For example, aspects of the present invention can improve recommender systems, speech recognition systems, and classification systems, including image and diagnostic classification systems, to name but a few examples.”)

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over MARTIN-SBANDI-FLOR-BAKER, in view of MUDDU (US-20190109868-A1).
Regarding claim 5, MARTIN-SBANDI-FLOR-BAKER teach all limitations of claim 4. MARTIN further teaches “further comprising: dropping, by the computing device, a portion of the system and network activity events from the corpus;” ([MARTIN, Paragraph 0077] “In this implementation, the system can train the replicator neural network (or other outlier detection model) on a corpus of all historical vectors generated for assets on the network, such as for all time or within an extended time window (e.g., two years) for all asset types or for a particular asset type of the first asset (e.g., a personal computer, a tablet, a smartphone, a server, a switch, a printer, etc.).”).
However, MARTIN does not teach “predicting, by the computing device, the portion of the system and network activity events that was dropped; and determining, by the computing device, an accuracy of the portion of the system and network activity events that was predicted.”.
In analogous teaching, MUDDU teaches “predicting, by the computing device, the portion of the system and network activity events that was dropped; and determining, by the computing device, an accuracy of the portion of the system and network activity events that was predicted” ([MUDDU, Paragraph 0521] “More specifically, the PST model is to be used in a way that, given an observation window with a number of previous symbols, the PST model can predict what the next symbol may be, to identify whether a target window is anomalous (e.g., by having an anomaly count beyond a baseline). Before the PST model is ready to do so, the PST model needs to receive training so that it can more accurately anticipate or predict the next symbol. For example, the PST model can be trained by a certain set of historical symbols. This set of historical symbols (i.e., the amount of training) denotes whether the PST model is considered ready (i.e., the prediction can be considered enough trustworthy). The amount of training can be controlled based on any of various training principles including, for example, by a fixed time, by a fixed number of symbols, or by other suitable methods including automatic training. The fixed time type of training can include training the PST model by using all previous symbols that took place within a certain time window (e.g., one week). The fixed symbol number type of training can include training the PST model by using a select number of previous symbols (e.g., 5,000 events). An example of an automatic training can include training the PST model by using past symbols until the PST model meets a certain criterion, such as convergence.”).
Thus, given the teaching of MUDDU, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of determining the accuracy of the prediction as taught by MUDDU into the teaching of a corpus of collected network and system activity to train a neural network to detect offenses as taught by MARTIN-SBANDI-FLOR-BAKER. One of ordinary skill in the art would have been motivated to do so because MUDDU recognizes the need to quickly detect anomalous activity within a network. ([MUDDU, Paragraph 0137] “Introduced here, therefore, is a data processing and analytics system (and, as a particular example, a security platform) that employs a variety of techniques and mechanisms for anomalous activity detection in a networked environment in ways that are more insightful and scalable than the conventional techniques. As is described in more detail below, the security platform is “big data” driven and employs a number of machine learning mechanisms to perform security analytics.”).

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over MARTIN-SBANDI-FLOR, in view of VASSEUR (US-10187413-B2), hereinafter MARTIN-SBANDI-FLOR-VASSEUR.
Regarding claim 6, MARTIN-SBANDI-FLOR teach all limitations of claim 1. However, MARTIN-SBANDI-FLOR does not teach “wherein the training the deep neural network comprises using unsupervised learning including dimensionality reduction.”.
In analogous teaching, VASSEUR teaches “wherein the training the deep neural network comprises using unsupervised learning including dimensionality reduction” ([VASSEUR, Col. 12 lines 60-62] “In the case of anomaly detection, two classes of machine learning may be used, namely unsupervised and supervised machine learning.”) ([VASSEUR, Col. 8 lines 46-57] “Replicator techniques may also be used for purposes of anomaly detection. Such techniques generally attempt to replicate an input in an unsupervised manner by projecting the data into a smaller space (e.g., compressing the space, thus performing some dimensionality reduction) and then reconstructing the original input, with the objective of keeping the “normal” pattern in the low dimensional space. Example techniques that fall into this category include principal component analysis (PCA) (e.g., for linear models), multi-layer perceptron (MLP) ANNs (e.g., for non-linear models), and replicating reservoir networks (e.g., for non-linear models, typically for time series).”).
Thus, given the teaching of VASSEUR, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of using unsupervised learning with dimensionality reduction as taught by VASSEUR into the teaching of a corpus of collected network and system activity to train a neural network to detect offenses as taught by MARTIN-SBANDI-FLOR. One of ordinary skill in the art would have been motivated to do so because VASSEUR recognizes the need to improve the performance of traffic classifiers to detect anomalies. ([VASSEUR, Col. 18 lines 41-47] “The techniques described herein, therefore, provide for a network-based approach for training supervised learning classifiers. In particular, the techniques herein greatly improve the performance of traffic classifiers by linking an anomaly detection SLN with other security devices, allowing for the dynamic training of a classifier using a variety of traffic samples.”).

Claim 7 is rejected under 35 U.S.C. 103 as being unpatentable over MARTIN-SBANDI-FLOR-VASSEUR, in view of KIM (US-20200394526-A1).
Regarding claim 7, MARTIN-SBANDI-FLOR-VASSEUR teaches all limitations of claim 6. However, MARTIN-SBANDI-FLOR-VASSEUR does not teach “further comprising the computing device using an autoencoder head to train the deep neural network”.
In analogous teaching KIM teaches “further comprising the computing device using an autoencoder head to train the deep neural network.” ([KIM, Paragraph 0034] “The neural network 20 may include an autoencoder that configures layers of an encoder and a decoder based on the input layer, the (at least one) hidden layer, and the output layer. For example, in the autoencoder, the encoder is sometimes called a recognition network that converts (encodes) input features into an internal representation, and the decoder is sometimes called a generative network that converts (decodes) the internal representation into output features.”) ([KIM, Paragraph 0037] “In some example embodiments, the anomaly detection system 10 in FIG. 1 may be configured to repeatedly train the autoencoder (that is, the neural network 20) by using a signal pattern extracted from the input data signal, thereby repeatedly updating parameters of each layer to allow the autoencoder to classify a signal pattern that may be recognized as a normal signal pattern.”).
Thus, given the teaching of KIM, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of using an autoencoder to train a neural network as taught by KIM into the teaching of a corpus of collected network and system activity to train a neural network to detect offenses as taught by MARTIN-SBANDI-FLOR-VASSEUR. One of ordinary skill in the art would have been motivated to do so because KIM recognizes the importance of using neural networks to detect network anomalies. ([KIM, Paragraph 0003] “In particular, in various technical fields such as cyber-intrusion detection, sensor networks anomaly detection, medical anomaly detection, and industrial damage detection, in order to prevent accidents through anomaly detection for recognizing and determining in realtime a situation in which an abnormal signal is generated during activities in which a large number of continuous normal signals are generated, techniques for more efficient anomaly detection using neural network systems have been developed.”).

Claims 8, 9, 11, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over SHEN (US-11108787-B1), in view of VASSEUR (US-10187413-B2), hereinafter SHEN-VASSEUR.
Regarding claim 8, SHEN teaches “A computer program product comprising: one or more computer readable storage media, and program instructions collectively stored on the one or more computer readable storage media, the program instructions comprising: ([SHEN, Col. 3 lines 28-38] “In some embodiments, each of the network devices 104 a-104 n may be any computer system capable of communicating over the network 102, examples of which are disclosed herein in connection with the computer system 400 of FIG. 4. In some embodiments, the network devices 104 a-104 n may include security applications 114 a-114 n, respectively. Similarly, in some embodiments, the security server 106 may be any computer system capable of communicating over the network 102 and capable of monitoring the network devices 104 a-104 n, examples of which are disclosed herein in connection with the computer system 400 of FIG. 4.”) program instructions to fit a trained deep neural network with a predictive generator head; program instructions to predict future system and network activity events using the trained deep neural network fitted with the predictive generator head; ([SHEN, Col. 4 lines 3 – 13] “the security application 114 a and/or the security application 116 may generate training sequences, validation sequences, and test sequences from the event sequences and train the recurrent neural network 250 using the training sequences, the validation sequences, and the test sequences. This training may enable the recurrent neural network 250 to understand both long-term and short-term interaction between events, and the effects between benign noise events and malicious attack events.”) ([SHEN, Col. 2 lines 55-65] “The embodiments disclosed herein may enable the securing of a network device by forecasting an attack event using a recurrent neural network. 
In some embodiments, securing a target network device may include training a recurrent neural network, collecting an event sequence of the most recent events that occurred on a target network device, using the recurrent neural network to forecast the next event that will occur on the target network device, and in response to the forecasted next event being an attack event, performing a security action to prevent harm to the target network device from the attack event.”).
However, SHEN does not teach “program instructions to fit the trained deep neural network with a classifier head; and program instructions to classify the predicted future system and network activity events using the trained deep neural network fitted with the classifier head.”.
In analogous teaching VASSEUR teaches “program instructions to fit the trained deep neural network with a classifier head;” ([VASSEUR, Col. 18 lines 19-26] “Notably, the supervisory device may merge the traffic data and their labels (e.g., ‘normal’ traffic from the DLAs, ‘normal’ and ‘suspicious/attack/etc.’ traffic data from the security device) into a training dataset for a machine learning-based classifier capable of classifying further network traffic. For example, the supervisory device may train a deep neural network using the training data, to classify further network traffic.”) and program instructions to classify the predicted future system and network activity events using the trained deep neural network fitted with the classifier head. ([VASSEUR, Col. 8 lines 14-17] “For example, a learning machine may dynamically make future predictions based on current or prior network measurements, may make control decisions based on the effects of prior control commands, etc.”) ([VASSEUR, Col. 17 lines 48-52] “For example, SCA 502 may push the trained classifier to a selected DLA 400 a via a Classifier( ) message 720. In turn, DLA 400 a may use this classifier to detect anomalies directly and/or enhance the features of its existing anomaly detection process with the output of this classifier.”).
Thus, given the teaching of VASSEUR, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of using a classifier to classify network activity as taught by VASSEUR into the teaching of a neural network to make future network activity predictions as taught by SHEN. One of ordinary skill in the art would have been motivated to do so because VASSEUR recognizes the need to improve the performance of traffic classifiers to detect anomalies. ([VASSEUR, Col. 18 lines 41-47] “The techniques described herein, therefore, provide for a network-based approach for training supervised learning classifiers. In particular, the techniques herein greatly improve the performance of traffic classifiers by linking an anomaly detection SLN with other security devices, allowing for the dynamic training of a classifier using a variety of traffic samples.”).

Regarding claim 9, SHEN-VASSEUR teach all limitations of claim 8. VASSEUR further teaches “program instructions to collect real-time system and network activity events;” ([VASSEUR, Col. 16 lines 38-53] “In a first instantiation, NTS 604 may collect traffic from each of the anomaly detection sensors/DLAs 400 a-400 n under its management. In particular, as shown in FIG. 7D, NTS 604 of SCA 502 may send a Traffic-Sample-Request( ) message 712 to each of the sensors/DLAs 400 a-400 n, to collect traffic samples at different times of the day and the week. It is the responsibility of NTS 604 to schedule sample collections (e.g., from DLAs 400 a-400 n and/or from security devices 510), so as to keep the bandwidth consumption under a certain threshold. In another embodiment, NTS 604 may retrieve a description of the network topology from a policy server, which will allow for optimization of the data collection so as not to saturate common network links (e.g. NTS 604 can choose sensors in different areas of the network so that their path towards SCA 502 are as disjoint as possible).”) and “program instructions to classify the collected real-time system and network activity events using the trained deep neural network fitted with the classifier head.” ([VASSEUR, Col. 17 lines 40 -52] “The type of classifier builds by CTS 602 may vary, but Deep Neural Networks (DNN) are excellent candidates for this type of applications, as they require a lot of training data and computational resources (both of which are available to CTS 602 since it is located in the datacenter). While DNNs are expensive to train, they are cheap to evaluate. Hence, CTS 602 may periodically push the trained classifier to the edge, along with statistics about its accuracy for various signatures. For example, SCA 502 may push the trained classifier to a selected DLA 400 a via a Classifier( ) message 720. In turn, DLA 400 a may use this classifier to detect anomalies directly and/or enhance the features of its existing anomaly detection process with the output of this classifier.”).
The same motivation to modify SHEN with VASSEUR as in the rejection of claim 8 applies. 

Regarding claim 11, SHEN-VASSEUR teach all limitations of claim 9. VASSEUR further teaches “wherein the real-time system and network activity events are collected from a monitored computing environment.” ([VASSEUR, Col. 16 lines 38-53] “In a first instantiation, NTS 604 may collect traffic from each of the anomaly detection sensors/DLAs 400 a-400 n under its management. In particular, as shown in FIG. 7D, NTS 604 of SCA 502 may send a Traffic-Sample-Request( ) message 712 to each of the sensors/DLAs 400 a-400 n, to collect traffic samples at different times of the day and the week. It is the responsibility of NTS 604 to schedule sample collections (e.g., from DLAs 400 a-400 n and/or from security devices 510), so as to keep the bandwidth consumption under a certain threshold. In another embodiment, NTS 604 may retrieve a description of the network topology from a policy server, which will allow for optimization of the data collection so as not to saturate common network links (e.g. NTS 604 can choose sensors in different areas of the network so that their path towards SCA 502 are as disjoint as possible).”).
The same motivation to modify SHEN with VASSEUR as in the rejection of claim 8 applies. 

Regarding claim 13, SHEN-VASSEUR teach all limitations of claim 8. SHEN teaches “wherein the predictive generator head is trained” ([SHEN, Col. 2 lines 57 - 62] “In some embodiments, securing a target network device may include training a recurrent neural network, collecting an event sequence of the most recent events that occurred on a target network device, using the recurrent neural network to forecast the next event that will occur on the target network device”).
However, SHEN does not teach “trained using unsupervised learning”.
In analogous teaching, VASSEUR teaches “trained using unsupervised learning” ([VASSEUR, Col. 12 lines 60 – 62] “In the case of anomaly detection, two classes of machine learning may be used, namely unsupervised and supervised machine learning.”) ([VASSEUR, Col. 8 lines 46 – 48] “Replicator techniques may also be used for purposes of anomaly detection. Such technique generally attempt to replicate and input in an unsupervised manner”).
The same motivation to modify SHEN with VASSEUR as in the rejection of claim 8 applies. 


Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over SHEN-VASSEUR in view of NEZNAL (US-20200204571-A1).
Regarding claim 10, SHEN-VASSEUR teach all limitations of claim 9. However, SHEN-VASSEUR does not teach “further comprising program instructions to refine the predictive generator head based on differences between the collected real-time system and network activity events and the predicted future system and network activity events.”
In analogous teaching, NEZNAL teaches “further comprising program instructions to refine the predictive generator head based on differences between the collected real-time system and network activity events and the predicted future system and network activity events.” ([NEZNAL, Paragraph 0031] “The recurrent neural network process the data in a manner that uses both prior state data and current state data to predict the next data likely to be observed on the network, and in training compares the actual next data with the predicted next data and adjusts the network parameters based on the difference between actual and predicted next data (or the loss) to learn to more accurately predict the next network data. As this learning process is repeated over large volumes of training data, the recurrent neural network learns to more accurately predict the next network data from a sequence of network data.”) ([NEZNAL, Paragraph 0045] “The difference or loss function is fed back into the recurrent neural network, such as through backpropagation or other such methods, and used to alter the neural network coefficients to cause the predicted next element to more closely match the actual or observed next element in the time series, thereby training the neural network to more accurately predict the next element or elements.”).
Thus, given the teaching of NEZNAL, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of refining the predicted network and system activity as taught by NEZNAL with the teaching of a neural network to make future network activity predictions as taught by SHEN-VASSEUR. One of ordinary skill in the art would have been motivated to do so because NEZNAL recognizes the need to enhance network security through the use of neural networks. ([NEZNAL, Paragraph 0025] “Some examples described herein therefore seek to improve network security by monitoring network traffic using a long short term memory (LSTM) model such as a recurrent neural network or convolutional neural network to monitor and characterize normal traffic, enabling the neural network to detect traffic patterns that are abnormal.”).

Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over SHEN-VASSEUR in view of BAKER (US-11037059-B2).
Regarding claim 12, SHEN-VASSEUR teach all limitations of claim 8. SHEN teaches “wherein the predictive generator head is trained” ([SHEN, Col. 2 lines 57 - 62] “In some embodiments, securing a target network device may include training a recurrent neural network, collecting an event sequence of the most recent events that occurred on a target network device, using the recurrent neural network to forecast the next event that will occur on the target network device”).
However, SHEN-VASSEUR does not teach “training using self-supervision”.
In analogous teaching, BAKER teaches “wherein the training the deep neural network comprises using self-supervision” ([BAKER, Col. 2 lines 29-33] “FIG. 1 is a flowchart of a process in which a computer system, such as the computer system 300 illustrated in FIG. 3, uses self-supervised back propagation in situations in which labeled training data is not available, such as during operation or during unsupervised training.”) ([BAKER, Col. 2 lines 41-47] “At step 100, the computer system 300 obtains or trains a machine learning system. In one aspect, the obtained or trained machine learning system is a neural network, such as the example neural network shown in FIG. 4 or the neural network 150 shown in FIG. 2. A neural network comprises a set of nodes and directed arcs, typically arranged into layers”).
Thus, given the teaching of BAKER, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of using self-supervision to help train a neural network as taught by BAKER with the teaching of a neural network to make future network activity predictions as taught by SHEN-VASSEUR. One of ordinary skill in the art would have been motivated to do so because BAKER recognizes the need to improve machine learning systems, in particular neural networks. ([BAKER, Col. 10 lines 57-63] “Based on the above description, it is clear that aspects of the present invention can be used to improve many different types of machine learning systems, particularly neural networks. For example, aspects of the present invention can improve recommender systems, speech recognition systems, and classification systems, including image and diagnostic classification systems, to name but a few examples.”).

Claims 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over MARTIN-SBANDI-FLOR, in view of GRUBE (US-20190012234-A1), hereinafter MARTIN-SBANDI-FLOR-GRUBE.
Regarding claim 14, claim 14 is a system claim that recites features similar to those recited in method claim 1. Therefore, claim 14 is rejected in a similar manner as in the rejection of claim 1. 
However, MARTIN-SBANDI-FLOR does not teach “wherein the computing device is a dispersed storage (DS) processing unit”.
In analogous teaching, GRUBE teaches “wherein the computing device is a dispersed storage (DS) processing unit”. ([GRUBE, Paragraph 0314] “FIG. 47B is a schematic block diagram of an example of a dispersed storage network that includes a dispersed storage (DS) processing unit 562 and a set of DS units 564. Alternatively, the DS processing unit 562 may include a distribute storage and task (DST) processing unit and each DS unit 564 may include a DST execution unit. The network functions to ingest large amounts of data 1-3 for storage in the set of DS units 564 …… The network utilizes a centralized data ingestion approach by utilizing the DS processing unit 562 to ingest large amounts of data, enables execution of partial tasks by the DS units 564 is that are associated with storing the data, and enables improved storage reliability via utilization of the DS units 564 that are associated with storing redundancy data.”).
Thus, given the teaching of GRUBE, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of using a dispersed storage (DS) processing unit as taught by GRUBE with the teaching of a corpus of collected network and system activity to train a neural network to detect offenses as taught by MARTIN-SBANDI-FLOR. One of ordinary skill in the art would have been motivated to do so because GRUBE recognizes the benefits of using a dispersed storage (DS) processing unit to improve the efficiency of storing data. ([GRUBE, Paragraph 0341] “The set of DS units 654 enables execution of partial tasks by the ingesting DS unit 656 on the data blocks of data, enables improved storage reliability via utilization of the redundancy DS units 658 that are associated with storing the redundancy data, and enables improved storage efficiency by identifying and remedying stores data blocks of the stored data blocks 662 that are substantially similar.”).

Regarding claim 15, MARTIN-SBANDI-FLOR-GRUBE teach all limitations of claim 14. Furthermore, this claim recites features similar to those in claim 2. Therefore, claim 15 is rejected in a similar manner as in the rejection of claim 2. 

Regarding claim 16, MARTIN-SBANDI-FLOR-GRUBE teach all limitations of claim 14. Furthermore, this claim recites features similar to those in claim 3. Therefore, claim 16 is rejected in a similar manner as in the rejection of claim 3. 


Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over MARTIN-SBANDI-FLOR-GRUBE, in view of BAKER (US-11037059-B2), hereinafter MARTIN-SBANDI-FLOR-GRUBE-BAKER.

Regarding claim 17, MARTIN-SBANDI-FLOR-GRUBE teach all limitations of claim 14. Furthermore, this claim recites features similar to those in claim 4. Therefore, claim 17 is rejected in a similar manner as in the rejection of claim 4.

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over MARTIN-SBANDI-FLOR-GRUBE-BAKER, in view of MUDDU (US-20190109868-A1).
Regarding claim 18, MARTIN-SBANDI-FLOR-GRUBE-BAKER teach all limitations of claim 17. Furthermore, claim 18 recites features similar to those in claim 5. Therefore, claim 18 is rejected in a similar manner as in the rejection of claim 5. 

Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over MARTIN-SBANDI-FLOR-GRUBE, in view of VASSEUR (US-10187413-B2), hereinafter MARTIN-SBANDI-FLOR-GRUBE-VASSEUR.
Regarding claim 19, MARTIN-SBANDI-FLOR-GRUBE teach all limitations of claim 14. Furthermore, claim 19 recites features similar to those in claim 6. Therefore, claim 19 is rejected in a similar manner as in the rejection of claim 6.

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over MARTIN-SBANDI-FLOR-GRUBE-VASSEUR, in view of KIM (US-20200394526-A1).
Regarding claim 20, MARTIN-SBANDI-FLOR-GRUBE-VASSEUR teach all limitations of claim 19. Furthermore, claim 20 recites features similar to those in claim 7. Therefore, claim 20 is rejected in a similar manner as in the rejection of claim 7.


The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.

BARDENSTEIN (US-20190222601-A1) discloses a security system that detects and attributes anomalous activity in a network, utilizing machine learning to better identify the anomalous activity.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to AFAQ ALI whose telephone number is (571)272-1571. The examiner can normally be reached Mon - Fri 7:30am - 5:30pm EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kambiz Zand, can be reached on (571)272-3811. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/AFAQ ALI/Examiner, Art Unit 2434
/NOURA ZOUBAIR/Primary Examiner, Art Unit 2434