DETAILED ACTION
This action is made FINAL in response to the amendments filed on 1/26/2021.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.

Claims 1 – 8, 10 – 15, 17, 22, and 23 are rejected under 35 U.S.C. 103 as being unpatentable over McCusker et al (US 2012/0072983) in view of Redlich et al (US 2010/0250497).
As to claim 1, McCusker et al teaches a system to protect an electric power grid control system (paragraph [0002]...there is a profound need for innovative technology and operations that can defend networks against the growing complexity of network threats and insider attacks. Networks have become an integral part of a wide range of activities including business processes, government operations, and the national power grid), comprising:
a plurality of heterogeneous data source nodes (paragraph [0073]... heterogeneous data sources) each generating a series of data source node values (paragraph [0008]...sensor data) over time (paragraph [0008]...time period ; paragraph [0101]...time series) associated with operation of the electric power grid control system; and
an offline abnormal state detection model creation computer (paragraph [0036]...threats and detected by performing Temporal Aggregated Behavioral Analysis (TABA) on network data fused from sensors that are used to distinguish normal and abnormal behavior), coupled to the heterogeneous data source nodes, including:
a computer processor (paragraph [0109]... a processor coupled with the bus for processing the information), and
a computer memory (paragraph [0109]...a main memory, such as a random access memory (RAM) or other dynamic storage device (e.g., dynamic RAM (DRAM), static RAM (SRAM), and synchronous DRAM (SDRAM)), coupled to the bus for storing information and instructions to be executed by processor), coupled to the computer processor, storing instructions that when executed by the computer processor cause the offline abnormal state detection model creation computer to:
(i)    receive the series of data source node values and perform a feature extraction (paragraph [0107]...extract behavioral features) process to generate an initial set of feature vectors (paragraph [0051]...A behavioral feature vector 810 is derived from raw sensor data),
(ii)    perform feature selection with a multi-model, multi-disciplinary framework to generate a selected feature vector subset (paragraph [0092]... Each CCE module has a specific view into the behavioral feature space module BFSM 403 that includes a vector of behavioral features, e.g., byte variance, byte per flow, and packet per flow, as illustrated in FIG. 9. Within a specific view of the feature space there are known regions having hot spot activity and normal activity. Essentially, the CCE modules measure the distance, using various types of algorithms, between a contact's current behaviors and these known regions),
(iii)    automatically calculate and output at least one decision boundary (paragraph [0008]...determining, for the specific contact from the contact behavior feature vector, scores associated with each of a plurality of classification modules to form a contact score vector, the contact score vector being independent of an identity of the specific contact; (5) identifying a type of the specific contact based on the contact score vector; and (6) determining a threat type, based on the contact behavioral feature vector and the contact score vector, when the type of the specific contact is determined to be a threat in the identifying step) for an abnormal state detection model based on the selected feature vector subset; and
a real-time (paragraph [0060]...real-time) threat detection computer, coupled to the plurality of heterogeneous data source nodes, to:
(i)    receive a series of current data source node values and generate a set of current feature vectors based on the offline feature creation process,
(ii)    access the abnormal state detection model having the at least one decision boundary created offline (paragraph [0102]...post-analysis (offline)), and
(iii)    execute the abnormal state detection model and transmit an abnormal state alert signal (paragraph [0037]...alert-centric) based on the set of current feature vectors and the at least one decision boundary (paragraph [0060]...is the configuration step, which includes setting behavioral trust weights according to the security policies deployed environment, configuring network assets, and configuring risk management by mapping business processes and missions to configured assets. When a network infrastructure asset, like an application server supporting critical financial web services, is threatened by one or more cyber threats, the business processes using that asset are now at risk. A near real-time value of risk is derived based on the security policies within an organization driving the configuration of trust weights associated with specific network behaviors and those behaviors detected by the system and found acting on those specific assets. A business process or mission that is supported by multiple assets that all are found to have threat behaviors with low trust scores will have a higher risk value than that of a different business process or mission that only has one asset, with a high trust score (i.e. the behaviors are trusted by the organization) derived from the contact behavioral feature vector for the asset. This step also includes setting the mode of operation to a pre-defined setting, including, but not limited to the modes of: generic, asset monitoring (excluding dark-space behaviors), data exfiltration, and botnet).
McCusker et al fails to explicitly show/teach wherein the heterogeneous data source nodes are associated with image data, social media data, weather data, actuator nodes of the electric power grid, data from electric power switches, data from critical measurement points of an electric bus, and circuit breaker data.
However, Redlich et al teaches the heterogeneous data source nodes (paragraph [0375]...multiple heterogeneous sources) are associated with image data (paragraph [0021]... select content is represented by one or more predetermined words, characters, images ; paragraph [0368]... Support for any interpretable data stream (ASCII, signal, image, etc)), social media data (paragraph [1912]...the iterative results create an asymptotic adjacency list model with a social networking relatedness. Social networking relatedness is often viewed as flow charts showing betweenness, closeness, and connectedness ; paragraph [1951]... Divergence is all about aggregation, inference, and data-to-data interaction because it specifically searches for links, references, relationships, outliers, and social networking associations to the search terms), weather data (paragraph [0730]...weather predictions), actuator nodes of the electric power grid (paragraph [0142]...grid-based resources ; paragraph [0130]...electric power grid), data from electric power switches (paragraph [0022]...switches which electrically isolate), data from critical measurement points of an electric bus (paragraph [3041]...internal bus), and circuit breaker data (paragraph [0101]...circuit breaker).
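Therefore, it would have been obvious for one having ordinary skill in the art, before the effective filing date of the claimed invention, for McCusker et al’s heterogeneous data source nodes to be associated with image data, social media data, weather data, actuator nodes of the electric power grid, data from electric power switches, data from critical measurement points of an electric bus, and circuit breaker data, as in Redlich et al, for the purpose of having a plurality of information infrastructure tools to copy, extract, archive, distribute, and a copy-extract-archive and distribute process data.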

As to claim 2, McCusker et al shows the system, wherein the offline abnormal state detection model creation computer (paragraph [0036]...threats and detected by performing Temporal Aggregated Behavioral Analysis (TABA) on network data fused from sensors that are used to distinguish normal and abnormal behavior) is further to perform a feature dimensionality reduction process to generate the selected feature vector subset (paragraph [0041]...the aggregated behavioral feature space represents an N-dimensional structure capturing contact behaviors over a set of defined time periods: hour, day, week, month, year, cumulative, and/or a custom time period. This contact-centric feature space is scalable compared to conventional systems in that there is a known number of behaviors and time periods being collected per system).

As to claim 3, McCusker et al shows the system, wherein the feature dimensionality reduction process is associated with a feature selection technique (paragraph [0041]...the aggregated behavioral feature space represents an N-dimensional structure capturing contact behaviors over a set of defined time periods: hour, day, week, month, year, cumulative, and/or a custom time period. This contact-centric feature space is scalable compared to conventional systems in that there is a known number of behaviors and time periods being collected per system).

As to claim 4, McCusker et al shows the system, wherein the feature dimensionality reduction process is associated with a feature transformation technique (paragraph [0047]...Conventional Network Behavioral Analyzers 105 are alert/data-centric and operate on raw packet and flow data (801) without transforming the data into a host-centric perspective. The classes of algorithms within the Network Behavioral Analyzers 105 are tuned to process raw data and events. These approaches are referred to as alert/data-centric technologies herein to distinguish from the present embodiments, which are threat/contact-centric, wherein the methodology processes network contacts such as, but not limited to, hosts and their respective aggregated behaviors).

As to claim 5, McCusker et al shows the system, wherein the received series of data source node values includes normal data source node values and abnormal data source node values (paragraph [0010]...the acquiring step includes acquiring the sensor data that identifies the contact, the contact being one of a host, a host group, a network, an autonomous system, and a country).

As to claim 6, McCusker et al shows the system, wherein at least one of the heterogeneous data source nodes (paragraph [0073]... heterogeneous data sources) is associated with at least one of: (i) sensor data (paragraph [0014]...sensor data), (ii) text data, (iii) cellular telephone data (paragraph [0120]...cellular telephone), (iv) satellite data, (v) web data (paragraph [0060]... web services), (vi) wireless network data, (vii) information technology inputs, (viii) critical sensor nodes of the electric power grid, (ix) controller nodes of the electric power grid, (x) key software nodes of the electric power grid, (xi) wi-fi activity data (paragraph [0046]...system 100 could include the external Internet 101), and (xii) cyber infrastructure status data (paragraph [0046]...cyber defense technology 104).

As to claim 7, McCusker et al shows the system, wherein the feature selection is further associated with a shallow feature learning technique (paragraph [0007]...There has been a wide range of anomaly detection behavioral models created in the past as, illustrated in the taxonomy shown in FIG. 2. These behavioral models are broken down into two broad types: a learnt model and a specification model. The learnt model employs unsupervised learning methods to discover anomalies without prior knowledge, while the specification model requires a description of the anomaly to detect known threats).

As to claim 8, McCusker et al shows the system, wherein the shallow feature learning technique utilizes at least one of: (i) unsupervised learning (paragraph [0007]... unsupervised learning methods), (ii) k-means clustering (paragraph [0053]...Learnt Model, and are statistically based. For example, we are employing the use of clustering algorithms (birch, k-means) to identify high-density regions within the feature space to derive the normative specification of a given network), (iii) manifold learning, (iv) non-linear embedding, (v) an isomap method, (vi) Locally-Linear Embedding (“LLE”), (vii) low-dimension projection, (viii) Principal Component Analysis (“PCA”), (ix) Independent Component Analysis (“ICA”), (x) neural networks, (xi) a Self-Organizing Map (“SOM”) method, (xii) genetic programming, and (xiii) sparse coding.

As to claim 10, McCusker et al shows the system, wherein the feature selection is further associated with a knowledge-based features technique (paragraph [0054]...the Knowledge Discovery Module (KDM) 405 manages the set of pre-defined and discovered behavioral primitives that are used to describe and classify a threat).

As to claim 11, McCusker et al shows the system, wherein the knowledge-based features technique (paragraph [0054]...the Knowledge Discovery Module (KDM) 405 manages the set of pre-defined and discovered behavioral primitives that are used to describe and classify a threat) paragraph [0053]...Learnt Model, and are statistically based. For example, we are employing the use of clustering algorithms (birch, k-means) to identify high-density regions within the feature space to derive the normative specification of a given network), (iv) variance data (paragraph [0051]...variance), (v) different orders of moments, and (vi) fast Fourier transformation spectrum information.

As to claim 12, McCusker et al shows the system, wherein the knowledge-based features technique utilizes a power system analysis including at least one of: (i) basis vector decomposition (paragraph [0106]... The security threats are decomposed into quantifiable behavioral primitives), (ii) state estimation, (iii) network observability matrices, (iv) topology matrices, (v) system plant matrices, (vi) frequency domain features, (vii) system poles, and (viii) system zeros.

As to claim 13, McCusker et al shows the system, wherein the selected feature vector subset is further used in connection with at least one of: (i) anomaly detection (paragraph [0006]... anomaly detection systems), (ii) anomaly accommodation, (iii) anomaly forecasting, and (iv) system diagnosis.

As to claim 14, McCusker et al shows the system, wherein a dynamic model is identified for an optimal subset of the initial set of feature vectors to capture an evolution of features over time (paragraph [0090]... The CCE using a unique dynamic cascading processing mechanism that chooses a set of CCE modules based on the initial context derived by contact behaviors).

As to claim 15, McCusker et al shows the system, wherein features are associated with a dynamic model comprising of at least one of: (i) stability margins, (ii) controllability indices (paragraph [0116]... the computer code devices of the present invention may be any interpretable or executable code mechanism, including but not limited to scripts, interpretable programs, dynamic link libraries (DLLs), Java classes, and complete executable programs. Moreover, parts of the processing of the present invention may be distributed for better performance, reliability, and/or cost), (iii) observability indices, (iv) elements of an observability matrix, (v) elements of a controllability matrix, (vi) poles, and (vii) zeros of the dynamic model of the evolution of features over time.

As to claim 17, McCusker et al shows the system, wherein the abnormal state detection model is associated with at least one of: (i) an actuator attack, (ii) a controller attack, (iii) a data source node attack (paragraph [0079]... types of malware delivered by threat), (iv) a plant state attack, (v) spoofing, (vi) physical damage (paragraph [0085]... identifying threats and threat behaviors to assets), (vii) unit availability, (viii) a unit trip, (ix) a loss of unit life, and (x) asset damage requiring at least one new part.

As to claim 22, McCusker et al teaches a non-transitory, computer-readable medium (paragraph [0113]...computer readable medium) storing instructions that, when executed by a computer processor, cause the computer processor to perform a method to protect an electric power grid control system (paragraph [0002]...there is a profound need for innovative technology and operations that can defend networks against the growing complexity of network threats and insider attacks. Networks have become an integral part of a wide range of activities including business processes, government operations, and the national power grid), the method comprising:
receiving (paragraph [0109]...a processor coupled with the bus for processing the information), from a plurality of heterogeneous data source nodes (paragraph [0073]... heterogeneous data sources) each generating a series of data source node values (paragraph [0008]...sensor data) over time (paragraph [0008]...time period ; paragraph [0101]...time series) associated with operation of the electric power grid control system; and
performing (paragraph [0109]...a processor coupled with the bus for processing the information), by an offline abnormal state detection model creation computer processor (paragraph [0036]...threats and detected by performing Temporal Aggregated Behavioral Analysis (TABA) on network data fused from sensors that are used to distinguish normal and abnormal behavior), a feature extraction process (paragraph [0107]...extract behavioral features)  to generate an initial set of feature vectors (paragraph [0051]...A behavioral feature vector 810 is derived from raw sensor data);  including:
performing feature selection with a multi-model, multi-disciplinary framework to generate a selected feature vector subset (paragraph [0092]... Each CCE module has a specific view into the behavioral feature space module BFSM 403 that includes a vector of behavioral features, e.g., byte variance, byte per flow, and packet per flow, as illustrated in FIG. 9. Within a specific view of the feature space there are known regions having hot spot activity and normal activity. Essentially, the CCE modules measure the distance, using various types of algorithms, between a contact's current behaviors and these known regions);
 automatically calculating and outputting at least one decision boundary (paragraph [0008]...determining, for the specific contact from the contact behavior feature vector, scores associated with each of a plurality of classification modules to form a contact score vector, the contact score vector being independent of an identity of the specific contact; (5) identifying a type of the specific contact based on the contact score vector; and (6) determining a threat type, based on the contact behavioral feature vector and the contact score vector, when the type of the specific contact is determined to be a threat in the identifying step) for an abnormal state detection model based on the selected feature vector subset; 
receiving, at a real-time (paragraph [0060]...real-time) threat detection computer processor, a series of current data source node values (paragraph [0074]...raw data is ingested from a series of one or more sensors); 
generating a set of current feature vectors (paragraph [0086]...this creates an n-dimensional feature vector that defines the various aggregated behaviors of the contact) based on the offline feature creation process;
accessing the abnormal state detection model having the at least one decision boundary created offline (paragraph [0102]...post-analysis (offline));
executing (paragraph [0109]... executed by processor) the abnormal state detection model;
transmitting an abnormal state alert signal (paragraph [0037]...alert-centric) based on the set of current feature vectors and the at least one decision boundary (paragraph [0060]...is the configuration step, which includes setting behavioral trust weights according to the security policies deployed environment, configuring network assets, and configuring risk management by mapping business processes and missions to configured assets. When a network infrastructure asset, like an application server supporting critical financial web services, is threatened by one or more cyber threats, the business processes using that asset are now at risk. A near real-time value of risk is derived based on the security policies within an organization driving the configuration of trust weights associated with specific network behaviors and those behaviors detected by the system and found acting on those specific assets. A business process or mission that is supported by multiple assets that all are found to have threat behaviors with low trust scores will have a higher risk value than that of a different business process or mission that only has one asset, with a high trust score (i.e. the behaviors are trusted by the organization) derived from the contact behavioral feature vector for the asset. This step also includes setting the mode of operation to a predefined setting, including, but not limited to the modes of: generic, asset monitoring (excluding dark-space behaviors), data exfiltration, and botnet). 
McCusker et al fails to explicitly show/teach wherein the heterogeneous data source nodes are associated with image data, social media data, weather data, actuator nodes of the electric power grid, circuit breaker data, wi-fi activity data, and cyber infrastructure. 
However, Redlich et al teaches the heterogeneous data source nodes (paragraph [0375]...multiple heterogeneous sources) are associated with image data (paragraph [0021]... select content is represented by one or more predetermined words, characters, images ; paragraph [0368]... Support for any interpretable data stream (ASCII, signal, image, etc)), social media data (paragraph [1912]...the iterative results create an asymptotic adjacency list model with a social networking relatedness. Social networking relatedness is often viewed as flow charts showing betweenness, closeness, and connectedness ; paragraph [1951]... Divergence is all about aggregation, inference, and data-to-data interaction because it specifically searches for links, references, relationships, outliers, and social networking associations to the search terms), weather data (paragraph [0730]...weather predictions), actuator nodes of the electric power grid (paragraph [0142]...grid-based resources ; paragraph [0130]...electric power grid), circuit breaker data (paragraph [0101]...circuit breaker), wi-fi activity data (paragraph [0046]...system 100 could include the external Internet 101), and cyber infrastructure (paragraph [0046]...cyber defense technology 104).
Therefore, it would have been obvious for one having ordinary skill in the art, before the effective filing date of the claimed invention, for McCusker et al’s heterogeneous data source nodes to be associated with image data, social media data, weather data, actuator nodes of the electric power grid, circuit breaker data, wi-fi activity data, and cyber infrastructure, as in Redlich et al, for the purpose of having a plurality of information infrastructure tools to copy, extract, archive, distribute, and a copy-extract-archive and distribute process data.

Claim 23 has similar limitations as claim 21. Therefore, the claim is rejected for the same reasons as above. 


Claims 9 and 19 - 21 are rejected under 35 U.S.C. 103 as being unpatentable over McCusker et al (US 2012/0072983) in view of Redlich et al (US 2010/0250497) and in further view of Ghadar et al (US 2017/0192411).
As to claim 9, McCusker et al teaches performing feature selection with a multi-model, multi-disciplinary framework to generate a selected feature vector subset (paragraph [0092]... Each CCE module has a specific view into the behavioral feature space module BFSM 403 that includes a vector of behavioral features, e.g., byte variance, byte per flow, and packet per flow, as illustrated in FIG. 9. Within a specific view of the feature space there are known regions having hot spot activity and normal activity. Essentially, the CCE modules measure the distance, using various types of algorithms, between a contact's current behaviors and these known regions).
McCusker et al and Redlich et al both fail to explicitly show/teach wherein the feature selection is further associated with a deep feature learning technique associated with at least one of: (i) an auto-encoder, (ii) a de-noising auto-encoder, and (iii) a restricted Boltzmann machine.
However, Ghadar et al teaches feature selection is further associated with a deep feature learning technique associated with at least one of: (i) an auto-encoder, (ii) a de-noising auto-encoder, and (iii) a restricted Boltzmann machine (paragraph [0044]... "Deep Learning" techniques can be used to further define and extract new descriptive features. Instead of using heuristic features, "Deep Learning" algorithms extract features from raw nominal images. For feature selection, stack Restricted Boltzmann Machines (RBMs) or auto-encoders (e.g., Deep Learning auto-encoders) can be used. Traditional classifiers use heuristic features designed based on prior knowledge. In contrast, Deep Learning can automatically learn relevant features to minimize the difference between desired and actual outputs, thus solving an optimization problem. Deep Learning can eliminate the challenges associated finding the right heuristic. For instance, assume there are examples from multiple dies/fields with varying (unknown) doses and exposures. An auto-encoder can involve Deep Learning with a fully connected initial layer followed by a hidden layer with smaller weight(s). The final layer of auto encoder can have the same number of nodes as the input layer. The auto-encoder can be trained so that the output value Y is as close as possible as input X. In other words, auto-encoders are trained to reconstruct their own inputs with minimum loss. Once trained, the middle layer output can be viewed as lower dimensional representation of the input. An auto-encoder can be used to lower feature dimensionality similar to Principal Component Analysis (PCA) or ICA).
Therefore, it would have been obvious for one having ordinary skill in the art, before the effective filing date of the claimed invention, for McCusker et al’s feature selection to be further associated with a deep feature learning technique associated with at least one of: (i) an auto-encoder, (ii) a de-noising auto-encoder, and (iii) a restricted Boltzmann machine, as in Ghadar et al, for the purpose of further defining and extracting new descriptive features.

As to claim 19, McCusker et al teaches a computerized method to protect an electric power grid control system (paragraph [0002]...there is a profound need for innovative technology and operations that can defend networks against the growing complexity of network threats and insider attacks. Networks have become an integral part of a wide range of activities including business processes, government operations, and the national power grid), comprising: 
receiving (paragraph [0109]...a processor coupled with the bus for processing the information), from a plurality of heterogeneous data source nodes (paragraph [0073]... heterogeneous data sources), a series of data source node values (paragraph [0008]...sensor data) over time (paragraph [0008]...time period ; paragraph [0101]...time series) associated with operation of the electric power grid control system; and
performing (paragraph [0109]...a processor coupled with the bus for processing the information), by an offline abnormal state detection model creation computer processor  (paragraph [0036]...threats and detected by performing Temporal Aggregated Behavioral Analysis (TABA) on network data fused from sensors that are used to distinguish normal and abnormal behavior), a feature extraction process (paragraph [0107]...extract behavioral features)  to generate an initial set of feature vectors (paragraph [0051]...A behavioral feature vector 810 is derived from raw sensor data); 
performing feature selection with a multi-model, multi-disciplinary framework to generate a selected feature vector subset (paragraph [0092]... Each CCE module has a specific view into the behavioral feature space module BFSM 403 that includes a vector of behavioral features, e.g., byte variance, byte per flow, and packet per flow, as illustrated in FIG. 9. Within a specific view of the feature space there are known regions having hot spot activity and normal activity. Essentially, the CCE modules measure the distance, using various types of algorithms, between a contact's current behaviors and these known regions);
 automatically calculating and outputting at least one decision boundary (paragraph [0008]...determining, for the specific contact from the contact behavior feature vector, scores associated with each of a plurality of classification modules to form a contact score vector, the contact score vector being independent of an identity of the specific contact; (5) identifying a type of the specific contact based on the contact score vector; and (6) determining a threat type, based on the contact behavioral feature vector and the contact score vector, when the type of the specific contact is determined to be a threat in the identifying step) for an abnormal state detection model based on the selected feature vector subset; 
receiving, at a real-time (paragraph [0060]...real-time) threat detection computer processor, a series of current data source node values (paragraph [0074]...raw data is ingested from a series of one or more sensors); 
generating a set of current feature vectors (paragraph [0086]...this creates an n-dimensional feature vector that defines the various aggregated behaviors of the contact) based on the offline feature creation process;
accessing the abnormal state detection model having the at least one decision boundary created offline (paragraph [0102]...post-analysis (offline)) associated with at least one of: (i) an actuator attack, (ii) a controller attack (paragraph [0046]...detect threats, e.g., botnets, viruses, and hackers), (iii) a data source node attack (paragraph [0046]...detect threats, e.g., botnets, viruses, and hackers ; paragraph [0079]... types of malware delivered by threat), (iv) a plant state attack, (v) spoofing, (vi) physical damage (paragraph [0085]... identifying threats and threat behaviors to assets), (vii) unit availability, (viii) a unit trip, (ix) a loss of unit life, and (x) asset damage requiring at least one new part;
executing (paragraph [0109]... executed by processor) the abnormal state detection model;
transmitting an abnormal state alert signal (paragraph [0037]...alert-centric) based on the set of current feature vectors and the at least one decision boundary (paragraph [0060]...is the configuration step, which includes setting behavioral trust weights according to the security policies deployed environment, configuring network assets, and configuring risk management by mapping business processes and missions to configured assets. When a network infrastructure asset, like an application server supporting critical financial web services, is threatened by one or more cyber threats, the business processes using that asset are now at risk. A near real-time value of risk is derived based on the security policies within an organization driving the configuration of trust weights associated with specific network behaviors and those behaviors detected by the system and found acting on those specific assets. A business process or mission that is supported by multiple assets that all are found to have threat behaviors with low trust scores will have a higher risk value than that of a different business process or mission that only has one asset, with a high trust score (i.e. the behaviors are trusted by the organization) derived from the contact behavioral feature vector for the asset. This step also includes setting the mode of operation to a predefined setting, including, but not limited to the modes of: generic, asset monitoring (excluding dark-space behaviors), data exfiltration, and botnet). 
McCusker et al fails to explicitly show/teach wherein the heterogeneous data source nodes are associated with image data, social media data, weather data, actuator nodes of the electric power grid, data from electric power switches, data from critical measurement points of an electric bus, and circuit breaker data and that the feature selection is further associated with a deep feature learning technique associated with at least one of: (i) an auto-encoder, (ii) a de-noising auto-encoder, and (iii) a restricted Boltzmann machine.
However, Redlich et al teaches the heterogeneous data source nodes (paragraph [0375]...multiple heterogeneous sources) are associated with image data (paragraph [0021]... select content is represented by one or more predetermined words, characters, images ; paragraph [0368]... Support for any interpretable data stream (ASCII, signal, image, etc)), social media data (paragraph [1912]...the iterative results create an asymptotic adjacency list model with a social networking relatedness. Social networking relatedness is often viewed as flow charts showing betweenness, closeness, and connectedness ; paragraph [1951]... Divergence is all about aggregation, inference, and data-to-data interaction because it specifically searches for links, references, relationships, outliers, and social networking associations to the search terms), weather data (paragraph [0730]...weather predictions), actuator nodes of the electric power grid (paragraph [0142]...grid-based resources ; paragraph [0130]...electric power grid), data from electric power switches (paragraph [0022]...switches which electrically isolate), data from critical measurement points of an electric bus (paragraph [3041]...internal bus), and circuit breaker data (paragraph [0101]...circuit breaker).
Therefore, it would have been obvious for one having ordinary skill in the art, before the effective filing date of the claimed invention, for McCusker et al’s heterogeneous data source nodes to be associated with image data, social media data, weather data, actuator nodes of the electric power grid, data from electric power switches, data from critical measurement points of an electric bus, and circuit breaker data, as in Redlich et al, for the purpose of having a plurality of information infrastructure tools to copy, extract, archive, distribute, and a copy-extract-archive and distribute process data.
McCusker et al and Redlich et al both fail to explicitly show/teach wherein the feature selection is further associated with a deep feature learning technique associated with at least one of: (i) an auto-encoder, (ii) a de-noising auto-encoder, and (iii) a restricted Boltzmann machine.
However, Ghadar et al teaches feature selection is further associated with a deep feature learning technique associated with at least one of: (i) an auto-encoder, (ii) a de-noising auto-encoder, and (iii) a restricted Boltzmann machine (paragraph [0044]... "Deep Learning" techniques can be used to further define and extract new descriptive features. Instead of using heuristic features, "Deep Learning" algorithms extract features from raw nominal images. For feature selection, stack Restricted Boltzmann Machines (RBMs) or auto-encoders (e.g., Deep Learning auto-encoders) can be used. Traditional classifiers use heuristic features designed based on prior knowledge. In contrast, Deep Learning can automatically learn relevant features to minimize the difference between desired and actual outputs, thus solving an optimization problem. Deep Learning can eliminate the challenges associated finding the right heuristic. For instance, assume there are examples from multiple dies/fields with varying (unknown) doses and exposures. An auto-encoder can involve Deep Learning with a fully connected initial layer followed by a hidden layer with smaller weight(s). The final layer of auto encoder can have the same number of nodes as the input layer. The auto-encoder can be trained so that the output value Y is as close as possible as input X. In other words, auto-encoders are trained to reconstruct their own inputs with minimum loss. Once trained, the middle layer output can be viewed as lower dimensional representation of the input. An auto-encoder can be used to lower feature dimensionality similar to Principal Component Analysis (PCA) or ICA).
Therefore, it would have been obvious for one having ordinary skill in the art, before the effective filing date of the claimed invention, for McCusker et al’s feature selection to be further associated with a deep feature learning technique associated with at least one of: (i) an auto-encoder, (ii) a de-noising auto-encoder, and (iii) a restricted Boltzmann machine, as in Ghadar et al, for the purpose of further defining and extracting new descriptive features.
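For illustration only, the auto-encoder behavior described in the passage quoted from Ghadar et al (an output layer the same size as the input, training to reconstruct the input, and a smaller middle layer read out as a lower dimensional representation) can be sketched in Python as follows. This sketch is not taken from any reference of record. It assumes a purely linear auto-encoder, synthetic data, and hypothetical names (`train_autoencoder`, `W_enc`, `W_dec`); a practical deep auto-encoder would add non-linear layers.

```python
import numpy as np


def train_autoencoder(X, hidden_dim, epochs=1000, lr=0.05, seed=0):
    """Fit a linear auto-encoder by gradient descent on reconstruction error.

    Hypothetical helper for illustration; returns encoder weights, decoder
    weights, and the mean squared reconstruction error recorded each epoch.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    # Encoder maps n_features -> hidden_dim; decoder maps back to n_features.
    W_enc = rng.normal(scale=0.1, size=(n_features, hidden_dim))
    W_dec = rng.normal(scale=0.1, size=(hidden_dim, n_features))
    losses = []
    for _ in range(epochs):
        code = X @ W_enc      # middle-layer output (lower dimensional)
        X_hat = code @ W_dec  # output layer, same width as the input
        err = X_hat - X
        losses.append(float(np.mean(err ** 2)))
        # Gradients of the squared reconstruction error.
        grad_dec = code.T @ err / n_samples
        grad_enc = X.T @ (err @ W_dec.T) / n_samples
        W_enc -= lr * grad_enc
        W_dec -= lr * grad_dec
    return W_enc, W_dec, losses


# Synthetic data (an assumption): 6 observed features driven by 2 latent factors.
rng = np.random.default_rng(42)
latent = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 6))
X = (X - X.mean(axis=0)) / X.std(axis=0)  # standardize each feature

W_enc, W_dec, losses = train_autoencoder(X, hidden_dim=2)
codes = X @ W_enc  # 2-dimensional features extracted from 6-dimensional input
print(codes.shape, losses[0], losses[-1])
```

Because this sketch has no non-linearity, the learned two-dimensional code spans the same subspace that Principal Component Analysis would find, which is consistent with the PCA comparison in the quoted passage.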

Claim 20 has similar limitations as claim 6. Therefore, the claim is rejected for the same reasons as above. 

Claim 21 has similar limitations as claim 7. Therefore, the claim is rejected for the same reasons as above. 

Claim 18 is rejected under 35 U.S.C. 103 as being unpatentable over McCusker et al (US 2012/0072983) in view of Redlich et al (US 2010/0250497) and in further view of CHARI et al (US 2017/0061322).
paragraph [0036]...threats and detected by performing Temporal Aggregated Behavioral Analysis (TABA) on network data fused from sensors that are used to distinguish normal and abnormal behavior).
McCusker et al and Redlich et al both fail to explicitly show/teach the abnormal state detection model including the at least one decision boundary is associated with at least one of: (i) a line, (ii) a hyperplane, and (iii) a non-linear boundary separating normal space and abnormal space.
However, CHARI et al teaches an abnormal state detection model including the at least one decision boundary is associated with at least one of: (i) a line, (ii) a hyperplane, and (iii) a non-linear boundary separating (figure 3) normal space and abnormal space (paragraph [0039]...figure 3 shows exemplarily the second of these two exemplary alternatives, wherein, for an arbitrary target user 302, the dotted lines 304 demarcate the samples from other users' U1, U2, U3 data samples that are chosen to be anomalous data for target user 302. Trapezoid 306 encircles the target user's sample data. It should be noted how the dotted lines 304, to become abnormal samples for target user 302, encircle the other users' data points that are closest to the cluster of target user's sample points ; paragraph [0053]...having both normal and anomalous samples in the training data allows the anomaly detection task to be cast as a two-class classification problem, so that a classifier can be learned that can discriminate the abnormal samples from the normal samples).
Therefore, it would have been obvious for one having ordinary skill in the art, before the effective filing date of the claimed invention, for McCusker et al’s abnormal state detection model to include the at least one decision boundary associated with at least one of: (i) a line, (ii) a hyperplane, and (iii) a non-linear boundary separating (figure 3) normal space and abnormal space, as in CHARI et al, for the purpose of clearly defining the decision boundaries.
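For illustration only, the two-class formulation quoted from CHARI et al (learning a classifier from both normal and anomalous samples) can yield a hyperplane decision boundary separating normal space from abnormal space, as in the following Python sketch. This sketch is not taken from any reference of record. It assumes logistic regression as the classifier, synthetic two-dimensional feature vectors, and hypothetical names (`fit_hyperplane`, `is_abnormal`).

```python
import numpy as np


def fit_hyperplane(X, y, epochs=2000, lr=0.1):
    """Learn (w, b) by logistic regression; the boundary is w . x + b = 0.

    Hypothetical helper for illustration. y is 0 for normal samples and
    1 for abnormal samples.
    """
    w = np.zeros(X.shape[1])
    b = 0.0
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted P(abnormal)
        w -= lr * (X.T @ (p - y)) / len(y)
        b -= lr * float(np.mean(p - y))
    return w, b


def is_abnormal(x, w, b):
    """Return True when feature vector x falls on the abnormal side."""
    return bool(np.dot(w, x) + b > 0)


# Synthetic training data (an assumption): two well-separated clusters.
rng = np.random.default_rng(0)
normal = rng.normal(loc=0.0, scale=0.5, size=(100, 2))
abnormal = rng.normal(loc=3.0, scale=0.5, size=(100, 2))
X = np.vstack([normal, abnormal])
y = np.concatenate([np.zeros(100), np.ones(100)])

# The boundary is created offline, then applied to current feature vectors.
w, b = fit_hyperplane(X, y)
predictions = (X @ w + b > 0).astype(float)
accuracy = float(np.mean(predictions == y))

alert_far = is_abnormal(np.array([3.0, 3.0]), w, b)
alert_origin = is_abnormal(np.array([0.0, 0.0]), w, b)
print(accuracy, alert_far, alert_origin)
```

In two dimensions the learned boundary is a line; with more features the same expression defines a hyperplane, and a non-linear classifier would be needed for a non-linear boundary.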
Response to Arguments
Applicant's arguments filed 1/26/2021 have been fully considered but they are not persuasive.
As to claims 1 and 19, McCusker et al fails to explicitly show/teach wherein the heterogeneous data source nodes are associated with image data, social media data, weather data, actuator nodes of the electric power grid, data from electric power switches, data from critical measurement points of an electric bus, and circuit breaker data.
However, Redlich et al teaches the heterogeneous data source nodes (paragraph [0375]...multiple heterogeneous sources) are associated with image data (paragraph [0021]... select content is represented by one or more predetermined words, characters, images ; paragraph [0368]... Support for any interpretable data stream (ASCII, signal, image, etc)), social media data (paragraph [1912]...the iterative results create an asymptotic adjacency list model with a social networking relatedness. Social networking relatedness is often viewed as flow charts showing betweenness, closeness, and connectedness ; paragraph [1951]... Divergence is all about aggregation, inference, and data-to-data interaction because it specifically searches for links, references, relationships, outliers, and social networking associations to the search terms), weather data (paragraph [0730]...weather predictions), actuator nodes of the electric power grid (paragraph [0142]...grid-based resources ; paragraph [0130]...electric power grid), data from electric power switches (paragraph [0022]...switches which electrically isolate), data from critical measurement points of an electric bus (paragraph [3041]...internal bus), and circuit breaker data (paragraph [0101]...circuit breaker).
It would have been an obvious matter of design choice for, since applicant has not disclosed that heterogeneous data source nodes are associated with image data, social media data, weather data, actuator nodes of the electric power grid, data from 
Therefore, McCusker et al in view of Redlich et al clearly shows all the limitations as claimed.

As to claim 19, McCusker et al and Redlich et al both fail to explicitly show/teach wherein the feature selection is further associated with a deep feature learning technique associated with at least one of: (i) an auto-encoder, (ii) a de-noising auto-encoder, and (iii) a restricted Boltzmann machine.
However, Ghadar et al teaches feature selection is further associated with a deep feature learning technique associated with at least one of: (i) an auto-encoder, (ii) a de-noising auto-encoder, and (iii) a restricted Boltzmann machine (paragraph [0044]... "Deep Learning" techniques can be used to further define and extract new descriptive features. Instead of using heuristic features, "Deep Learning" algorithms extract features from raw nominal images. For feature selection, stack Restricted Boltzmann Machines (RBMs) or auto-encoders (e.g., Deep Learning auto-encoders) can be used. Traditional classifiers use heuristic features designed based on prior knowledge. In contrast, Deep Learning can automatically learn relevant features to minimize the difference between desired and actual outputs, thus solving an optimization problem. Deep Learning can eliminate the challenges associated finding the right heuristic. For instance, assume there are examples from multiple dies/fields with varying (unknown) doses and exposures. An auto-encoder can involve Deep Learning with a fully connected initial layer followed by a hidden layer with smaller weight(s). The final layer of auto encoder can have the same number of nodes as the input layer. The auto-encoder can be trained so that the output value Y is as close as possible as input X. In other words, auto-encoders are trained to reconstruct their own inputs with minimum loss. Once trained, the middle layer output can be viewed as lower dimensional representation of the input. An auto-encoder can be used to lower feature dimensionality similar to Principal Component Analysis (PCA) or ICA).
Therefore, McCusker et al in view of Redlich et al and in further view of Ghadar et al teaches all the limitations as claimed.

As to claim 22, McCusker et al fails to explicitly show/teach wherein the heterogeneous data source nodes are associated with image data, social media data, weather data, actuator nodes of the electric power grid, circuit breaker data, wi-fi activity data, and cyber infrastructure. 
However, Redlich et al teaches the heterogeneous data source nodes (paragraph [0375]...multiple heterogeneous sources) are associated with image data (paragraph [0021]... select content is represented by one or more predetermined words, characters, images ; paragraph [0368]... Support for any interpretable data stream (ASCII, signal, image, etc)), social media data (paragraph [1912]...the iterative results create an asymptotic adjacency list model with a social networking relatedness. Social networking relatedness is often viewed as flow charts showing betweenness, closeness, and connectedness ; paragraph [1951]... Divergence is all about aggregation, inference, and data-to-data interaction because it specifically searches for links, references, relationships, outliers, and social networking associations to the search terms), weather data (paragraph [0730]...weather predictions), actuator nodes of the electric power grid (paragraph [0142]...grid-based resources ; paragraph [0130]...electric power grid), circuit breaker data (paragraph [0101]...circuit breaker), wi-fi activity data (paragraph [0046]...system 100 could include the external Internet 101), and cyber infrastructure (paragraph [0046]...cyber defense technology 104).
Therefore, McCusker et al in view of Redlich et al shows all the limitations as claimed.

Conclusion
THIS ACTION IS MADE FINAL.  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the mailing date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to BRANDON S COLE whose telephone number is (571)270-5075.  The examiner can normally be reached on Mon - Fri 7:30am - 5pm EST (Alternate Fridays Off).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an 
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached on 571-272-3179.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/BRANDON S COLE/           Primary Examiner, Art Unit 2122