DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Status of Claims
Claims 1-19 are pending in this Office Action.

Information Disclosure Statement
The information disclosure statements (IDS) submitted on 01/28/2021 and 05/24/2022 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Drawings
The formal drawings received on 01/28/2021 have been entered.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Independent Claim(s):
Step 1: Statutory Category. Claim(s) 1-19 is/are directed to statutory category of subject matter. The claim(s) does/do fall within at least one of the four categories of patent eligible subject matter because the claim(s) is/are directed to either a process, machine, manufacture, or composition of matter.
Step 2A: Prong One. Judicial Exception. Claim(s) 1-19 is/are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. The claim(s) are directed to abstract idea of detecting deviations from baseline behavior patterns for categorical features by determining a first discrete probability distribution for a categorical variable based on a first set of network activity data including at least one instance of the categorical variable; determining a second discrete probability distribution for a unique observation based on a second set of network activity data including data representing the unique observation; comparing the second discrete probability distribution to the first discrete probability distribution by applying a distance function to the first and second discrete probability distributions, wherein an output of the distance function is a scalar value representing a difference between the first and second discrete probability distributions; determining whether the scalar value is above a threshold; detecting an anomaly with respect to the categorical variable when the scalar value is above the threshold; and determining that a behavior with respect to the categorical variable is normal when the scalar value is not above the threshold, as explained in detail below. The claim(s) do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional computer elements, which are recited at a high level of generality, provide conventional computer functions that do not add meaningful limits to practicing the abstract idea. 
The independent claim(s) recites, in part, Claims 1, 10, 11. A method for detecting deviations from baseline behavior patterns for categorical features, comprising: determining a first discrete probability distribution for a categorical variable based on a first set of network activity data including at least one instance of the categorical variable; determining a second discrete probability distribution for a unique observation based on a second set of network activity data including data representing the unique observation; comparing the second discrete probability distribution to the first discrete probability distribution by applying a distance function to the first and second discrete probability distributions, wherein an output of the distance function is a scalar value representing a difference between the first and second discrete probability distributions; determining whether the scalar value is above a threshold; detecting an anomaly with respect to the categorical variable when the scalar value is above the threshold; and determining that a behavior with respect to the categorical variable is normal when the scalar value is not above the threshold. 
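For orientation only, the recited steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function names, the sample port data, the choice of total variation distance as the distance function, and the threshold value of 0.5 are all hypothetical and do not appear in the claims.

```python
from collections import Counter


def discrete_distribution(observations, categories):
    """Relative frequency of each category, i.e. a discrete probability distribution."""
    counts = Counter(observations)
    total = sum(counts[c] for c in categories)
    if total == 0:
        raise ValueError("no observations for the given categories")
    return {c: counts[c] / total for c in categories}


def total_variation(p, q):
    """Assumed distance function: half the L1 difference, a scalar in [0, 1]."""
    return 0.5 * sum(abs(p[c] - q[c]) for c in p)


def detect_anomaly(baseline_obs, new_obs, threshold, distance=total_variation):
    """Return (is_anomaly, score) for a new observation set against a baseline."""
    categories = sorted(set(baseline_obs) | set(new_obs))
    first = discrete_distribution(baseline_obs, categories)   # first distribution
    second = discrete_distribution(new_obs, categories)       # second distribution
    score = distance(first, second)                           # scalar difference
    return score > threshold, score                           # above threshold => anomaly


# Hypothetical network activity data: destination ports seen in each data set.
baseline_ports = [443] * 80 + [80] * 15 + [22] * 5
observed_ports = [22] * 9 + [443] * 1

is_anomaly, score = detect_anomaly(baseline_ports, observed_ports, threshold=0.5)
same_flag, same_score = detect_anomaly(baseline_ports, baseline_ports, threshold=0.5)
```

With this made-up data the observed port mix differs sharply from the baseline, so the score exceeds the assumed threshold and the observation is flagged; comparing the baseline with itself yields a score of zero and is treated as normal.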
These steps describe the concept of detecting deviations from baseline behavior patterns for categorical features by determining a first discrete probability distribution for a categorical variable based on a first set of network activity data including at least one instance of the categorical variable; determining a second discrete probability distribution for a unique observation based on a second set of network activity data including data representing the unique observation; comparing the second discrete probability distribution to the first discrete probability distribution by applying a distance function to the first and second discrete probability distributions, wherein an output of the distance function is a scalar value representing a difference between the first and second discrete probability distributions; determining whether the scalar value is above a threshold; detecting an anomaly with respect to the categorical variable when the scalar value is above the threshold; and determining that a behavior with respect to the categorical variable is normal when the scalar value is not above the threshold, which corresponds to concepts identified as abstract ideas by the courts, such as Organizing and manipulating information through mathematical correlations (Digitech); Collecting information, analyzing it, and displaying certain results of the collection and analysis (Electric Power Group; West View; SAP America); A formula for computing an alarm limit (Flook); An algorithm for calculating parameters indicating an abnormal condition (Grams); Calculating the difference between local and average data values (Abele); Performing statistical analysis (SAP America). 
All of these concepts relate to: (1) “An Idea ‘Of Itself,’” that is, an idea standing alone such as an uninstantiated concept, plan or scheme, as well as a mental process (thinking) that “can be performed in the human mind, or by a human using a pen and paper”; (2) “Certain Methods of Organizing Human Activity,” that is, concepts relating to interpersonal and intrapersonal activities, such as managing relationships or transactions between people, social activities, and human behavior; satisfying or avoiding a legal obligation; advertising, marketing, and sales activities or behaviors; and managing human mental activity; and (3) “Mathematical Relationships/Formulas,” that is, mathematical concepts such as mathematical algorithms, mathematical relationships, mathematical formulas, and calculations. The concept described in the claim(s) is not meaningfully different from the “An Idea ‘Of Itself,’” “Certain Methods of Organizing Human Activity,” and “Mathematical Relationships/Formulas” concepts found by the courts to be abstract ideas.
As such, the description in the claim(s) of detecting deviations from baseline behavior patterns for categorical features by determining a first discrete probability distribution for a categorical variable based on a first set of network activity data including at least one instance of the categorical variable; determining a second discrete probability distribution for a unique observation based on a second set of network activity data including data representing the unique observation; comparing the second discrete probability distribution to the first discrete probability distribution by applying a distance function to the first and second discrete probability distributions, wherein an output of the distance function is a scalar value representing a difference between the first and second discrete probability distributions; determining whether the scalar value is above a threshold; detecting an anomaly with respect to the categorical variable when the scalar value is above the threshold; and determining that a behavior with respect to the categorical variable is normal when the scalar value is not above the threshold is an abstract idea. Enfish, LLC v. Microsoft Corp. 822 F.3d 1327, 1335-36 (Fed. Cir. 2016) (“[T]he first step in the Alice inquiry in this case asks whether the focus of the claims [was] on the specific asserted improvement in computer capabilities … or, instead, on a process that qualifies as an ‘abstract idea’ for which computers are invoked merely as a tool.”) No such evidence exists on this record. 
Unlike Enfish, where the claims were focused on a specific improvement in how the computer functioned, the claim here merely uses the computer as a tool to perform the abstract concepts, and the claims are not rooted in technology and simply employ conventional techniques used by humans for detecting deviations from baseline behavior patterns for categorical features by determining a first discrete probability distribution for a categorical variable based on a first set of network activity data including at least one instance of the categorical variable; determining a second discrete probability distribution for a unique observation based on a second set of network activity data including data representing the unique observation; comparing the second discrete probability distribution to the first discrete probability distribution by applying a distance function to the first and second discrete probability distributions, wherein an output of the distance function is a scalar value representing a difference between the first and second discrete probability distributions; determining whether the scalar value is above a threshold; detecting an anomaly with respect to the categorical variable when the scalar value is above the threshold; and determining that a behavior with respect to the categorical variable is normal when the scalar value is not above the threshold. The claim here is not similar to the Enfish patent’s innovative logical model for a computer database (p. 2-3), nor does the claim here have a specific asserted improvement in computer capabilities (p. 7) similar to that in the Enfish patent. Rather, here, the claim is directed to automating a human behavior or task. (See Enfish Memo and Enfish v. Microsoft, May 2016). In addition, simply limiting the invention to a technological environment does “not make an abstract concept any less abstract under step one.” Intellectual Ventures I, 850 F.3d at 1340.
Therefore, based on the similarity of the concept described in this claim to abstract ideas identified by the courts, the claim is directed to an abstract idea. For these reasons, the claims are ineligible.
Step 2A: Prong Two. Practical Application. The following do not integrate a judicial exception into a practical application: adding the words “apply it” (or an equivalent) with the judicial exception, or mere instructions to implement an abstract idea on a computer, or merely using a computer as a tool to perform an abstract idea (see MPEP 2106.05(f)); adding insignificant extra-solution activity to the judicial exception (see MPEP 2106.05(g)); and generally linking the use of the judicial exception to a particular technological environment or field of use (see MPEP 2106.05(h)).
Step 2B: Additional Elements Significantly More Than the Judicial Exception. The independent claim(s) do/does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements, when considered both individually and as an ordered combination, do not amount to significantly more than the abstract idea. The claim recites the additional limitations of “processing circuitry” and a “memory,” the memory containing instructions that, when executed by the processing circuitry, configure the system to: determine a first discrete probability distribution for a categorical variable based on a first set of network activity data including at least one instance of the categorical variable; determine a second discrete probability distribution for a unique observation based on a second set of network activity data including data representing the unique observation; compare the second discrete probability distribution to the first discrete probability distribution by applying a distance function to the first and second discrete probability distributions, wherein an output of the distance function is a scalar value representing a difference between the first and second discrete probability distributions; determine whether the scalar value is above a threshold; detect an anomaly with respect to the categorical variable when the scalar value is above the threshold; and determine that a behavior with respect to the categorical variable is normal when the scalar value is not above the threshold. The “processing circuitry” and “memory” are recited at a high level of generality and are recited as performing generic computer functions routinely used in computer applications. Generic computer components recited as performing generic computer functions that are well-understood, routine, and conventional activities amount to no more than implementing the abstract idea with a computerized system.
Next, “detecting deviations from baseline behavior patterns for categorical features by determining a first discrete probability distribution for a categorical variable based on a first set of network activity data including at least one instance of the categorical variable; determining a second discrete probability distribution for a unique observation based on a second set of network activity data including data representing the unique observation; comparing the second discrete probability distribution to the first discrete probability distribution by applying a distance function to the first and second discrete probability distributions, wherein an output of the distance function is a scalar value representing a difference between the first and second discrete probability distributions; determining whether the scalar value is above a threshold; detecting an anomaly with respect to the categorical variable when the scalar value is above the threshold; and determining that a behavior with respect to the categorical variable is normal when the scalar value is not above the threshold” is stated at a high level of generality without tying it to an algorithm that would improve the functionality of the technology, and its broadest reasonable interpretation comprises only detecting deviations from baseline behavior patterns for categorical features by determining a first discrete probability distribution for a categorical variable based on a first set of network activity data including at least one instance of the categorical variable; determining a second discrete probability distribution for a unique observation based on a second set of network activity data including data representing the unique observation; comparing the second discrete probability distribution to the first discrete probability distribution by applying a distance function to the first and second discrete probability distributions, wherein an output of the distance function is a scalar value representing a difference between the first and second discrete probability distributions; determining whether the scalar value is above a threshold; detecting an anomaly with respect to the categorical variable when the scalar value is above the threshold; and determining that a behavior with respect to the categorical variable is normal when the scalar value is not above the threshold through the use of some unspecified generic computers and interface.
The use of generic computer components for detecting deviations from baseline behavior patterns for categorical features by determining a first discrete probability distribution for a categorical variable based on a first set of network activity data including at least one instance of the categorical variable; determining a second discrete probability distribution for a unique observation based on a second set of network activity data including data representing the unique observation; comparing the second discrete probability distribution to the first discrete probability distribution by applying a distance function to the first and second discrete probability distributions, wherein an output of the distance function is a scalar value representing a difference between the first and second discrete probability distributions; determining whether the scalar value is above a threshold; detecting an anomaly with respect to the categorical variable when the scalar value is above the threshold; and determining that a behavior with respect to the categorical variable is normal when the scalar value is not above the threshold through an unspecified interface does not impose any meaningful limit on the computer implementation of the abstract idea.
These independent claims include insignificant pre-solution limitation(s) and post-solution limitation(s) (network activity) that do not transform the patent-ineligible concept of an abstract idea into a patent-eligible concept even if they are performed using a general purpose computer, as these pre-solution limitation(s) and post-solution limitation(s) add insignificant extrasolution activity to the judicial exception. Thus, taken alone, the additional elements do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually. There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation. Additionally, adding the words “apply it” (or an equivalent) with the judicial exception (i.e., applying the judicial exception to network security), or mere instructions to implement an abstract idea on a computer, or generally linking the use of the judicial exception to a particular technological environment or field of use (i.e., network security) is also found not to be enough to qualify as significantly more.

Dependent Claim(s):
Step 1: Statutory Category. Claim(s) 2-9 and 12-19 is/are directed to statutory category of subject matter. The claim(s) does/do fall within at least one of the four categories of patent eligible subject matter because the claim(s) is/are directed to either a process, machine, manufacture, or composition of matter.
Step 2A: Judicial Exception. Claim(s) 2-9 and 12-19 is/are rejected under 35 U.S.C. 101 because the claimed invention is directed to a judicial exception (i.e., a law of nature, a natural phenomenon, or an abstract idea) without significantly more. The claim(s) are directed to abstract idea of detecting deviations from baseline behavior patterns for categorical features by determining a first discrete probability distribution for a categorical variable based on a first set of network activity data including at least one instance of the categorical variable; determining a second discrete probability distribution for a unique observation based on a second set of network activity data including data representing the unique observation; comparing the second discrete probability distribution to the first discrete probability distribution by applying a distance function to the first and second discrete probability distributions, wherein an output of the distance function is a scalar value representing a difference between the first and second discrete probability distributions; determining whether the scalar value is above a threshold; detecting an anomaly with respect to the categorical variable when the scalar value is above the threshold; and determining that a behavior with respect to the categorical variable is normal when the scalar value is not above the threshold, without significant extrasolution activities, as explained in detail below. The claim(s) do not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional computer elements, which are recited at a high level of generality, provide conventional computer functions that do not add meaningful limits to practicing the abstract idea. 
The dependent claim(s) recite, in part:
Claims 2, 12. The method of claim 1, further comprising: performing at least one mitigation when an anomaly is detected.
Claims 3, 13. The method of claim 1, wherein determining the first discrete probability distribution further comprises: determining a time window such that activity with respect to the categorical variable is assumed to be fully observed during the determined time window, wherein a duration of the time window is based on a type of the categorical variable, wherein the first discrete probability distribution function is determined based on a portion of the first set of network activity data corresponding to the determined time window.
Claims 4, 14. The method of claim 1, wherein determining the first discrete probability distribution further comprises: determining a sub-population of devices and systems indicated in the first network activity data, wherein the sub-population of devices and systems has a common attribute, wherein the portion of the first set of network activity data corresponding to the determined time window is related to the sub-population of devices.
Claims 5, 15. The method of claim 1, wherein the scalar value increases as the difference between the first and second discrete probability distributions increases.
Claims 6, 16. The method of claim 1, wherein the threshold is associated with the categorical variable.
Claims 7, 17. The method of claim 1, wherein each discrete probability distribution indicates a probability of each of a plurality of potential categories for the categorical variable.
Claims 8, 18. The method of claim 1, wherein the distance function is any of: a cross-entropy distance function, and a chi-squared statistic function.
Claims 9, 19. The method of claim 1, wherein the categorical variable is any of: a host, a communication channel, and a port.
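As a reading aid for the distance functions named in claims 8 and 18 and the variable-specific threshold of claims 6 and 16, the following Python sketch shows one conventional way such quantities could be computed. It is an illustration under stated assumptions only: the claims do not define either formula, so the direction of the cross-entropy, the Pearson form of the chi-squared statistic, the `EPS` floor, the `THRESHOLDS` table, and the sample distributions are all hypothetical.

```python
import math

EPS = 1e-12  # assumed floor that avoids log(0) and division by zero


def cross_entropy(baseline, observed):
    """Cross-entropy of the observed distribution relative to the baseline."""
    return -sum(
        q * math.log(max(baseline.get(c, 0.0), EPS))
        for c, q in observed.items()
    )


def chi_squared(baseline, observed, n):
    """Pearson chi-squared statistic for n observations against the baseline."""
    return n * sum(
        (q - baseline.get(c, 0.0)) ** 2 / max(baseline.get(c, 0.0), EPS)
        for c, q in observed.items()
    )


# Hypothetical thresholds, one per type of categorical variable.
THRESHOLDS = {"port": 5.0, "host": 8.0}

# Hypothetical distributions over two categories of a "port" variable.
baseline = {"a": 0.5, "b": 0.5}
observed = {"a": 0.9, "b": 0.1}

ce = cross_entropy(baseline, observed)
chi2 = chi_squared(baseline, observed, n=10)
port_anomaly = chi2 > THRESHOLDS["port"]
```

Both functions reduce the two distributions to a single scalar that is then compared against the threshold assumed for the variable type; which function and which threshold a real system would use is not stated in the record.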
These steps describe the concept of detecting deviations from baseline behavior patterns for categorical features by determining a first discrete probability distribution for a categorical variable based on a first set of network activity data including at least one instance of the categorical variable; determining a second discrete probability distribution for a unique observation based on a second set of network activity data including data representing the unique observation; comparing the second discrete probability distribution to the first discrete probability distribution by applying a distance function to the first and second discrete probability distributions, wherein an output of the distance function is a scalar value representing a difference between the first and second discrete probability distributions; determining whether the scalar value is above a threshold; detecting an anomaly with respect to the categorical variable when the scalar value is above the threshold; and determining that a behavior with respect to the categorical variable is normal when the scalar value is not above the threshold, without significant extrasolution activities, which corresponds to concepts identified as abstract ideas by the courts, such as Organizing and manipulating information through mathematical correlations (Digitech); Collecting information, analyzing it, and displaying certain results of the collection and analysis (Electric Power Group; West View; SAP America); A formula for computing an alarm limit (Flook); An algorithm for calculating parameters indicating an abnormal condition (Grams); Calculating the difference between local and average data values (Abele); Performing statistical analysis (SAP America). 
All of these concepts relate to: (1) “An Idea ‘Of Itself,’” that is, an idea standing alone such as an uninstantiated concept, plan or scheme, as well as a mental process (thinking) that “can be performed in the human mind, or by a human using a pen and paper”; (2) “Certain Methods of Organizing Human Activity,” that is, concepts relating to interpersonal and intrapersonal activities, such as managing relationships or transactions between people, social activities, and human behavior; satisfying or avoiding a legal obligation; advertising, marketing, and sales activities or behaviors; and managing human mental activity; and (3) “Mathematical Relationships/Formulas,” that is, mathematical concepts such as mathematical algorithms, mathematical relationships, mathematical formulas, and calculations. The concept described in the claim(s) is not meaningfully different from the “An Idea ‘Of Itself,’” “Certain Methods of Organizing Human Activity,” and “Mathematical Relationships/Formulas” concepts found by the courts to be abstract ideas.
As such, the description in the claim(s) of detecting deviations from baseline behavior patterns for categorical features by determining a first discrete probability distribution for a categorical variable based on a first set of network activity data including at least one instance of the categorical variable; determining a second discrete probability distribution for a unique observation based on a second set of network activity data including data representing the unique observation; comparing the second discrete probability distribution to the first discrete probability distribution by applying a distance function to the first and second discrete probability distributions, wherein an output of the distance function is a scalar value representing a difference between the first and second discrete probability distributions; determining whether the scalar value is above a threshold; detecting an anomaly with respect to the categorical variable when the scalar value is above the threshold; and determining that a behavior with respect to the categorical variable is normal when the scalar value is not above the threshold, without significant extrasolution activities is an abstract idea. Enfish, LLC v. Microsoft Corp. 822 F.3d 1327, 1335-36 (Fed. Cir. 2016) (“[T]he first step in the Alice inquiry in this case asks whether the focus of the claims [was] on the specific asserted improvement in computer capabilities … or, instead, on a process that qualifies as an ‘abstract idea’ for which computers are invoked merely as a tool.”) No such evidence exists on this record. 
Unlike Enfish, where the claims were focused on a specific improvement in how the computer functioned, the claim here merely uses the computer as a tool to perform the abstract concepts, and the claims are not rooted in technology and simply employ conventional techniques used by humans for detecting deviations from baseline behavior patterns for categorical features by determining a first discrete probability distribution for a categorical variable based on a first set of network activity data including at least one instance of the categorical variable; determining a second discrete probability distribution for a unique observation based on a second set of network activity data including data representing the unique observation; comparing the second discrete probability distribution to the first discrete probability distribution by applying a distance function to the first and second discrete probability distributions, wherein an output of the distance function is a scalar value representing a difference between the first and second discrete probability distributions; determining whether the scalar value is above a threshold; detecting an anomaly with respect to the categorical variable when the scalar value is above the threshold; and determining that a behavior with respect to the categorical variable is normal when the scalar value is not above the threshold, without significant extrasolution activities. The claim here is not similar to the Enfish patent’s innovative logical model for a computer database (p. 2-3), nor does the claim here have a specific asserted improvement in computer capabilities (p. 7) similar to that in the Enfish patent. Rather, here, the claim is directed to automating a human behavior or task. (See Enfish Memo and Enfish v. Microsoft, May 2016). In addition, simply limiting the invention to a technological environment does “not make an abstract concept any less abstract under step one.” Intellectual Ventures I, 850 F.3d at 1340.
Therefore, based on the similarity of the concept described in this claim to abstract ideas identified by the courts, the claim is directed to an abstract idea. For these reasons, the claims are ineligible.
Step 2B: Additional Elements Significantly More Than the Judicial Exception. The dependent claim(s) do/does not include additional elements that are sufficient to amount to significantly more than the judicial exception because the additional elements, when considered both individually and as an ordered combination, do not amount to significantly more than the abstract idea. The claim recites the additional limitations of “processing circuitry” and a “memory,” the memory containing instructions that, when executed by the processing circuitry, configure the system to perform the following:
Claims 2, 12. The method of claim 1, further comprising: performing at least one mitigation when an anomaly is detected.
Claims 3, 13. The method of claim 1, wherein determining the first discrete probability distribution further comprises: determining a time window such that activity with respect to the categorical variable is assumed to be fully observed during the determined time window, wherein a duration of the time window is based on a type of the categorical variable, wherein the first discrete probability distribution function is determined based on a portion of the first set of network activity data corresponding to the determined time window.
Claims 4, 14. The method of claim 1, wherein determining the first discrete probability distribution further comprises: determining a sub-population of devices and systems indicated in the first network activity data, wherein the sub-population of devices and systems has a common attribute, wherein the portion of the first set of network activity data corresponding to the determined time window is related to the sub-population of devices.
Claims 5, 15. The method of claim 1, wherein the scalar value increases as the difference between the first and second discrete probability distributions increases.
Claims 6, 16. The method of claim 1, wherein the threshold is associated with the categorical variable.
Claims 7, 17. The method of claim 1, wherein each discrete probability distribution indicates a probability of each of a plurality of potential categories for the categorical variable.
Claims 8, 18. The method of claim 1, wherein the distance function is any of: a cross-entropy distance function, and a chi-squared statistic function.
Claims 9, 19. The method of claim 1, wherein the categorical variable is any of: a host, a communication channel, and a port.
The “processing circuitry” and “memory” are recited at a high level of generality and are recited as performing generic computer functions routinely used in computer applications. Generic computer components recited as performing generic computer functions that are well-understood, routine, and conventional activities amount to no more than implementing the abstract idea with a computerized system.
Next, “detecting deviations from baseline behavior patterns for categorical features by determining a first discrete probability distribution for a categorical variable based on a first set of network activity data including at least one instance of the categorical variable; determining a second discrete probability distribution for a unique observation based on a second set of network activity data including data representing the unique observation; comparing the second discrete probability distribution to the first discrete probability distribution by applying a distance function to the first and second discrete probability distributions, wherein an output of the distance function is a scalar value representing a difference between the first and second discrete probability distributions; determining whether the scalar value is above a threshold; detecting an anomaly with respect to the categorical variable when the scalar value is above the threshold; and determining that a behavior with respect to the categorical variable is normal when the scalar value is not above the threshold, without significant extrasolution activities,” is stated at a high level of generality without tying it to an algorithm that would improve the functionality of the technology, and its broadest reasonable interpretation comprises only detecting deviations from baseline behavior patterns for categorical features by determining a first discrete probability distribution for a categorical variable based on a first set of network activity data including at least one instance of the categorical variable; determining a second discrete probability distribution for a unique observation based on a second set of network activity data including data representing the unique observation; comparing the second discrete probability distribution to the first discrete probability distribution by applying a distance function to the first and second discrete probability distributions, wherein an output of the distance function is a scalar value representing a difference between the first and second discrete probability distributions; determining whether the scalar value is above a threshold; detecting an anomaly with respect to the categorical variable when the scalar value is above the threshold; and determining that a behavior with respect to the categorical variable is normal when the scalar value is not above the threshold, without significant extrasolution activities, through the use of some unspecified generic computers and interface.
The use of generic computer components for detecting deviations from baseline behavior patterns for categorical features by determining a first discrete probability distribution for a categorical variable based on a first set of network activity data including at least one instance of the categorical variable; determining a second discrete probability distribution for a unique observation based on a second set of network activity data including data representing the unique observation; comparing the second discrete probability distribution to the first discrete probability distribution by applying a distance function to the first and second discrete probability distributions, wherein an output of the distance function is a scalar value representing a difference between the first and second discrete probability distributions; determining whether the scalar value is above a threshold; detecting an anomaly with respect to the categorical variable when the scalar value is above the threshold; and determining that a behavior with respect to the categorical variable is normal when the scalar value is not above the threshold, without significant extrasolution activities, through an unspecified interface does not impose any meaningful limit on the computer implementation of the abstract idea. These dependent claims include insignificant pre-solution limitation(s) and post-solution limitation(s) that do not transform the patent-ineligible concept of an abstract idea to a patent-eligible concept even if they are performed using a general-purpose computer, as these pre-solution limitation(s) and post-solution limitation(s) add insignificant extrasolution activity to the judicial exception. Thus, taken alone, the additional elements do not amount to significantly more than the above-identified judicial exception (the abstract idea). Looking at the limitations as an ordered combination adds nothing that is not already present when looking at the elements taken individually.
There is no indication that the combination of elements improves the functioning of a computer or improves any other technology. Their collective functions merely provide conventional computer implementation. Additionally, adding the words “apply it” (or an equivalent) with the judicial exception (i.e., applying the judicial exception to network security), or mere instructions to implement an abstract idea on a computer, or generally linking the use of the judicial exception to a particular technological environment or field of use (i.e., network security), is also found not to be enough to qualify as significantly more.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claim(s) 1-3, 5, 8, 10-13, 15, and 18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Dherange et al. (Pub. No.: US 2020/0382536, hereinafter, “Dherange”) in view of Chen et al. (Patent No.: US 8,307,430, hereinafter, “Chen”).
Claims 1, 10, 11. Dherange teaches:
A method for detecting deviations from baseline behavior patterns for categorical features, comprising: determining a first discrete probability distribution for a categorical variable based on a first set of network activity data including at least one instance of the categorical variable; determining a second discrete probability distribution for a unique observation based on a second set of network activity data including data representing the unique observation; – in paragraphs [0039], [0043], [0044], [0081]-[0092], [0163], [0164] (A ML model may be used to create the baseline behavior of a user with parameter guidance from Feature Selection. During Prediction phase, the unseen data is classified by the ML model based on user profile coefficients generated during training and threshold parameters set in prior to an anomaly detection run. In general, the records may have entries that have attributes that are numerical or categorical. For example, a user's login session duration is a numerical attribute for a user's record, indicating time in hours, minutes, and seconds, that a user has spent during each logging event in the IT infrastructure. Other examples of numerical records include a number of bytes downloaded, a number of bytes uploaded, time of entry in a building, time of exit from a building, and so on. The method 1100 includes determining (1108), based on a normally distributed attribute value frequency measure, a second set of outliers in categorical values of the training portion of the plurality of records. The second set of outliers may be determined based on the steps described with reference to FIG. 6 and FIG. 7. 
For example, the second set of outliers may be determined by: calculating, for each attribute A_j, normalized frequencies of every attribute value across all entries in the categorical values of the training portion of the plurality of records, calculating a frequency score for each input record x_i as an average across selected attributes, and tagging a record as an outlier when the frequency score is above a threshold describing a deviation from a baseline)
comparing the second discrete probability distribution to the first discrete probability distribution by applying a distance function to the first and second discrete probability distributions, – in paragraphs [0149]-[0158], [0163], [0164] (Anomalies that are based on distance/similarity deviation from the norm can then be identified by the prediction process. The method 1100 includes determining (1108), based on a normally distributed attribute value frequency measure, a second set of outliers in categorical values of the training portion of the plurality of records. The second set of outliers may be determined based on the steps described with reference to FIG. 6 and FIG. 7. For example, the second set of outliers may be determined by: calculating, for each attribute A_j, normalized frequencies of every attribute value across all entries in the categorical values of the training portion of the plurality of records, calculating a frequency score for each input record x_i as an average across selected attributes, and tagging a record as an outlier when the frequency score is above a threshold describing a deviation from a baseline)
determining whether the scalar value is above a threshold; – in paragraphs [0152]-[0157] (During the training phase, the goal is to find an optimal threshold that is used as a scalar to establish the threshold for anomaly detection based on the formula. Anomaly thresholding score=min+threshold*std. The distance/similarity averages of unseen data are then compared against the threshold established in the training process. A linear transformation is applied to the anomalous distance/similarity averages that map the raw scores to a numerical range of [50, 100]. A lower threshold may give a better recall with the risk of more false positives. A higher threshold leads to a better precision but at the risk of losing more true positives instead.)
detecting an anomaly with respect to the categorical variable when the scalar value is above the threshold; and determining that a behavior with respect to the categorical variable is normal when the scalar value is not above the threshold. – in paragraphs [0150]-[0165] (During the training phase, the goal is to find an optimal threshold that is used as a scalar to establish the threshold for anomaly detection based on the formula. The method 1100 includes determining (1108), based on a normally distributed attribute value frequency measure, a second set of outliers in categorical values of the training portion of the plurality of records. The second set of outliers may be determined based on the steps described with reference to FIG. 6 and FIG. 7. For example, the second set of outliers may be determined by: calculating, for each attribute A_j, normalized frequencies of every attribute value across all entries in the categorical values of the training portion of the plurality of records, calculating a frequency score for each input record x_i as an average across selected attributes, and tagging a record as an outlier when the frequency score is above a threshold describing a deviation from a baseline. The method 1100 includes detecting (1112) anomalies in the plurality of records by classifying the plurality of records using the first set of tags and the second set of tags with a probabilistic classifier.)
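
For illustration only, the frequency-based scoring and the thresholding formula quoted from Dherange above might be sketched as follows. This is a minimal sketch, not Dherange's implementation: the function names, the record layout, and the use of (1 - normalized frequency) so that rarer values score higher are all assumptions made here for readability.

```python
from collections import Counter
from statistics import mean, pstdev


def normalized_frequencies(records, attributes):
    """For each attribute, map every observed value to its relative frequency."""
    freqs = {}
    for attr in attributes:
        counts = Counter(rec[attr] for rec in records)
        total = sum(counts.values())
        freqs[attr] = {value: n / total for value, n in counts.items()}
    return freqs


def rarity_score(record, freqs):
    """Average over attributes of (1 - normalized frequency).

    Assumption: values never seen in training are treated as frequency 0.
    """
    return mean(1.0 - freqs[attr].get(record[attr], 0.0) for attr in freqs)


def anomaly_threshold(training_scores, k):
    """Cut-off of the form min + k * std, following the quoted formula."""
    return min(training_scores) + k * pstdev(training_scores)


def tag_outliers(records, freqs, cutoff):
    """Tag a record as an outlier when its score is above the cut-off."""
    return [rarity_score(rec, freqs) > cutoff for rec in records]
```

In this sketch the multiplier k plays the role of the tunable "threshold" in the quoted formula: a smaller k flags more records, and a larger k flags fewer.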

Dherange does not explicitly teach:
wherein an output of the distance function is a scalar value representing a difference between the first and second discrete probability distributions.
However, Chen teaches:
wherein an output of the distance function is a scalar value representing a difference between the first and second discrete probability distributions; – on lines 47-48 in column 2, on lines 23-27 in column 5 (KL-distance is a natural distance function from one probability distribution to another probability distribution. It is referred to as relative entropy in information theory. For discrete probability distributions, p = {p_1, ..., p_m} and q = {q_1, ..., q_m}.)
It would have been obvious for one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Dherange with Chen to include wherein an output of the distance function is a scalar value representing a difference between the first and second discrete probability distributions, as taught by Chen, on lines 25-28 in column 3, to provide automated analysis of the flow characteristics, and monitor traffic activities passing through computer networks.
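
Purely as an illustration of the claim 1 limitations as combined above, and not as a statement of how Dherange, Chen, or the applicant implement them, the comparison of two discrete probability distributions by a scalar-valued distance can be sketched as follows. The function names, the additive smoothing constant alpha, and the example threshold are assumptions introduced for this sketch.

```python
import math
from collections import Counter


def discrete_distribution(observations, categories, alpha=1.0):
    """Smoothed probability of each category.

    alpha is an assumed additive-smoothing constant that keeps every
    probability strictly positive so the logarithm below is defined.
    """
    counts = Counter(observations)
    total = sum(counts[c] for c in categories) + alpha * len(categories)
    return {c: (counts[c] + alpha) / total for c in categories}


def kl_distance(p, q):
    """Relative entropy sum_i p_i * log(p_i / q_i); returns one scalar."""
    return sum(p[c] * math.log(p[c] / q[c]) for c in p if p[c] > 0)


def is_anomalous(baseline_obs, new_obs, categories, threshold):
    """True when the distance from the baseline is above the threshold."""
    baseline = discrete_distribution(baseline_obs, categories)
    observed = discrete_distribution(new_obs, categories)
    return kl_distance(observed, baseline) > threshold
```

With a baseline dominated by one port, a new observation with a similar mix yields a small scalar and is treated as normal, while an observation concentrated on a rarely used port yields a large scalar and is flagged.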

Claims 2, 12. Combination of Dherange and Chen teaches The method of claim 1 – refer to the indicated claim for reference(s).
Dherange teaches:
further comprising: performing at least one mitigation when an anomaly is detected. – in paragraph [0021] (The disclosed technology provides implementations and examples of anomaly detection that may be used in implementations that detect and mitigate cyber threats.)

Claims 3, 13. Combination of Dherange and Chen teaches The method of claim 1 – refer to the indicated claim for reference(s).
Dherange teaches:
wherein determining the first discrete probability distribution further comprises: determining a time window such that activity with respect to the categorical variable is assumed to be fully observed during the determined time window, wherein a duration of the time window is based on a type of the categorical variable, wherein the first discrete probability distribution function is determined based on a portion of the first set of network activity data corresponding to the determined time window. – in paragraph [0049] (An optional re-training process (810) may be used. The re-training process may be started, for example, in response to making a decision that the current entity profile coefficients are producing too many false positives, or after passage of a certain amount of time (e.g., once every six months). In some cases, the re-training may be associated with a real-world event such as a re-organization of a company's departments, deployment of a new software platform or a new remote working policy in business organization, and so on. A portion of the records acquired at step 802 may be used as training portion and results of the anomalies detected by the training portion may be used to train the probabilistic classifier. For example, a human operator may check the anomaly detection performed on the training portion and may alter training parameters or the thresholds used for anomaly detection.)
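
For illustration only, the time-window limitation recited above can be pictured as selecting baseline records from a window whose length depends on the type of categorical variable. The sketch below illustrates the claim language rather than Dherange's re-training process; the window lengths, the record layout, and the names are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical window lengths per categorical-variable type (illustrative only).
WINDOW_BY_TYPE = {
    "host": timedelta(days=30),
    "port": timedelta(days=7),
    "communication_channel": timedelta(days=14),
}


def baseline_records(records, variable_type, now):
    """Keep records whose timestamp falls inside the window for this type."""
    start = now - WINDOW_BY_TYPE[variable_type]
    return [r for r in records if start <= r["timestamp"] <= now]
```

The first discrete probability distribution would then be computed only from the records returned for the chosen variable type.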

Claims 5, 15. Combination of Dherange and Chen teaches The method of claim 1 – refer to the indicated claim for reference(s).

Chen further teaches:
wherein the scalar value increases as the difference between the first and second discrete probability distributions increases. – on lines 47-48 in column 2, on lines 23-27 in column 5 (KL-distance is a natural distance function from one probability distribution to another probability distribution. It is referred to as relative entropy in information theory. For discrete probability distributions, p = {p_1, ..., p_m} and q = {q_1, ..., q_m}.)
It would have been obvious for one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Dherange with Chen to include wherein the scalar value increases as the difference between the first and second discrete probability distributions increases, as taught by Chen, on lines 25-28 in column 3, to provide automated analysis of the flow characteristics, and monitor traffic activities passing through computer networks.
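
For reference, the relative entropy (KL-distance) between two discrete probability distributions p and q is given below in its standard textbook form, which is assumed here to correspond to the expression at the cited portion of Chen. The value is zero when the two distributions are identical and positive otherwise.

```latex
D(p \,\|\, q) = \sum_{i=1}^{m} p_i \log \frac{p_i}{q_i},
\qquad D(p \,\|\, q) \ge 0,
\qquad D(p \,\|\, q) = 0 \iff p = q .
```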

Claims 8, 18. Combination of Dherange and Chen teaches The method of claim 1 – refer to the indicated claim for reference(s).

Chen further teaches:
wherein the distance function is any of: a cross-entropy distance function, and a chi-squared statistic function. – on lines 47-48 in column 2, on lines 23-27 in column 5 (KL-distance is a natural distance function from one probability distribution to another probability distribution. It is referred to as relative entropy in information theory. For discrete probability distributions, p = {p_1, ..., p_m} and q = {q_1, ..., q_m}.)
It would have been obvious for one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Dherange with Chen to include wherein the distance function is any of: a cross-entropy distance function, and a chi-squared statistic function, as taught by Chen, on lines 25-28 in column 3, to provide automated analysis of the flow characteristics, and monitor traffic activities passing through computer networks.
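
For illustration only, the two distance functions named in claims 8 and 18 can be sketched alongside the relative entropy cited from Chen. The sketch uses the standard definitions; the function names are chosen here, and every probability in q (or in the expected distribution) is assumed to be strictly positive.

```python
import math


def entropy(p):
    """H(p) = -sum_i p_i * log(p_i)."""
    return -sum(p_i * math.log(p_i) for p_i in p if p_i > 0)


def kl_distance(p, q):
    """Relative entropy sum_i p_i * log(p_i / q_i)."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)


def cross_entropy(p, q):
    """H(p, q) = -sum_i p_i * log(q_i), which equals H(p) + KL(p || q)."""
    return -sum(p_i * math.log(q_i) for p_i, q_i in zip(p, q) if p_i > 0)


def chi_squared_statistic(observed_counts, expected_probs):
    """Pearson statistic sum_i (O_i - E_i)^2 / E_i with E_i = n * expected_probs[i]."""
    n = sum(observed_counts)
    return sum(
        (o - n * e) ** 2 / (n * e)
        for o, e in zip(observed_counts, expected_probs)
    )
```

Each function returns a single scalar. Cross-entropy differs from the relative entropy only by the entropy of p, which is why the two are closely related as measures of difference between distributions.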

Claim(s) 4, 6, 7, 9, 14, 16, 17, and 19 is/are rejected under 35 U.S.C. 103 as being unpatentable over Dherange et al. (Pub. No.: US 2020/0382536, hereinafter, “Dherange”) in view of Chen et al. (Patent No.: US 8,307,430, hereinafter, “Chen”), and further in view of Dean et al. (Pub. No.: US 2020/0280575, hereinafter, “Dean”).
Claims 4, 14. Combination of Dherange and Chen teaches The method of claim 1 – refer to the indicated claim for reference(s).

Combination of Dherange and Chen does not explicitly teach:
wherein determining the first discrete probability distribution further comprises: determining a sub-population of devices and systems indicated in the first network activity data, wherein the sub-population of devices and systems has a common attribute, wherein the portion of the first set of network activity data corresponding to the determined time window is related to the sub-population of devices.
However, Dean teaches:
wherein determining the first discrete probability distribution further comprises: determining a sub-population of devices and systems indicated in the first network activity data, wherein the sub-population of devices and systems has a common attribute, wherein the portion of the first set of network activity data corresponding to the determined time window is related to the sub-population of devices. – in paragraphs [0057]-[0074], [0293]-[0296] (One may model the tail probabilities (1) separately for some devices. As well as this one may wish to group certain subsets of the network devices together and build a single model for the tail probabilities of the devices in the subset based on the union of the observations of the metric for each individual device in the group. The groups may be manually specified by a user, may be created by grouping all devices of a certain type e.g. all desktops on a subnet or may be determined algorithmically by applying a clustering algorithm to some feature set.)
It would have been obvious for one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Dherange and Chen with Dean to include wherein determining the first discrete probability distribution further comprises: determining a sub-population of devices and systems indicated in the first network activity data, wherein the sub-population of devices and systems has a common attribute, wherein the portion of the first set of network activity data corresponding to the determined time window is related to the sub-population of devices, as taught by Dean, in paragraph [0006], to provide a technique for detecting potentially malicious network activity and a technique for representing the output of anomaly detection algorithms to non-expert users.
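
For illustration only, the grouping of devices that share a common attribute, with one model built from the union of their observations as described in the cited portion of Dean, might be sketched as follows. The record layout, the attribute names, and the function name are hypothetical.

```python
from collections import Counter, defaultdict


def pooled_distributions(records, device_attributes, group_key, variable):
    """Group devices by a shared attribute and build one distribution per group.

    records:           dicts with a "device" key and the categorical variable
    device_attributes: device id -> dict of attributes (assumed layout)
    """
    counts = defaultdict(Counter)
    for rec in records:
        group = device_attributes[rec["device"]][group_key]
        counts[group][rec[variable]] += 1
    return {
        group: {value: n / sum(c.values()) for value, n in c.items()}
        for group, c in counts.items()
    }
```

Here every device of the same type contributes to a single pooled distribution, so a newly observed device can be compared against the baseline of its sub-population.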

Claims 6, 16. Combination of Dherange and Chen teaches The method of claim 1 – refer to the indicated claim for reference(s). 

Combination of Dherange and Chen does not explicitly teach:
wherein the threshold is associated with the categorical variable.
However, Dean teaches:
wherein the threshold is associated with the categorical variable. – in paragraphs [0057]-[0074], [0219]-[0223] (The behavioral metrics are computed from data sampled by the network traffic monitoring system. As described above there are two types of metric, network metrics and derived metrics. Given an historical sequence of observations of the values of the metrics M_1, ..., M_n: 1. for each i use a suitable POT fitting method to find a threshold u_i, an estimate \hat{P}(M_i > u_i) of the tail probabilities (1), and an estimate \hat{\xi}_i, \hat{\sigma}_i of the parameters of the GPD that describes the conditional distribution P(M_i - u_i | M_i > u_i).)
It would have been obvious for one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Dherange and Chen with Dean to include wherein the threshold is associated with the categorical variable, as taught by Dean, in paragraph [0006], to provide a technique for detecting potentially malicious network activity and a technique for representing the output of anomaly detection algorithms to non-expert users.
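
For illustration only, the per-metric quantities named in the cited portion of Dean (a threshold, an exceedance probability, and GPD shape and scale parameters) can be combined into a tail-probability estimate using the standard generalized Pareto survival function. The sketch assumes the parameters have already been fitted; the parameter table and its values are hypothetical and are not taken from Dean.

```python
import math

# Hypothetical fitted parameters, one entry per metric (illustrative values only).
FITTED = {
    "closed_port_attempts": {"u": 20.0, "p_exceed": 0.05, "xi": 0.2, "sigma": 5.0},
    "bytes_uploaded": {"u": 1e6, "p_exceed": 0.01, "xi": 0.0, "sigma": 2e5},
}


def tail_probability(x, u, p_exceed, xi, sigma):
    """Estimate P(M > x) for x >= u under a GPD fitted to exceedances over u."""
    if x < u:
        raise ValueError("the GPD tail model only applies at or above the threshold u")
    y = (x - u) / sigma
    if xi == 0.0:
        survival = math.exp(-y)  # exponential limit of the GPD
    else:
        base = 1.0 + xi * y
        # For xi < 0 the support is bounded; beyond it the survival is zero.
        survival = base ** (-1.0 / xi) if base > 0 else 0.0
    return p_exceed * survival
```

Because each metric carries its own threshold and parameters in this sketch, the same observed value can be ordinary for one metric and highly unusual for another.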

Claims 7, 17. Combination of Dherange and Chen teaches The method of claim 1 – refer to the indicated claim for reference(s). 

Combination of Dherange and Chen does not explicitly teach:
wherein each discrete probability distribution indicates a probability of each of a plurality of potential categories for the categorical variable.
However, Dean teaches:
wherein each discrete probability distribution indicates a probability of each of a plurality of potential categories for the categorical variable. – in paragraphs [0016]-[0022], [0042], [0051], [0057]-[0074], [0083]-[0086], [0115] (In the next stage, historical data 150 of each individual behavior metric may be analyzed. A mathematical model of what is considered to be normal behavior 170 for that metric is constructed for each device on the network from historical data. The third stage of the system comprises receiving new observations 160 and analyzing the new observations of the activity of each network device in real time 180 to detect anomalous behavior. This is done by first processing the new network activity measurement of each device to compute the values of the corresponding behavioral metrics. These values are then analyzed to see how anomalous the value of each metric is.)
It would have been obvious for one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Dherange and Chen with Dean to include wherein each discrete probability distribution indicates a probability of each of a plurality of potential categories for the categorical variable, as taught by Dean, in paragraph [0006], to provide a technique for detecting potentially malicious network activity and a technique for representing the output of anomaly detection algorithms to non-expert users.

Claims 9, 19. Combination of Dherange and Chen teaches The method of claim 1 – refer to the indicated claim for reference(s).

Combination of Dherange and Chen does not explicitly teach:
wherein the categorical variable is any of: a host, a communication channel, and a port.
However, Dean teaches:
wherein the categorical variable is any of: a host, a communication channel, and a port. – in paragraphs [0057]-[0074] (Number of attempts made by a device to connect to closed ports on other devices in a given time interval.)
It would have been obvious for one of ordinary skill in the art, before the effective filing date of the claimed invention, to modify Dherange and Chen with Dean to include wherein the categorical variable is any of: a host, a communication channel, and a port, as taught by Dean, in paragraph [0006], to provide a technique for detecting potentially malicious network activity and a technique for representing the output of anomaly detection algorithms to non-expert users.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MUHAMMAD RAZA whose telephone number is (571)272-7734. The examiner can normally be reached Monday-Friday, 7:00 A.M.-5:00 P.M.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Vivek Srivastava, can be reached at (571)272-7304. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MUHAMMAD RAZA/Primary Examiner, Art Unit 2449