DETAILED ACTION
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 to 2, 5, 11 to 13, and 16 are rejected under 35 U.S.C. 103 as being unpatentable over Fang et al. (U.S. Patent No. 11,003,773) in view of Anderson et al. (U.S. Patent Publication 2018/0189677).
Concerning independent claims 1 and 11 to 12, Fang et al. discloses a method, system, and computer program product for automatically generating malware detection rule recommendations, comprising:
“receiving communication data, comprising a formatted data unit, that is being communicated between two machines using a communication protocol” – the term ‘object’ generally relates to information having a logical structure or organization for malware analysis, where the information may include an executable, e.g., an application program, a code segment, a script, or any file in a format (“comprising a formatted data unit”) that can be directly executed by a computer, e.g., a file with an ‘.exe’ extension, or a non-executable file, e.g., a file, a Portable Document Format (‘PDF’) document, a Word® word processing document, an electronic mail (‘email’) message, or simply a collection of related data, e.g., packets (column 6, lines 54 to 64); cybersecurity protection system 100 is deployed for detecting and protecting a local network of a customer against cyberattacks, and is configured to analyze incoming objects for malware; analysis may include processing, e.g., executing, the objects, monitoring for selected events, and capturing meta-information associated with any of these monitored events; cybersecurity systems 110_1 to 110_N are deployed as network devices communicatively coupled to receive and analyze objects within network traffic (“receiving communication data”), e.g., objects of incoming network traffic or objects propagating in network traffic over a local network 130, and to analyze the objects for malware (column 8, lines 5 to 39: Figure 1); here, objects of network traffic are “being communicated between two machines”; implicitly, network traffic is communicated “using a communication protocol”, e.g., TCP/IP or HTTP;
“extracting an information element from the formatted data unit, wherein the information element comprises an n-gram from the formatted data unit” – a cybersecurity system is configured to conduct analysis of objects to determine whether any of the objects may be associated with malware; a receiver includes a parser and feature extraction logic; a ‘feature’ may be categorized as repetitive patterns within text of meta-information discovered using a sliding window, e.g., N-Grams, Skip-Gram, etc., which may lead to rule candidate patterns (column 3, lines 4 to 35); feature extraction logic 410 extracts features 430 from formatted meta-information 405, where features 430 may be categorized as repetitive patterns within text of the formatted meta-information 405 discovered using a sliding window, e.g., N-Gram, Skip-Gram, etc. (“wherein the information element comprises an n-gram from the formatted data unit”) (column 14, lines 5 to 15: Figure 4);
“analyzing the information element [by comparing the n-gram to a dictionary of similar information elements], and then determining a probability [that the information element corresponds to a similar information element in the dictionary]” – machine learning models are applied to each of the plurality of features to generate a score that represents a level of maliciousness for the feature and thereby a degree of usefulness of the feature in classifying the object as malicious or benign (Abstract); a score assigned to a feature may be adjusted based on a selected weighting scheme, e.g., increasing scores for features with a higher probability of being associated with malware and/or decreasing scores for features with a lesser probability of being associated with malware (column 4, lines 19 to 30); each provisional malware detection rule 190 may be tested against a searchable data store 220 including meta-information associated with known malware and/or known goodware to determine the suitability of a provisional malware detection rule 190 (column 13, lines 62 to 67: Figure 2A); here, a score represents “a probability” that an object represents malware because a higher score corresponds to a higher probability of being associated with malware and a lower score corresponds to a lower probability of being associated with malware;
“establishing, based on the probability, a rule governing communication between the two machines” – rule recommendations are generated for use in the creation of malware detection rules; a second plurality of features is selected as salient features that are used in the creation of malware detection rules (Abstract); for a group of features having an ML prediction score that surpasses a selected threshold, the salient features form the basis for a rule recommendation provided to the analytic system (column 4, lines 44 to 51); once rule recommendations are finalized at analytic system 170, provisional malware detection rules 190 are generated from the finalized rule recommendations (column 10, lines 46 to 48: Figure 1); here, generating a malware detection rule because a score for a feature is above a threshold is “establishing, based on the probability, a rule governing communication between the two machines”; that is, malware carried in network traffic is a “communication between the two machines”, and a rule obtained from a score is “establishing, based on the probability, a rule”;
“applying the rule for anomaly detection during monitored communication between the two machines” – the term ‘malware’ may refer to code that prompts or causes unauthorized, anomalous, unintended, and/or unwanted behaviors or operations constituting a security compromise of information infrastructure; anomalous behavior may include a communication-based anomaly or an execution-based anomaly (“for anomaly detection”) (column 7, lines 10 to 14); once rule recommendations are finalized, analytic system 170 transmits the provisional malware detection rules 190 via network 195 to one or more of cybersecurity systems 110_1 to 110_N for initial testing and, generally, verification (column 10, lines 46 to 52: Figure 1); provisional malware detection rules 190 may be finalized and provided to cybersecurity system 110 for use in malware detection and/or remediation of any uncovered malware (column 12, lines 8 to 14: Figure 2A).
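For illustration only, the sliding-window n-gram extraction and threshold-based rule recommendation described above for Fang et al. might be sketched as follows.  This is a hypothetical Python sketch, not code from Fang et al. or from Applicants’ Specification: the function names, the toy event tokens, and the frequency-ratio score (which merely stands in for Fang et al.’s machine learning prediction score) are all assumptions.

```python
from collections import Counter

def sliding_window_ngrams(tokens, n):
    """Contiguous n-grams found by sliding a window of width n over tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def score_features(malicious_logs, benign_logs, n):
    """Hypothetical stand-in for an ML prediction score: the fraction of a
    feature's occurrences that come from logs labeled malicious."""
    mal = Counter(g for log in malicious_logs for g in sliding_window_ngrams(log, n))
    ben = Counter(g for log in benign_logs for g in sliding_window_ngrams(log, n))
    # Every feature in the union occurs at least once, so the denominator is > 0.
    return {g: mal[g] / (mal[g] + ben[g]) for g in mal.keys() | ben.keys()}

def recommend_rules(scores, threshold):
    """Features whose score surpasses the threshold become rule candidates."""
    return sorted(g for g, s in scores.items() if s > threshold)

# Hypothetical event tokens parsed from labeled meta-information logs.
malicious = [["open", "socket", "write", "registry"],
             ["open", "socket", "write", "file"]]
benign = [["open", "file", "write", "file"]]

scores = score_features(malicious, benign, n=2)
print(recommend_rules(scores, threshold=0.5))
```

Here the bigram ("write", "file") appears once in each class, scores 0.5, and therefore does not surpass the threshold, while bigrams seen only in malicious logs do.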
Concerning independent claims 1 and 11 to 12, Fang et al. clearly discloses all of the elements of these independent claims, omitting at most the limitations directed to “comparing the n-gram to a dictionary of similar information elements” and determining a probability “that the information element corresponds to a similar element in the dictionary”.  That is, Fang et al. does not clearly disclose that n-grams are compared to n-grams in a dictionary to determine features for generation of rules to detect malware.  Still, Fang et al. similarly discloses that analytic system 170 uses a searchable data store 220 including meta-information associated with known malware and/or known goodware to determine the suitability of a provisional malware detection rule 190.  (Column 11, Lines 62 to 67: Figure 2A)  A searchable data store 220 of labeled events associated with known malware and known goodware is provided.  (Column 13, Line 64 to Column 14, Line 2: Figure 4)  Rule recommendation verification may be accomplished by conducting a query search 460 to evaluate each salient feature against features associated with known malware objects and/or benign objects maintained by data store 220.  (Column 15, Lines 58 to 65: Figure 4)  Here, data store 220 is equivalent to “a dictionary of similar information elements” of malware objects and/or benign objects.  Because the features include n-grams, Fang et al., then, arguably discloses the limitation of “analyzing the information element by comparing the n-gram to a dictionary of similar information elements, and then determining a probability that the information element corresponds to a similar information element in the dictionary”.
Concerning independent claims 1 and 11 to 12, even if these limitations of “comparing . . . to a dictionary of similar information elements” and determining a probability “that the information element corresponds to a similar element in the dictionary” are omitted by Fang et al., they are taught by Anderson et al.  Generally, Anderson et al. teaches training a machine learning-based traffic analyzer, including determining whether a generated feature vector is already represented in a training dataset dictionary by one or more feature vectors in the dictionary.  (Abstract)  Data packets 140, e.g., traffic/messages, may be exchanged among nodes/devices of computer network 100 over links using predefined network communication protocols, e.g., Transmission Control Protocol/Internet Protocol (TCP/IP), User Datagram Protocol (UDP), Asynchronous Transfer Mode (ATM), etc. (“using a communication protocol”).  (¶[0014]: Figure 1A)  The techniques compute a similarity score between a newly observed feature vector and those already stored in a training dataset dictionary to determine whether or not to add the new feature vector to the dictionary.  (¶[0045])  Training dataset generator process 248 may execute a similarity analyzer 408 to determine whether observed feature vector 406 is already represented by one or more of feature vectors 412 in training dataset dictionary 410.  Similarity analyzer 408 may compute one or more similarity scores 416 that represent how similar observed feature vector 406 is to any of the feature vectors 412 in training dataset dictionary 410.  (¶[0052]: Figure 4)  Anderson et al., then, teaches “comparing . . . to a dictionary of similar information elements”, and then determining a score, or “probability”, “that the information element corresponds to a similar information element in the dictionary”.  Anderson et al.’s feature vectors correspond to the N-Gram or Skip-Gram features of Fang et al.
An objective of Anderson et al. is to optimize network performance by identifying malicious network traffic, which may use encryption that protects the traffic payload from inspection.  (¶[0003] - ¶[0004])  It would have been obvious to one having ordinary skill in the art to compare features to a dictionary of similar features as taught by Anderson et al. when establishing rules based on n-grams in Fang et al. for the purpose of optimizing network performance by detecting malicious network traffic.
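The similarity analysis attributed above to Anderson et al. can be illustrated with a minimal sketch.  Cosine similarity is assumed here only as one common choice of similarity score; the sketch does not rely on Anderson et al. for any particular metric, and the function names and the 0.95 threshold are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def best_match(observed, dictionary):
    """Return (index, score) of the dictionary entry most similar to observed."""
    scores = [cosine_similarity(observed, entry) for entry in dictionary]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

def update_dictionary(observed, dictionary, threshold=0.95):
    """Add observed only if no existing entry is at least threshold-similar.
    Returns True when the vector was added (i.e., it is not yet represented)."""
    if dictionary:
        _, score = best_match(observed, dictionary)
        if score >= threshold:
            return False
    dictionary.append(observed)
    return True

dictionary = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(update_dictionary([0.99, 0.01, 0.0], dictionary))  # -> False (already represented)
print(update_dictionary([0.0, 0.0, 1.0], dictionary))    # -> True (new pattern added)
```

The similarity score in this sketch plays the role of a graded measure of correspondence to a dictionary entry; a hard equality test would instead reject any vector that differs in a single component.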
Concerning claims 2 and 13, Fang et al. discloses that anomalous behaviors may include a communication-based anomaly or execution-based anomaly which could alter the functionality of a network device executing an application (“a communication destined for an application on at least one of the two machines”) (column 7, lines 26 to 34); implicitly, a cybersecurity system that detects malware is “intercepting and processing a communication”.
Concerning claims 5 and 16, Fang et al. discloses that received meta-information may be obtained from a log that maintains detected events.  (Column 2, Lines 62 to 65).  ‘Features’ of repetitive patterns within text of meta-information are discovered by using a sliding window, e.g., N-Gram or Skip-Gram.  (Column 3, Lines 28 to 35; Column 14, Lines 5 to 15: Figure 4)  Collected meta-information may be obtained from a log including a behavior log, an endpoint dynamic behavior monitor log, or a static portable execution file.  (Column 8, Lines 44 to 48)  Fang et al., then, discloses “constructing the n-gram from the communication data”.  Broadly, Fang et al. discloses “constructing the n-gram from at least one of content information” in a limitation of “constructing the n-gram from at least one of content information or context information”.  That is, “content information” can simply be construed as a content of information in a log or a content of received meta-information that is used to generate an n-gram.  

Claims 3 to 4 and 14 to 15 are rejected under 35 U.S.C. 103 as being unpatentable over Fang et al. (U.S. Patent No. 11,003,773) in view of Anderson et al. (U.S. Patent Publication 2018/0189677) as applied to claims 1 and 12 above, and further in view of Andress et al. (U.S. Patent Publication 2007/0174469).
Fang et al. discloses “intercepting . . . a communication between two machines”, but omits “duplicating an original communication of a data stream between the two machines to form a duplicate communication”, “processing the duplicate communication”, and “returning the original communication to the data stream” of claims 3 and 14.  Similarly, Fang et al. discloses that received meta-information may be obtained from a log that maintains detected events based on operations performed by malware detection rules (“wherein receiving communication data includes receiving a communication data log, containing records of at least one communication”), but omits receiving communication data “from a plug-in” of claims 4 and 15.  
However, Andress et al. teaches intercepting communications between a client and a service, where a proxy invokes an interceptor plug-in that is plugged into the proxy.  (Abstract)  One prior art embodiment for intercepting IP data traffic is to log all IP datagrams of several user sessions at specific interception points (“receiving a communication data log”) and to perform filtering analysis in order to regenerate a complete user session.  (¶[0006])  An incoming or outgoing call for a certain telephone number is intercepted at a switch, and the switch duplicates the communication content (“duplicating an original communication of a data stream between two machines to form a duplicate communication”).  The transmission between caller and callee is transferred to a law enforcement agency via a mediation device.  (¶[0007])  A request and response are stored on a message queue, where the message queue is an interceptor plug-in, or the request and the response are stored on the interceptor plug-in.  The request and response are transferred from the message queue or from the interceptor plug-in to an interceptor manager (“receiving a communication data log, containing records of at least one communication, from a plug-in”) by an encrypted end-to-end communication.  (¶[0037])  Andress et al., then, teaches the limitations of “duplicating an original communication” and “a plug-in” that stores information of “a communication data log”.  Implicitly, if a message is duplicated, then the original message continues to its recipient, which is equivalent to “returning the original communication to the data stream”.  An objective is to enable interception of a customer’s communication for law enforcement agencies, and to provide an improved method for intercepting data traffic.  (¶[0002] and ¶[0011])  It would have been obvious to one having ordinary skill in the art to provide a plug-in for interception of logged data and to duplicate a communication as taught by Andress et al. to determine a score for malware events in traffic data according to Fang et al. for the purpose of providing an improved method for intercepting data traffic for law enforcement agencies.
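The duplicate-and-forward interception discussed above for Andress et al. can be sketched in a few lines.  This is a hypothetical illustration, not code from Andress et al. or from Applicants’ Specification; it assumes messages are plain Python dictionaries, that `inspect` is any caller-supplied check, and that `log` stands in for records kept by a plug-in.

```python
import copy

def intercept(stream, inspect, log):
    """For each message: process a duplicate, log the duplicate if flagged,
    and pass the original through to the data stream unchanged."""
    forwarded = []
    for original in stream:
        duplicate = copy.deepcopy(original)  # form the duplicate communication
        if inspect(duplicate):               # only the duplicate is processed
            log.append(duplicate)            # hypothetical plug-in record
        forwarded.append(original)           # return the original to the stream
    return forwarded

stream = [
    {"src": "client", "dst": "service", "payload": "GET /status"},
    {"src": "client", "dst": "service", "payload": "cmd.exe /c whoami"},
]
log = []
forwarded = intercept(stream, lambda msg: "cmd.exe" in msg["payload"], log)
print(len(log))  # -> 1
```

Because inspection operates on a deep copy, any modification made while processing the duplicate cannot alter the original message that continues to its recipient.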

Claims 6 to 10 and 17 to 21 are rejected under 35 U.S.C. 103 as being unpatentable over Fang et al. (U.S. Patent No. 11,003,773) in view of Anderson et al. (U.S. Patent Publication 2018/0189677) as applied to claims 1 and 12 above, and further in view of Li et al. (U.S. Patent No. 10,754,948).
Concerning claims 6 to 7, 9 to 10, 17 to 18, and 20 to 21, Fang et al. generally discloses N-Grams and Skip-Grams, but does not describe the specific species of n-grams comprising “a unigram”, “a bigram”, “a higher-order n-gram”, and “at least one of unigrams, bigrams, or higher-order n-grams.”  Still, n-grams are well known in natural language processing, where the most common n-grams are unigrams, bigrams, and trigrams, although higher-order n-grams, e.g., 5-grams, are also known.  Li et al. teaches protecting devices from malicious files based on n-gram processing of sequential data.  (Abstract)  Operations can include generating n-grams of discrete tokens.  (Column 2, Lines 34 to 35)  Files that are malicious may be likely to include certain tokens and/or certain sequences of tokens, and a vector of weights can reflect the frequency of tokens and/or sequences of tokens and the likelihood that a file including those tokens and/or sequences of tokens is malicious.  (Column 3, Line 59 to Column 4, Line 15)  Specifically, ‘n-grams’ are a plurality of sequences of tokens, where the value of n can be any suitable number.  ML pack 134 can be configured to generate 1-grams (unigrams), 2-grams (bigrams), or 3-grams (trigrams).  (Column 5, Line 23 to Column 6, Line 23: Figure 1)  Figure 4 illustrates a plot of n-grams for n=1 to n=8, and Figure 5 is a plot for n=4.  (Column 12, Lines 55 to 58: Figures 4 to 5)  Li et al., then, teaches n-grams that include “a unigram”, “a bigram”, and “a higher-order n-gram”.  An objective is to detect malicious files without requiring an exact match with an a priori known malicious file, and to facilitate highly accurate detection of malicious files that consumes relatively few processing resources to reduce computational impact.  (Column 3, Lines 11 to 20)  It would have been obvious to one having ordinary skill in the art to detect malware in Fang et al. using unigrams, bigrams, and higher-order n-grams as taught by Li et al. for the purpose of detecting malicious files with high accuracy, low computational impact, and without requiring an exact match with known malicious files.
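Generating unigrams, bigrams, and higher-order n-grams from a single token sequence, as described above for Li et al., can be sketched as below.  The function name and the token values (assumed to be instruction mnemonics) are hypothetical; Li et al. is not asserted to use this code.

```python
from collections import Counter

def ngram_counts(tokens, max_n):
    """Count every n-gram of order 1 through max_n in a token sequence:
    order 1 gives unigrams, order 2 bigrams, and order 3+ higher-order n-grams."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

tokens = ["push", "call", "push", "call"]   # hypothetical token sequence
counts = ngram_counts(tokens, max_n=3)
print(counts[("push", "call")])  # -> 2
```

Raising `max_n` adds longer, more specific sequences at the cost of a larger and sparser feature space, which is one reason low orders such as unigrams through trigrams are the most commonly used.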
Concerning claims 8 and 19, Li et al. teaches generating bigrams.  (Column 5, Lines 42 to 65)  Anderson et al. teaches “comparing” feature vectors “with a repository of” feature vectors “to determine a second probability of a sequence of elements contained within” the feature vector, “based on patterns observed in analyzing the repository”.  Here, Anderson et al. compares feature vectors with feature vectors in a dictionary (“a repository”).  Anderson et al.’s feature vectors correspond to features that can be N-Grams associated with malware and/or goodware in data store 220 of Fang et al.
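One way to illustrate “a second probability of a sequence of elements” derived from “patterns observed in analyzing the repository” is a bigram model estimated from the sequences in a repository.  This add-one-smoothed bigram model is an assumed, illustrative technique only; it is not asserted to be the method of Anderson et al., Li et al., or Applicants’ claims 8 and 19, and all names and tokens are hypothetical.

```python
from collections import Counter

def train_bigram_model(repository):
    """Collect the patterns in the repository: bigram counts, counts of each
    token as the first element of a bigram, and the vocabulary."""
    histories, bigrams, vocab = Counter(), Counter(), set()
    for seq in repository:
        vocab.update(seq)
        histories.update(seq[:-1])          # tokens that have a successor
        bigrams.update(zip(seq, seq[1:]))
    return histories, bigrams, vocab

def sequence_probability(seq, model):
    """Product of add-one-smoothed P(b | a) over consecutive pairs (a, b)."""
    histories, bigrams, vocab = model
    v = len(vocab)
    p = 1.0
    for a, b in zip(seq, seq[1:]):
        p *= (bigrams[(a, b)] + 1) / (histories[a] + v)
    return p

repository = [["open", "read", "close"], ["open", "read", "send"]]
model = train_bigram_model(repository)
print(sequence_probability(["open", "read"], model))  # -> 0.5
```

A sequence observed in the repository, such as ("open", "read"), receives a higher probability than an unseen ordering such as ("read", "open"), whose probability here is 1/6 because of smoothing rather than zero.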

Response to Arguments
Applicants’ arguments filed 29 April 2022 have been considered but are moot in view of new grounds of rejection as necessitated by amendment.
Generally, Applicants have provided extensive amendments to independent claims 1 and 11 to 12, and have rewritten these independent claims to set forth significantly new limitations.  Applicants then traverse the prior rejections for an improper written description under 35 U.S.C. §112(a) and for obviousness of the independent claims under 35 U.S.C. §103 over Mekky et al. (U.S. Patent No. 10,038,706) and Song et al. (U.S. Patent Publication 2009/0254501).  Applicants argue that neither reference discloses the use of a dictionary, library, or the like, for comparing an n-gram to a dictionary of similar information elements, and determining a probability that the information element corresponds to similar information in the dictionary.  Additionally, Applicants argue that a corpus of words appears to be training data for machine learning in Song et al.
Applicants’ amendment overcomes the rejection under 35 U.S.C. §112(a).
New grounds of rejection are necessitated by the amendments to independent claims 1 and 11 to 12 as being obvious under 35 U.S.C. §103 over Fang et al. (U.S. Patent No. 11,003,773) in view of Anderson et al. (U.S. Patent Publication 2018/0189677).  Anderson et al., at least, is maintained to clearly teach the new limitations argued by Applicants of comparing information elements to a dictionary of similar information and determining a score, or ‘probability’, that an information element corresponds to a similar information element in a dictionary.  The rejection of dependent claims 3 to 4 and 14 to 15 continues to rely upon Andress et al. (U.S. Patent Publication 2007/0174469).  New grounds of rejection are applied to dependent claims 6 to 10 and 17 to 21 as being obvious under 35 U.S.C. §103 further in view of Li et al. (U.S. Patent No. 10,754,948).  All of these new grounds of rejection are necessitated by amendment.
Generally, Applicants’ arguments are moot in view of these new grounds of rejection.  Applicants do suggest an argument against Song et al., where this reference is directed to using training data for machine learning.  Admittedly, Fang et al. and Anderson et al. use machine learning, too.  Applicants’ Specification does not expressly mention machine learning, but is directed to generating rules.  However, Fang et al. uses machine learning to generate rules from n-grams, and Applicants’ invention is generically directed to generating rules from n-grams even if machine learning is not expressly described.  The absence of an express mention of machine learning in the Specification, then, cannot be maintained to be a patentably unobvious feature, because using machine learning to generate rules for malware detection from n-grams, as in Fang et al., falls within the broader embodiment of generating rules for malware detection from n-grams according to the Specification.
Moreover, Fang et al. is maintained to disclose all of the limitations of the independent claims with an arguable exception of “comparing the n-gram to a dictionary of similar information elements and then determining a probability that the information element corresponds to a similar information element in the dictionary”.  Still, Fang et al. discloses a searchable data store 220 that appears equivalent to “a dictionary”.  In any event, Anderson et al. expressly teaches determining whether a feature vector is already represented in a training dataset dictionary according to a similarity score.  This similarity score is equivalent to a probability that an information element corresponds to a similar information element in a dictionary.  Fang et al. and Anderson et al., in combination, are maintained to teach all of the limitations of the independent claims.
This Office Action is NON-FINAL.

Conclusion
The prior art made of record and not relied upon is considered pertinent to Applicants’ disclosure.
Chesla et al., Yang et al., Chen, Wu et al., and Medalion et al. disclose related prior art.
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MARTIN LERNER whose telephone number is (571) 272-7608.  The examiner can normally be reached Monday-Thursday 8:30 AM-6:00 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Daniel Washburn, can be reached at (571) 272-5551.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center.  Unpublished application information in Patent Center is available to registered users.  To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov.  Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format.  For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).  If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MARTIN LERNER/Primary Examiner
Art Unit 2657
June 1, 2022