Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA. The present Office Action is responsive to communications received 1/7/2020. Claims 1-20 are pending.

Information Disclosure Statement
The information disclosure statement (IDS) submitted on 12/16/2019 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Objection
Claim 20 is objected to for an informality: the claim recites “the refine set” instead of “the refined set”. Correction is kindly requested.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


Claims 1, 3-15 and 17-20 are rejected under 35 USC 101 because the claimed invention is directed to an abstract idea without significantly more. Claims 1, 11 and 15 recite: “clustering a first plurality of tokens ... refining the first plurality of tokens ..., generating a network-level application signature ...”, which are a series of mental or mathematical steps (step 2A, prong 1).

The claims recite additional elements, such as a non-transitory machine readable medium executable by a computing device (claim 11) and a processor and a computer-readable medium (claim 15), which amount to no more than mere instructions to apply the abstract idea using a generic computer component. Hence, the judicial exception is not integrated into a practical application (step 2A, prong 2).
Furthermore, the claims do not recite any element or combination of elements that amounts to significantly more than the abstract idea (step 2B) for the reason explained above (i.e., applying the abstract idea using a generic computer component). In view of the combination of elements, the claims are drawn to an abstract idea (mathematical concepts or mental process) without significantly more.
Therefore, claims 1, 11, and 15 are not eligible for patent.
Claims 3-10, 12-14 and 17-20, which depend respectively from claims 1, 11 and 15, are also ineligible because they merely further recite aspects of the abstract idea (i.e., mathematical or mental steps). Hence, they do not integrate the abstract idea into a practical application, and the claims do not amount to significantly more than the abstract idea.
 

Additionally, claims 15-20 are rejected under 35 USC 101 because the claims are drawn to an apparatus comprising a processor and a computer readable medium. Under the broadest reasonable interpretation, the processor comprises software and the computer readable medium includes signals, according to the specification ([0067]). For the

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 6 and 20 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Claims 6 and 20 recite “the tokens”, which lacks antecedent basis and renders the claims indefinite. For examination purposes, the limitation will be interpreted as referring to the token vectors. Clarification is kindly requested.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1-2, 6-12, 14-16 and 20 are rejected under 35 U.S.C. 103 as being unpatentable over US 20200036734 to Radja et al., hereinafter Radja, in view of “Behavioral Clustering of HTTP Based Malware and Signature Generation Using Malicious Network Traces”, by R. Perdisci et al., 2010, p. 1-14, hereinafter Perdisci. Perdisci is cited in the IDS dated 12/16/2019.

Regarding claim 1, Radja discloses 
A method comprising: clustering a first plurality of token weight vectors to generate a first plurality of clusters of token weight vectors, wherein the first plurality of token weight vectors were generated based, at least in part, on a first plurality of token vectors generated from captured network traffic of a session ([0022][0023]: extract parameters (token vectors) of data traffic from data sources (applications), assign importance weight to the parameters, group the weighted parameters into clusters); refining the first plurality of clusters to a refined set of one or more clusters ([0023]: build a decision tree and create an optimal number of clusters, using for instance a k-means algorithm); and generating a network-level application signature for the target application ([0045][0051]: generate for each cluster count and statistics (signature) used to filter traffic).
Radja does not explicitly teach the refining as claimed, i.e., wherein refining the plurality of clusters comprises iteratively reducing the clusters based on similarity of corresponding ones of the first plurality of token vectors to a set of one or more application tokens for a target application and re-clustering the remaining ones of the token weight vectors, until a termination criterion is satisfied; and generating a network-level application signature for the target application based, at least in part, on those of the first plurality of token vectors that correspond to the refined set of one or more clusters of the token weight vectors remaining after the termination criterion is satisfied.
In an analogous art, Perdisci discloses a multi-step cluster refinement process to produce malware signatures. Perdisci teaches that refining the plurality of clusters comprises iteratively reducing the clusters based on similarity of corresponding ones of the first plurality of token vectors to a set of one or more application tokens for a target application and re-clustering the remaining ones of the token weight vectors, until a termination criterion is satisfied; and generating a network-level application signature for the target application based, at least in part, on those of the first plurality of token vectors that correspond to the refined set of one or more clusters of the token weight vectors remaining after the termination criterion is satisfied (p. 3, Fig. 1: iterate in three steps from coarse-grained clustering to fine-grained clustering to cluster merging, the last two steps determining similar statistical patterns between different HTTP traffic, merging together clusters that have sufficiently similar HTTP behavior, and defining network signatures that summarize the traffic generated by the malware). It would have been obvious to a skilled artisan before the effective filing date of the claimed invention to apply the cluster refinement process of Perdisci to the clusters created by Radja and arrive at the claimed invention because it would allow “to obtain more generic network-level malware signatures, increasing the malware detection rate” (Perdisci, second paragraph on right).
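For illustration only, the quoted limitation (cluster, reduce by similarity to application tokens, re-cluster, stop on a termination criterion, then derive a signature) can be read as the loop sketched below. This is a minimal sketch under stated assumptions and is not taken from the application, Radja, or Perdisci: the Jaccard similarity, the small deterministic k-means, the rule of dropping the single least similar cluster per iteration, the `min_score` termination criterion, and all function and variable names are hypothetical choices.

```python
from statistics import mean

def jaccard(a, b):
    """Jaccard similarity between two token collections (assumed metric)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def sq_dist(u, v):
    """Squared Euclidean distance between two weight vectors."""
    return sum((x - y) ** 2 for x, y in zip(u, v))

def kmeans(weights, k, rounds=10):
    """Tiny deterministic k-means; the first k vectors seed the centroids."""
    centroids = [list(w) for w in weights[:k]]
    labels = [0] * len(weights)
    for _ in range(rounds):
        labels = [min(range(k), key=lambda c: sq_dist(w, centroids[c]))
                  for w in weights]
        for c in range(k):
            members = [w for w, lab in zip(weights, labels) if lab == c]
            if members:
                centroids[c] = [mean(col) for col in zip(*members)]
    return labels

def refine(token_vectors, weight_vectors, app_tokens, k=2, min_score=0.5):
    """Return indices of the vectors that survive iterative reduction."""
    keep = list(range(len(token_vectors)))
    while True:
        # (Re-)cluster the weight vectors that are still in play.
        labels = kmeans([weight_vectors[i] for i in keep], min(k, len(keep)))
        clusters = {}
        for idx, lab in zip(keep, labels):
            clusters.setdefault(lab, []).append(idx)
        # Score each cluster by similarity of its token vectors to app tokens.
        scores = {c: mean(jaccard(token_vectors[i], app_tokens) for i in m)
                  for c, m in clusters.items()}
        worst = min(scores, key=scores.get)
        # Assumed termination criterion: every cluster is similar enough,
        # or only one cluster is left.
        if scores[worst] >= min_score or len(clusters) == 1:
            return sorted(keep)
        # Reduce: discard the least similar cluster, then loop to re-cluster.
        keep = [i for i in keep if i not in clusters[worst]]

# Hypothetical session data: token vectors and their weight vectors.
token_vectors = [["GET", "/api", "appx"], ["POST", "/ads", "track"],
                 ["GET", "/api", "appx", "v2"], ["POST", "/ads"],
                 ["GET", "appx", "/img"]]
weight_vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9], [0.8, 0.2]]
app_tokens = {"GET", "/api", "appx"}

kept = refine(token_vectors, weight_vectors, app_tokens)
# Assumed signature: the tokens shared by every surviving token vector.
signature = set.intersection(*(set(token_vectors[i]) for i in kept))
```

With this hypothetical data, the two advertising-related vectors are discarded in the first iteration, and the signature is built only from the token vectors that correspond to the remaining clusters.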

Regarding claim 2, Radja in view of Perdisci discloses the method of claim 1, further comprising filtering monitored network traffic based, at least in part, on the network-level application signature of the target application (Radja, [0045][0051], Perdisci p.5, under 4, on right: deploy signatures into IDS).  

Regarding claim 6, Radja in view of Perdisci discloses the method of claim 1, wherein generating the network-level application signature for the target application based, at least in part, on those of the first plurality of token vectors that correspond to the refined set of one or more clusters of token weight vectors comprises determining context information for the tokens indicated in those of the first plurality of token vectors that correspond to the refined set of one or more clusters of token weight vectors and generating program code that describes a pattern formed by the tokens across packets of the captured network traffic according to the context information (Perdisci p.5 in Cluster Centroids, on left: invariant tokens common to all requests in the pool, such as the first part of the http requests, namely, the request method and URL, which defines the context information).
Regarding claim 7, Radja in view of Perdisci discloses the method of claim 1, further comprising generating the first plurality of token weight vectors based, at least in part, on the first plurality of token vectors, wherein each of the first plurality of token weight vectors comprises a token weight for each token in the corresponding one of the first plurality of token vectors (Radja [0030][0034]: weight assigned to each parameter).
Regarding claim 8, Radja in view of Perdisci discloses the method of claim 1 further comprising: clustering a second plurality of token weight vectors to generate a second plurality of clusters of token weight vectors, wherein the second plurality of token weight vectors were generated based, at least in part, on a second plurality of token vectors generated from captured network traffic of a second session; refining the second plurality of clusters to a second refined set of one or more clusters, wherein refining the second plurality of clusters comprises iteratively reducing the clusters based on similarity of corresponding ones of the second plurality of token vectors to the set of one or more application tokens for the target application and re-clustering the remaining ones of the token weight vectors, until the termination criterion is satisfied; and generating a second network-level application signature for the target application based, at least in part, on those of the second plurality of token vectors that correspond to the second refined set of one or more clusters of the token weight vectors remaining after the termination criterion is satisfied (see rejection of claim 1, processing the second session as the first session in claim 1).
Regarding claim 9, Radja in view of Perdisci discloses the method of claim 8, further comprising tokenizing packets of the captured network traffic to generate the first plurality of token vectors (Radja Fig. 4, 404: extract parameters or tokens from traffic session).
Regarding claim 10, Radja in view of Perdisci discloses the method of claim 9, wherein tokenizing the packets comprises accessing a token dictionary defined for application network traffic (Radja [0021]: current parameters or tokens compared to baseline parameters from historical data, interpreted as a token dictionary).
Regarding claims 11 and 15, the claims recite substantially the same content as claim 1 and are rejected using the rationales rejecting claim 1.
Regarding claims 12 and 20, the claims recite substantially the same content as claim 6 and are rejected using the rationales rejecting claim 6.
Regarding claim 14, Radja in view of Perdisci discloses the machine-readable medium of claim 11, further comprising instructions executable by the computing device to perform operations comprising filtering monitored network traffic based, at least in part, on detecting the pattern across packets of an active session in the network traffic (Radja, [0024][0045]: filter packets to take mitigation action such as blocking traffic [0059]; Perdisci p.5: deploy signatures in IDS to detect malicious http requests).
Regarding claim 16, the claim recites substantially the same content as claim 2 and is rejected using the rationales rejecting claim 2.

Claims 5, 13 and 19 are rejected under 35 USC 103 as being unpatentable over Radja and Perdisci, in view of US 20110202528 to Deolalikar et al., hereinafter Deolalikar.
Regarding claim 5, Radja in view of Perdisci discloses the method of claim 1. Radja in view of Perdisci does not disclose, yet Deolalikar teaches, identifying tokens in a document ([0017]) and applying a weighting factor to each token ([0018]), wherein the first plurality of token weight vectors is determined based on a term frequency inverse document frequency statistic for token vectors in the first plurality of token vectors ([0018]). It would have been obvious to a skilled artisan before the effective filing date of the claimed invention to determine the weights based on a term frequency inverse document
Regarding claims 13 and  19, the claims recite substantially the same content as claim 5 and are rejected using the rationales rejecting claim 5.
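For reference, the term frequency inverse document frequency statistic named in claim 5 can be sketched as follows. This is a minimal sketch assuming the common unsmoothed form, weight = (count of token in vector / vector length) × ln(number of vectors / number of vectors containing the token); the exact variant used by Deolalikar or by the application is not stated here, and the function and variable names are hypothetical.

```python
import math
from collections import Counter

def tfidf_weight_vectors(token_vectors):
    """Map each token vector to a {token: TF-IDF weight} dictionary."""
    n = len(token_vectors)
    # Document frequency: in how many token vectors each token appears.
    df = Counter(tok for vec in token_vectors for tok in set(vec))
    weights = []
    for vec in token_vectors:
        tf = Counter(vec)
        weights.append({
            # Term frequency times unsmoothed inverse document frequency.
            tok: (count / len(vec)) * math.log(n / df[tok])
            for tok, count in tf.items()
        })
    return weights

# Hypothetical token vectors from two sessions.
example = tfidf_weight_vectors([["GET", "/api"], ["GET", "/ads"]])
```

In this example, a token that appears in every vector (here "GET") receives a weight of zero, while a token unique to one vector receives a positive weight, so each token weight vector has one weight per token in the corresponding token vector.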

Allowable Subject Matter
Regarding claim 3 and substantially claim 17, Radja in view of Perdisci discloses claim 1 or 15, wherein iteratively reducing the first plurality of clusters based on similarity of corresponding ones of the first plurality of token vectors to the set of application tokens for the target application comprises: determining, for each of the first plurality of clusters remaining in an iteration, a proximity score based, at least in part, on similarity of the token vectors that correspond to centers of the cluster (Perdisci p.4, under 3.4); 
Radja alone or in view of Perdisci or any other prior art of record fails to teach: discarding n of the remaining clusters in the iteration with the n lowest proximity scores.
Therefore, claims 3 and 17 are allowable over the prior art of record.
Regarding claim 4 and substantially claim 18, Radja in view of Perdisci discloses claim 1 or 15, wherein iteratively reducing the first plurality of clusters based on similarity of corresponding ones of the first plurality of token vectors to the set of application tokens for the target application comprises: determining, for each of the first plurality of clusters remaining in an iteration, a proximity score based, at least in part, on 
Radja alone or in view of Perdisci or any other prior art of record fails to teach: discarding those of the remaining clusters in the iteration with a proximity score below a proximity score threshold.
Therefore, claims 4 and 18 are allowable over the prior art of record.
Claims 3-4 and 17-18 are objected to as being dependent upon a rejected base claim, but would be allowable if rewritten in independent form including all of the limitations of the base claim and any intervening claims.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure:
Shinomiya et al., 20200404009, disclose extracting data from network traffic data, clustering the extracted data, analyzing frequently appearing patterns, and outputting a signature that satisfies a predetermined condition.
Bellala et al., 20170053214, disclose generating a signature of an application associated with network flows, clustering tokens, and reducing the cluster size.
Vines et al., 10623429, disclose systems and methods for generating a signature of an event by parsing network traffic, calculating entropy values, and generating a signature based on the entropy.
Liao et al., 10332005, disclose classifying network traffic based on an application signature generated during a training phase from tokens extracted from a training set.
Franc et al., 20170316342, disclose extracting feature vectors from network traffic and feeding the vectors into a learning classifier that calculates a distance to clusters to reflect a degree of abnormality.
Nucci et al., 9094288, disclose selecting a flow in network traffic and generating a new signature that is added to a library.
Keralapura et al., 8694630, disclose obtaining a network flow, dividing the flow into tokens, and generating a filtered cluster by iterations of a pre-determined algorithm.
Iyer et al., 20200175041, disclose a document clustering device that divides a document into chunks, applies parameters to the chunks, and computes weights for the parameters.
G. Gu, R. Perdisci, J. Zhang, and W. Lee, “Botminer: Clustering analysis of network traffic for protocol- and structure-independent botnet detection,” in Proc. of 17th USENIX Security Symposium (USENIX Security ’08), June 2008, pp. 139–154: multi step clustering.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to CATHERINE B THIAW whose telephone number is (571)270-1138. The examiner can normally be reached Monday-Friday 7am-4pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.

Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/Catherine Thiaw/
Primary Examiner, Art Unit 2493
1/14/2022