Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
DETAILED ACTION
Response to Amendment
This Action is responsive to the Applicant’s Amendment filed 12/17/2020.  In the Amendment, Applicant amended claims 1-20.  As necessitated by the Amendment, Examiner hereby respectfully withdraws the 35 U.S.C. § 112, second paragraph, rejections of claims 1-20.
After a thorough search and examination of the present application, and in light of the following:
Prior art made of record;
An updated search on prior art conducted in domains (EAST, NPL-ACM, Google, etc.);
Claims 1-2, 4, 6-16, 18 and 20 (renumbered 1-16) are allowed.

Examiner’s Amendment
An examiner’s amendment to the record appears below.  Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
Authorization for this examiner’s amendment, which addresses claim formalities, was given in a telephone interview with attorney Ms. Kim Thien Bui (client’s representative, Reg. No. 76,843) at the telephone number (617) 301-2186 on 03/04/2021.

The application has been amended as follows:
	In the claims:
Claims 3, 5, 17 and 19 are canceled.
Claims 1, 4 and 16 have been amended as follows:

1.	(Currently Amended)  A method for generating regular expressions (regexes) as extraction patterns used by a computing system to extract specific information, comprising the steps of:
receiving log events, a set of seed words, and a set of seed patterns;
determining whether said set of seed words is full by counting whether the elements contained in said set of seed words reaches a first predetermined positive integer number;
if said set of seed words is not full, iteratively generating new patterns, selecting a subset of patterns based on pattern scores S1 from said new patterns, and adding said subset of patterns into said set of seed patterns;
using word scores S2 to select a subset of seed patterns from said set of seed patterns;
determining whether said subset of seed patterns is empty by counting whether the number of elements in said subset of seed patterns is zero;
if said subset of seed patterns is not empty, then applying said subset of seed patterns into said log events to extract a new set of words, selecting best words from said new subset of words based on said word scores S2, and adding said best subset of words into said set of seed words; and
repeating said steps (b) to (f) until said set of seed words is full or said subset of seed words is zero, then pruning said set of seed patterns[[.]],
wherein said subset of seed patterns is determined to be empty when said words scores S2 of said seed pattern elements are all below a preset score, and 
wherein said step (f) of determining whether said set of seed patterns is full further comprises determining whether said subset of said seed patterns contains all irrelevant seed elements whose word scores S2 are greater than a predetermined score.

2.  	(Previously Presented)  The method of claim 1 wherein said first predetermined positive integer number is 50.  

3.  	(Canceled)  

4.	(Currently Amended)  The method of claim 1 wherein said pattern scores S1 is defined as S1 = R*log2F wherein R equals to F divided by N (R = F/N), and wherein F is a number of common seed words between a previous set of seed words and a newly 
said word scores S2 are defined as
    [equation image: media_image1.png]
 wherein Fi is a frequency of an ith pattern and P is the number of patterns that produces said words.	 
5.   	(Canceled)  

6.	(Previously Presented)  The method of claim 1 wherein said step (c) of generating a new set of patterns further comprises:
	selecting a seed word element from said set of seed words; and
	selecting a log event that contains said seed word.

7.	(Previously Presented)  The method of claim 6 further comprising:
	(i)	dividing said log event into a prefix section, a post-prefix section, a keyword section, and a suffix section;
(j)	generating different patterns from said prefix section, said post prefix section, said keyword section, and said suffix section; and
(k)	concatenating said different patterns into said new set of patterns.
		 
8. 	(Previously Presented)  The method of claim 7 wherein said step (i) of dividing said log event further comprises providing a start of string for said set of patterns.

wherein said step (i) of dividing said log event further comprises providing a greedy quantifier with a negated character class.

10.	(Previously Presented)  The method of claim 9 wherein said step (i) of dividing said log event further comprises detecting said keyword.

11.	(Previously Presented)  The method of claim 10 wherein said step (i) of dividing said log event further comprises a tag detection.

12.	(Previously Presented)  The method of claim 6 further comprising:
 repeating said step of selecting a seed word element from said set of seed words and said step of selecting a log event that contains said seed word until a last seed word and a log event pair.

13.	(Previously Presented)  The method of claim 1 wherein said step (d) of selecting a subset of seed patterns from said set of seed patterns further comprises:
 providing a score to each seed pattern element in said set of seed patterns, wherein said scores S of each seed pattern element is proportional to a number of said seed words extracted by said set of seed patterns; and
selecting said seed pattern elements whose scores are higher than a predetermined score.

14.	(Previously Presented)  The method of claim 1 wherein said step (d) of generating a new set of seed patterns further comprises:
	partitioning said set of seed patterns into an old set of seed pattern and a new set of seed patterns;
	providing a score to every seed pattern element from said old set of seed pattern and said new set of seed patterns;
	sorting said old set of seed patterns and said new set of seed patterns in a descending order according to said scores into a first sorted list of old seed patterns and a second sorted list of new seed patterns;
	concatenating said first sorted list of old seed patterns and said second sorted list of new seed patterns into a combined list of sorted seed patterns; and 
	finding a number N where the first N patterns of said combined list of sorted seed patterns have a highest of said score and extracting at least a new seed word from said N patterns.

15.	(Previously Presented) The method of claim 1 wherein said step (g) of pruning said set of seed patterns further comprises:
	providing a score to every seed pattern element from said new set of seed patterns;
	sorting said new set of seed patterns in a descending order according to said scores into a sorted list of new seed patterns;
	concatenating said sorted list of new seed patterns into a combined list of sorted seed patterns; and 


16.	(Currently Amended)  A cluster computing system comprising:
	a plurality of computers connected together through a network;
	a central processing unit (CPU);
	a data storage, wherein said data storage is divided into a plurality of blocks, each block comprising log events; 
a software program stored in a non-transitory memory media, said software program is executed by said CPU to perform parallel operations to generate a set of extraction patterns for each of said plurality of blocks; and
	storing said set of extractions patterns in a cache memory; 
	comparing said sets of extraction patterns from said cache memory;
	if said sets of extraction patterns from said cache memory are the same then use said set of extraction patterns;
	otherwise continuing performing said operations to generate said set of extractions patterns, wherein said operations to generate regular expressions (regexes) as extraction patterns in each of said log events; 
receiving log events, a set of seed words, and a set of seed patterns;
determining whether said set of seed words is full by counting whether the elements contained in said set of seed words reaches a first predetermined positive integer number;
if said set of seed words is not full, then iteratively generating new patterns, selecting a subset of patterns based on pattern scores S1 from said new patterns, and adding said subset of patterns into said set of seed patterns;
using word scores S2 to select a subset of seed patterns from said set of seed patterns;
determining whether said subset of seed patterns is empty by counting whether the number of elements in said subset of seed patterns is zero;
if said subset of seed patterns is not empty, then applying said subset of seed patterns into said log events to extract a new set of words, selecting best words from said new subset of words based on said word scores S2, and adding said best subset of words into said set of seed words; and
repeating said steps (b) to (f) until said set of seed words is full or said subset of seed words is zero, then pruning said set of seed patterns[[.]],
wherein said subset of seed patterns is determined to be empty when said words scores S2 of said seed pattern elements are all below a preset score, and 
wherein said step (f) of determining whether said set of seed patterns is full further comprises determining whether said subset of said seed patterns contains all irrelevant seed elements whose word scores S2 are greater than a predetermined score.

17.	(Canceled)  
  
18.	(Previously Presented) The cluster computing system of claim 16 wherein said step of determining whether said subset of seed patterns is empty further comprises:

said set of seed pattern is determined to be empty when every score of said seed pattern element is below a threshold seed pattern score.

19.   	(Canceled)  

20.	(Previously Presented)  The cluster computing system of claim 16 wherein said step of generating a new set of patterns further comprises:
	selecting a seed word element from said set of seed words;
	selecting a log event that contains said seed word; 
dividing said log event into a prefix section, a post-prefix section, a keyword section, and a suffix section;
	generating different patterns from said prefix section, said post-prefix section, said keyword section, and said suffix section; 
	concatenating said different patterns into said different patterns; 
dividing said log event further comprises providing a start of string for said set of seed patterns;
providing a greedy quantifier with negated character class;
detecting said keyword; 
providing a tag detection to said log events; and
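
As an illustration only, the pattern-generation steps recited in claims 7-11 and 20 (dividing a log event into a prefix section, a post-prefix section, a keyword section, and a suffix section; providing a start of string; providing a greedy quantifier with a negated character class; concatenating the pieces) might be sketched in Python as follows. The names `split_event` and `build_pattern`, the rule that the post-prefix section is the run of non-space characters directly before the seed word, the choice of negated character class, and the sample log lines are all assumptions made for this sketch; the claims do not specify them.

```python
import re


def split_event(event: str, seed_word: str):
    """Divide a log event around the first occurrence of seed_word.

    Returns (prefix, post_prefix, keyword, suffix). ASSUMPTION: the
    post-prefix section is the non-space run (e.g. a tag such as "user=")
    that immediately precedes the seed word.
    """
    start = event.index(seed_word)            # raises ValueError if absent
    end = start + len(seed_word)
    tag_start = event.rfind(" ", 0, start) + 1
    return event[:tag_start], event[tag_start:start], event[start:end], event[end:]


def build_pattern(event: str, seed_word: str) -> str:
    """Generate one sub-pattern per section and concatenate them."""
    prefix, post_prefix, _keyword, suffix = split_event(event, seed_word)
    parts = [
        "^" + (".*?" if prefix else ""),      # start of string, skip the prefix
        re.escape(post_prefix),               # literal tag before the value
        r"([^\s,;]+)",                        # greedy quantifier on a negated class
        ".*$" if suffix else "$",             # tolerate any suffix
    ]
    return "".join(parts)


# Hypothetical log events, not taken from the application.
pattern = build_pattern("Jan 1 sshd[42]: login user=alice from 10.0.0.5", "alice")
match = re.search(pattern, "Feb 2 sshd[77]: login user=bob from 10.0.0.9")
print(match.group(1))
```

In this sketch the seed word itself is replaced by the capturing group, so a pattern learned from one log event can extract a different word from another event that has the same tag.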



REASONS FOR ALLOWANCE
The following is an examiner’s statement of reasons for allowance:
 	The closest prior art of record, e.g., Oliner et al. (US PGPUB 2018/0314853, hereinafter Oliner), discloses that occurrences of an example value are automatically identified in a dataset of events, wherein each occurrence is identified in a portion of raw machine data of a respective event of the events. For each occurrence of the identified occurrences, an extraction rule is generated, which defines a pattern of the occurrence of the example value and is executable to identify PII values in portions of raw machine data of the events using the pattern.  Another prior art reference of record, e.g., Debnath et al. (US PGPUB 2018/0307576, hereinafter Debnath), discloses pattern discovery in input heterogeneous logs having unstructured text content and one or more fields, and preprocessing the input heterogeneous logs to obtain pre-processed logs by splitting the input heterogeneous logs into tokens.  However, neither Oliner nor Debnath teaches or suggests, alone or in combination, the particular combination of steps or elements as recited in independent claims 1 and 16. For example, the prior art of record fails to teach “if said set of seed words is not full, iteratively generating new patterns, selecting a subset of patterns based on pattern scores S1 from said new patterns, and adding said subset of patterns into said set of seed patterns; using word scores S2 to select a subset of seed patterns from said set of seed patterns; determining whether said subset of seed patterns is empty by counting whether the number of elements in said subset of seed patterns is zero; if said subset of seed patterns is not empty, then applying said subset of seed patterns into said log events to extract a new set of words, selecting best words from said new subset of words based on said word scores S2, and adding said best subset of words into said set of seed words; and repeating said steps (b) to (f) until said set of seed words is full or said subset of seed words is zero, then pruning said set of seed patterns.”
These features, in light of the other features and when considered as a whole, render independent claims 1 and 16 allowable over the prior art of record.
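
For illustration only, the pattern score S1 recited in claim 4 (S1 = R*log2F with R = F/N) might be computed as in the Python sketch below. The function name `pattern_score` and the sample words are hypothetical. Two readings are assumptions, because the corresponding claim text is cut off in this record: F is taken as the number of words extracted by a pattern that already appear in the previous set of seed words, and N is taken as the total number of distinct words extracted by that pattern.

```python
import math


def pattern_score(extracted_words, seed_words) -> float:
    """Return S1 = R * log2(F), where R = F / N.

    ASSUMPTIONS (not confirmed by the record):
      F = count of extracted words that are already seed words
      N = count of distinct words extracted by the pattern
    """
    extracted = set(extracted_words)
    f = len(extracted & set(seed_words))
    n = len(extracted)
    if f == 0 or n == 0:
        return 0.0                      # log2(0) is undefined; score as zero
    return (f / n) * math.log2(f)


# Hypothetical data: 2 of the 4 extracted words are existing seed words.
score = pattern_score({"alice", "bob", "carol", "root"}, {"alice", "bob", "dave"})
print(score)  # 0.5, since R = 2/4 and log2(2) = 1
```

Under these assumptions a pattern that recovers only a single seed word scores zero, because log2(1) = 0, so the score favors patterns that recover several known seed words while extracting few unrelated words.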

An updated search for prior art on the EAST database and on other domains (NPL-ACM, Google) has been conducted. The prior art searched and investigated in the database and domains does not fairly teach or suggest the newly amended claimed subject matter as combined and described in each of independent claims 1 and 16.
	The dependent claims depending upon claims 1 and 16 are also distinct from the prior art for the same reason.
Any comments considered necessary by applicant must be submitted no later than the payment of the issue fee and, to avoid processing delays, should preferably accompany the issue fee.  Such submissions should be clearly labeled “Comments on Statement of Reasons for Allowance.”

Contact Information
Any inquiry concerning this communication or earlier communications from the examiner should be directed to TUAN A PHAM whose telephone number is (571)270-3173.  The examiner can normally be reached on M-F 7:45 AM - 6:30 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Tony Mahmoudi, can be reached on 571-272-4078.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/TUAN A PHAM/
Primary Examiner, Art Unit 2163