DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This office action is in response to the submission of the application on 7/22/2019.
Claims 1-20 are presented for examination.
Oath/Declaration
For the record, Examiner acknowledges that the Oaths/Declarations submitted on 8/13/2019 have been received.
Information Disclosure Statement
The information disclosure statement submitted on 1/6/2020 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is considered by the Examiner.
Drawings
The drawings filed on 7/22/2019 are acceptable for the purpose of examination.
Specification
The specification submitted on 5/10/2019 is acceptable for the purpose of examination.
Claim Rejections - 35 USC § 102
The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that form the basis for the rejections under this section made in this Office action:
A person shall be entitled to a patent unless –

(a)(1) the claimed invention was patented, described in a printed publication, or in public use, on sale, or otherwise available to the public before the effective filing date of the claimed invention.


Claims 1-4 and 11-14 are rejected under 35 U.S.C. 102(a)(1) as being anticipated by Saurabh et al. (US 9135560 B1, herein Saurabh).
Regarding claim 1,
	Saurabh teaches a method performed by one or more computers, the method comprising: (Saurabh, column 1, line 64 “The invention can be implemented in numerous ways, including as a process; an apparatus; a system, a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor.” And, column 20, line 46 “A method comprising…” In other words, a method is a method performed by one or more computers.)
	obtaining an unlabeled computer security data log; (Saurabh, FIG. 9, and, column 11, line 17 “Parser engine 934 parses messages in the pile queue and stores the results in structured store 918 in accordance with an applicable schema.  In various embodiments, parser engine 934 includes a library 942 of parser rules/schemas.  If the message has an associated source type (e.g., specifying that the message is from an Apache server, or that it is a credit card transaction), the corresponding rule set will be selected from the library and applied when parsing.  If the source type has not been specified (underline added by Examiner), efficient parsing of the message can nonetheless be performed by platform 102.  As will be described in more detail below, an appropriate rule set can be automatically selected from the library and used (conceptually, turning parser engine 934 into an Apache parser or credit card transaction parser), by performing a heuristic or other evaluation of the message (or sequence of messages).  In some cases, a preexisting parser rule set may not exist for a given message. As data associated with their computer networks, for security, compliance, and other reasons (underline added by Examiner).”


Saurabh discloses a parser engine that implements a set of rules or schemas.  Each rule set represents a unique parser that is then implemented by the parser engine.  There is a library of rule sets to choose from.  If the data is unlabeled, platform 102 evaluates the message, chooses the appropriate rule set from the library, and then parses the message.  If the data does not correspond to any known type, the parser generator generates a rule set, effectively creating a new parser, and then stores the rule set in the library.  For this particular limitation, "the source type has not been specified" is an unlabeled data log, and "data associated with their computer networks, for security, compliance, and other reasons" is a computer security data log.)
	processing the unlabeled computer security data log using a machine learning model to generate a probability distribution that includes a respective probability for each of a plurality of possible log types, (Saurabh, column 13, line 4 “FIG. 12 illustrates an embodiment of a process for automatically selecting a parser.  In some embodiments the process shown in FIG. 12 is performed by platform 102.  The process begins at 1202 when raw data is received from a remote source.  In some embodiments portion 1202 of the process shown in FIG. 12 corresponds to portion 1102 of the process shown in FIG. 11.” And, column 13, line 13 “At 1204, the raw data is evaluated against a plurality of rules.  As one example of the processing performed at 1204, the raw data could be evaluated (e.g., in sequence) against every rule included in library 924 (sic 942, see FIG. 9) by parser engine 934.  As another example, in some embodiments parser engine 934 is implemented as a finite state machine and rules are evaluated in parallel. At 1206, a confidence measure is determined.”


					        FIG. 11


					         FIG. 12
In other words, raw data is unlabeled computer security data log, platform 102 is machine learning model, and confidence measure is a probability in a probability distribution.)
	wherein each of the plurality of possible log types is associated with a corresponding parser that parses logs of the possible log type to extract structured computer security data; (Saurabh, column 11, line 17 “Parser engine 934 parses messages in the pile queue and stores the results in structured store 918 in accordance with an applicable schema. In various embodiments, parser engine 934 includes a library 942 of parser rules/schemas.” In other words, messages are a plurality of log types, the log types are associated with sets of rules/schemas, sets of rules/schemas represent parsers, and parser engine 934 implementing a particular rule set is parsing log information to extract structured computer security data.)
selecting the possible log type having the highest probability; and (Saurabh, column 13, line 22 “Suppose the confidence measure for the raw data with respect to an Apache access log parser is 0.999, with respect to a particular vendor’s router parser is 0.321, and with respect to a credit card transaction parser is 0.005. A determination is made that the confidence measure with respect to the Apache access log parser exceeds a threshold, indicating that the received raw data is Apache log data (and in particular, access log data), with a very high confidence.”  In other words, a determination is made that the confidence measure with respect to the Apache access log parser exceeds a threshold is selecting the possible log type having the highest probability.)
	parsing the unlabeled computer security data log using the parser corresponding to the selected possible log type. (Saurabh, column 13, line 22 “Suppose the confidence measure for the raw data with respect to an Apache access log parser is 0.999, with respect to a particular vendor’s router parser is 0.321, and with respect to a credit card transaction parser is 0.005. A determination is made that the confidence measure with respect to the Apache access log parser exceeds a threshold, indicating that the received raw data is Apache log data (and in particular, access log data), with a very high confidence.”  In other words, a determination is made that the confidence measure with respect to the Apache access log parser exceeds a threshold is selecting and then using the rule set that implements the log parser corresponding to the selected possible log type.)
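For illustration only, the selection logic mapped for claim 1 can be sketched as follows. This sketch is not taken from Saurabh or from the claims. The function name `select_log_type`, the threshold value of 0.9, and the dictionary layout are assumptions made for the example; the three confidence values are the ones quoted from Saurabh, column 13, line 22.

```python
from typing import Dict, Optional


def select_log_type(confidences: Dict[str, float],
                    threshold: float = 0.9) -> Optional[str]:
    """Return the log type with the highest confidence measure.

    Returns None when no confidence exceeds the (assumed) threshold,
    which corresponds to the case where no rule set is a match.
    """
    if not confidences:
        return None
    # Pick the log type whose confidence measure is largest.
    best_type = max(confidences, key=confidences.get)
    if confidences[best_type] > threshold:
        return best_type
    return None


# Confidence values quoted from Saurabh, column 13, line 22.
confidences = {
    "apache_access_log": 0.999,
    "vendor_router": 0.321,
    "credit_card_transaction": 0.005,
}
selected = select_log_type(confidences)
```

With the quoted values, the Apache access log type is both the highest-scoring type and the only one above the assumed threshold, so the two criteria select the same parser in this example.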
Regarding claim 2,
	Saurabh teaches the method of claim 1,
	further comprising maintaining a mapping from each of a plurality of log types to a parser corresponding to the log type, wherein one or more of the plurality of log types includes a plurality of log subtypes.  (Saurabh, FIG. 9, block 934 and block 942. And, column 13, paragraph 5, line 49 “In the case of syslog data (which aggregates log data from multiple applications), the source type could remain set to syslog, however, individual messages of the respective contributors to the log (e.g., ssh) can be labeled.” In other words, block 934 is a parser engine that maintains a library, block 942, of parsers for log types is maintaining a mapping of a plurality of log types, and the source type could remain set to syslog, and individual messages of the respective contributors is one or more of the plurality of log types includes a plurality of log subtypes.)
Regarding claim 3,
	Saurabh teaches the method of claim 1,
	further comprising: determining that the parser corresponding to the selected possible log type did not successfully parse the unlabeled computer security data log; and in response, (Saurabh, column 14, line 66 “Suppose the log data shown in FIG. 13A (along with several thousand additional lines) is received (e.g. at 1202 in the process shown in FIG. 12) and, after portions 1204 and 1206 of the process shown in FIG. 12 have been performed, none of the rules in library 942 are determined to be a match (e.g., because all of the confidence measures are low).” Step 1204 is “Evaluate a portion of the raw data using a plurality of rules” this means attempt to parse the data log. At this point, step 1206 “Determine a confidence measure for at least some of the rules” determines whether the parser was successful at parsing the unlabeled data.  In other words, determine confidence measure is determine that the parser corresponding to the selected possible log type did, or did not, successfully parse the unlabeled data.)
	parsing the unlabeled computer security data log using the parser corresponding to the possible log type having the second highest probability.  (Saurabh, column 14, paragraph 6, line 1 “Suppose the log data shown in FIG. 13A (along with several thousand additional lines) is received (e.g. at 1202 in the process shown in FIG. 12) and, after portions 1204 and 1206 of the process shown in FIG. 12 have been performed, none of the rules in library 942 are determined to be a match (e.g., because all of the confidence measures are low).” Saurabh, column 15, paragraph 2, line 1 “FIG. 14 illustrates an embodiment of a process for automatically generating a parser.”  In other words, step 1204 “Evaluate at least a portion of the raw data using a plurality of rules” is first trying the parser with the highest confidence level, and the second highest confidence level (i.e. a plurality of rules), and generating a parser when no parser is found to parse the log is parsing the unlabeled computer security data log.  Examiner notes that in Saurabh, if no correct parser from the library is identified, a new parser is generated.)
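For illustration only, the fallback recited in claim 3 (trying the parser for the second highest probability when the first parser fails) can be sketched as follows. This sketch is not taken from Saurabh or from the application. The toy parsers, the convention that a failed parse raises `ValueError`, and all names are assumptions made for the example.

```python
from typing import Callable, Dict, Optional, Tuple

Parser = Callable[[str], dict]


def parse_apache(log: str) -> dict:
    # Toy parser (assumption): accepts only lines that contain an HTTP GET.
    if "GET " not in log:
        raise ValueError("not an Apache-style line")
    return {"request": log}


def parse_keyvalue(log: str) -> dict:
    # Toy parser (assumption): accepts only whitespace-separated key=value pairs.
    fields = {}
    for pair in log.split():
        if "=" not in pair:
            raise ValueError("not a key=value line")
        key, value = pair.split("=", 1)
        fields[key] = value
    if not fields:
        raise ValueError("empty line")
    return fields


def parse_with_fallback(log: str,
                        probabilities: Dict[str, float],
                        parsers: Dict[str, Parser]) -> Optional[Tuple[str, dict]]:
    """Try parsers in descending order of probability until one succeeds."""
    ranked = sorted(probabilities, key=probabilities.get, reverse=True)
    for log_type in ranked:
        try:
            return log_type, parsers[log_type](log)
        except ValueError:
            continue  # fall through to the next most probable log type
    return None  # no parser succeeded


parsers = {"apache": parse_apache, "keyvalue": parse_keyvalue}
probabilities = {"apache": 0.6, "keyvalue": 0.4}
result = parse_with_fallback("user=alice action=login", probabilities, parsers)
```

In this example the highest-probability parser rejects the line, so the log is parsed by the parser with the second highest probability.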
Regarding claim 4,
	Saurabh teaches the method of claim 3,
	further comprising: determining that the parser corresponding to the possible log type having the second highest probability successfully parsed the unlabeled computer security data log; (Saurabh, column 14, line 29 “In some cases, messages may match multiple types of rules with a high confidence.” Continued at line 36 “In this scenario, the administrator of the blade could be notified of the different types of data appearing in the syslog and be given the opportunity to have those two types of data individually tagged (e.g. with an “Apache” tag and an “ntp” tag).”  In other words, those two types of data individually tagged is determining that the parser corresponding to the possible log type having the second highest probability successfully parsed the unlabeled computer security data log.)
	generating training data, the training data comprising the unlabeled computer security data log and a label that identifies the possible log type having the second highest probability; and (Saurabh, column 11, line 24 “If the source type has not been specified, efficient parsing of the message can nonetheless be performed by platform 102.  As will be described in more detail below, an appropriate rule set can be automatically selected from the library and used (conceptually, turning parser engine 934 into an Apache parser or credit card transaction parser), by performing a heuristic or other evaluation of the message (or sequence of messages).  In some cases, a preexisting parser rule set may not exist for a given message.  As will also be described in more detail below, an appropriate rule set can be automatically generated (e.g., by parser generator 940) and ultimately stored in the parser library.”  In other words, appropriate rule set can be automatically generated… and ultimately stored in the parser library is generate training data, and being stored in the library is saving it to be used when the particular data log type is encountered in the future.)
Claims 11 – 14 are system claims corresponding to method claims 1 – 4, respectively.  Otherwise they are the same.  Saurabh teaches a system (Saurabh, column 1, line 64 “The invention can be implemented in numerous ways, including as a process; an apparatus; a system, a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor.” In other words, a system is a system.) Therefore, claims 11 – 14 are rejected for the same reasons as claims 1 – 4, respectively.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 5 – 10 and 15 – 20 are rejected under 35 U.S.C. 103 as being unpatentable over Saurabh in view of Hartmann (US 7690037 B1, herein Hartmann).
Regarding claim 5,
	Saurabh teaches a method performed by one or more computers, the method comprising: (Saurabh, column 1, line 64 “The invention can be implemented in numerous ways, including as a process; an apparatus; a system, a composition of matter; a computer program product embodied on a computer readable storage medium; and/or a processor, such as a processor configured to execute instructions stored on and/or provided by a memory coupled to the processor.” And, column 20, Claim 9, “A method comprising…” In other words, a method is a method performed by one or more computers.)
	However, Saurabh does not explicitly teach generating training data, the training data comprising a plurality of training computer security data logs and,
	Hartmann teaches generating training data, the training data comprising a plurality of training computer security data logs and, (Hartmann, column 1, line 7 “This invention pertains in general to computerized machine learning and in particular to generating training data for use in machine learning.” And, column 1, line 24 “Machine learning is useful for training DIDSs (database intrusion detection system - added by Examiner) and other security systems where the complexity of the incoming traffic frustrates attempts at manual specification of legitimate and anomalous patterns.” In other words, generating data for use in machine learning is generating training data, and incoming traffic is computer security data logs.)
	for each training computer security data log, a label that identifies a log type of the training computer security data log; and (Hartmann, column 1, line 30 “Machine learning relies on training data, such as a set of training database queries, captured during data center operations.  In traditional supervised machine learning, training data are marked as either legitimate or anomalous so the learning algorithm can correctly differentiate between the two types of activity.” In other words, training data are marked is a label that identifies a log type of the training computer security data log.)
	training a machine learning model to predict log types of unlabeled computer security data logs using the training data.  (Saurabh, FIG. 9, block 934 and block 942, and FIG. 12.  See 3rd mapping in paragraph 10.  In other words, block 934 is a parser engine that maintains a library, block 942, of parsers for log types, and step 1206 “determine a confidence measure” is predicting log types of unlabeled computer security data logs.)
	Both Hartmann and Saurabh are directed to processing and evaluating data for, among other things, the purposes of computer security.  In view of the teaching of Saurabh, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teaching of Hartmann with that of Saurabh.  The combination would make it possible to generate training data for the purpose of training the machine learning model.
	One of ordinary skill in the art would have been motivated to do this because there is a need in the art for a way to generate training data for machine learning for computer security.  (Hartmann, column 1, line 50 “Accordingly, there is a need in the art for a way to generate training data for machine learning that are less likely to contain data representing anomalous activities.”)
Regarding claim 6,
The combination of Hartmann and Saurabh teaches the method of claim 5,
	further comprising: receiving labeled training data; (Saurabh, column 20, line 18 “wherein, when the subsequent raw data labeled with the identifier is received,” In other words, data labeled with the identifier is received is receiving labeled training data.)
	for each training computer security data log of the labeled training data: determining a first log type for the training computer security data log using a corresponding label for the training computer security data log; (Saurabh, column 20, line 18 “wherein, when the subsequent raw data labeled with the identifier is received, the particular parser is automatically selected to parse the subsequent raw data based at least in part on the identifier” In other words, automatically selecting a particular parser is determining a first data log type for the training computer security data log using a corresponding label.)
determining, using a mapping from each of a plurality of log types to a parser corresponding to the log type, a parser that corresponds to the first log type; (Saurabh, column 20, line 18 “wherein, when the subsequent raw data labeled with the identifier is received, the particular parser is automatically selected to parse the subsequent raw data based at least in part on the identifier” In other words, automatically selecting a particular parser is using a mapping from each of a plurality of log types to a parser corresponding to the log type. See paragraph 10 above for description of parser engine 934 with parser library 942.)
	parsing the training computer security data log using the parser that corresponds to the first log type; (Saurabh, column 20, line 18 “wherein, when the subsequent raw data labeled with the identifier is received, the particular parser is automatically selected to parse the subsequent raw data based at least in part on the identifier” In other words, automatically selecting a particular parser to parse the subsequent raw data is parsing the training computer security data log using the parser that corresponds to the log type.)
	determining that the parser successfully parsed the training computer security data log; and (Saurabh, column 20, line 32 “The system of claim 4, wherein the configuration of the first blade module is caused to be updated in response to receiving a confirmation from the administrator of the remote device.” In other words, confirmation from the administrator is determining that the parser successfully parsed the training computer security data log.)
	in response to determining that the parser successfully parsed the training computer security data log, adding the training computer security data log and the first log type to the training data. (Saurabh, column 20, line 36 “the system of claim 1 wherein the configuration of the blade module is caused to be updated in response to the determination of the confidence measure with respect to the particular parser.” In other words, caused to be updated is adding the training computer security data log and the first log type to the training data.)
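For illustration only, the steps recited in claim 6 (look up the parser for each label, parse the log, and keep the log and label only when the parse succeeds) can be sketched as follows. This sketch is not taken from Saurabh, Hartmann, or the application. The toy parser, the convention that a failed parse raises `ValueError`, and all names are assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

Parser = Callable[[str], dict]


def parse_keyvalue(log: str) -> dict:
    # Toy parser (assumption): accepts only whitespace-separated key=value pairs.
    fields = {}
    for pair in log.split():
        if "=" not in pair:
            raise ValueError("not a key=value line")
        key, value = pair.split("=", 1)
        fields[key] = value
    if not fields:
        raise ValueError("empty line")
    return fields


def build_training_data(labeled_logs: List[Tuple[str, str]],
                        parser_for_type: Dict[str, Parser]) -> List[Tuple[str, str]]:
    """Keep only (log, log type) pairs whose mapped parser parses the log."""
    training_data = []
    for log, log_type in labeled_logs:
        parser = parser_for_type.get(log_type)  # mapping from log type to parser
        if parser is None:
            continue  # no parser is mapped to this label
        try:
            parser(log)
        except ValueError:
            continue  # the label's parser failed, so the pair is not added
        training_data.append((log, log_type))
    return training_data


labeled_logs = [
    ("user=bob action=logout", "keyvalue"),  # label and content agree
    ("garbage line", "keyvalue"),            # parser for the label fails
    ("x=1", "unknown"),                      # label has no mapped parser
]
training_data = build_training_data(labeled_logs, {"keyvalue": parse_keyvalue})
```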

Regarding claim 7,
	The combination of Hartmann and Saurabh teaches the method of claim 5,
	further comprising: receiving unlabeled training data (Saurabh, column 19, line 58 “receive raw data acquired by a blade module from a remote device” In other words, receive raw data is receiving unlabeled training data.)
	for each training computer security data log of the unlabeled training data: parsing the training computer security data log of the unlabeled training data using parsers selected from a plurality of parsers until a particular parser successfully parses the training computer security data log of the unlabeled training data, (Saurabh, FIG. 9, block 942, column 20, line 6 “in response to the determination that the confidence measure with respect to the particular parser exceeds the threshold, provide as output to the particular parser an indication that the raw data is associated with at least a source type associated with the particular parser.” And, column 20, line 18 “wherein, when the subsequent raw data labeled with the identifier is received, the particular parser is automatically selected to parse the subsequent raw data based at least in part on the identifier” In other words, automatically selecting a particular parser is parsing the training computer security data log, 942 rule set library is using parsers selected from a plurality of parsers, and in response to the determination that the confidence measure with respect to the particular parser exceeds the threshold is a particular parser successfully parses the training computer security data log of the unlabeled training data.)
	wherein each parser of the plurality of parsers corresponds to a different log type; determining a particular log type corresponding to the particular parser; and adding the training computer security data log and the particular log type to the training data.  (Saurabh, FIG. 9, block 934 and block 942. And, column 13, paragraph 5, line 49 “In the case of syslog data (which aggregates log data from multiple applications), the source type could remain set to syslog, however, individual messages of the respective contributors to the log (e.g., ssh) can be labeled.” In other words, library, block 942, of parsers for log types is each parser of the plurality of parsers corresponds to a different log type, selecting a parser from the library, 942, is determining a particular log type for a particular parser, and individual messages can be labeled and included in the library is adding the computer security log type to the training data.)
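For illustration only, the labeling loop recited in claim 7 (trying parsers from a plurality of parsers until one succeeds, then using that parser's log type as the label) can be sketched as follows. This sketch is not taken from Saurabh, Hartmann, or the application. The toy parsers, the convention that a failed parse raises `ValueError`, the order in which parsers are tried, and all names are assumptions made for the example.

```python
from typing import Callable, Dict, List, Tuple

Parser = Callable[[str], dict]


def parse_apache(log: str) -> dict:
    # Toy parser (assumption): accepts only lines that contain an HTTP GET.
    if "GET " not in log:
        raise ValueError("not an Apache-style line")
    return {"request": log}


def parse_keyvalue(log: str) -> dict:
    # Toy parser (assumption): accepts only whitespace-separated key=value pairs.
    fields = {}
    for pair in log.split():
        if "=" not in pair:
            raise ValueError("not a key=value line")
        key, value = pair.split("=", 1)
        fields[key] = value
    if not fields:
        raise ValueError("empty line")
    return fields


def label_unlabeled_logs(logs: List[str],
                         parsers: Dict[str, Parser]) -> List[Tuple[str, str]]:
    """Label each log with the type of the first parser that parses it."""
    training_data = []
    for log in logs:
        for log_type, parser in parsers.items():
            try:
                parser(log)
            except ValueError:
                continue  # try the next parser
            training_data.append((log, log_type))
            break  # stop at the first parser that succeeds
    return training_data


parsers = {"apache": parse_apache, "keyvalue": parse_keyvalue}
logs = ["GET /index.html 200", "user=alice action=login", "???"]
training_data = label_unlabeled_logs(logs, parsers)
```

A log that no parser accepts, such as the third entry, is left out of the training data in this sketch.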
Regarding claim 8,
The combination of Hartmann and Saurabh teaches the method of claim 5,
wherein the training data further comprises a label that identifies a log subtype of the training computer security data log.  (Hartmann, column 6, line 35 “In one embodiment, a data transformation module 418 transforms the corpus remaining after the filtering into the training data stored in the training data module 312.  For example, in one embodiment the data transformation module 418 converts database queries in the corpus into their canonical forms.  In another embodiment, the data transformation module 418 populates data structures that describe any associations between different database tables and fields made by the query and/or relationships between the fields of a table and the constraints that are applicable to the fields.” In other words, populates data structures that describe any association is comprises a label that identifies a log subtype of the training computer security data log.) 
Regarding claim 9,
The combination of Hartmann and Saurabh teaches the method of claim 8,
wherein a parser that successfully parses a computer security data log labeled with a first type and a first subtype also successfully parses a computer security data log labeled with the first type and a second subtype.  (Saurabh, FIG. 9, block 934 and block 942. And, column 13, paragraph 5, line 49 “In the case of syslog data (which aggregates log data from multiple applications), the source type could remain set to syslog, however, individual messages of the respective contributors to the log (e.g., ssh) can be labeled.” Block 934 is a parser engine that maintains a library, block 942, of parsers for log types, and, the source type could remain set to syslog, and individual messages of the respective contributors is one or more of the plurality of log types includes a plurality of log subtypes. In other words, source type could remain set to syslog is parses a security data log type, individual messages of the respective contributors is a first subtype and a second subtype, and can be labeled is parses a computer security data log labeled with the first type and a second subtype.)
Regarding claim 10,
The combination of Hartmann and Saurabh teaches the method of claim 9,
wherein a first log type comprises a plurality of log subtypes. (Saurabh, FIG. 9, block 934 and block 942. And, column 13, paragraph 5, line 49 “In the case of syslog data (which aggregates log data from multiple applications), the source type could remain set to syslog, however, individual messages of the respective contributors to the log (e.g., ssh) can be labeled.” Block 934 is a parser engine that maintains a library, block 942, of parsers for log types. In other words, source type is a first log type, and individual messages of the respective contributors is a plurality of log subtypes.)
Claims 15 – 20 are system claims corresponding to method claims 5 – 10, respectively. Otherwise they are the same. The combination of Hartmann and Saurabh teaches a system (see paragraph 14). Therefore, claims 15 – 20 are rejected for the same reasons as claims 5 – 10, respectively.
Conclusion
	Any inquiry concerning this communication or earlier communications from the examiner should be directed to BART RYLANDER whose telephone number is (571)272-8359.  The examiner can normally be reached on Monday - Thursday 8:00 to 5:30.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Miranda Huang, can be reached at 571-270-7092.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/B.I.R./Examiner, Art Unit 2124

/MIRANDA M HUANG/Supervisory Patent Examiner, Art Unit 2124