Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendment
This communication is in response to the amendment filed on 09/02/2022. The Examiner acknowledges amended claims 1-20. No claims have been cancelled or added. Claims 1-20 are pending and are rejected. Claims 1, 7, and 14 are independent.
	
	
Response to Arguments
Claim Interpretation 
Applicant's arguments filed 09/02/2022 have been fully considered.  
Applicant argues (see Remarks, middle of page 10) that:
The Office Action states that this application includes one or more claim limitations recited in claim 7 that do not use the word "means," but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier. 
Applicant respectfully submits the "control unit" recited in claim 7 includes sufficient structural elements, as described in paragraph [0075] of the as-filed specification, to perform the claimed functions. Therefore, these claim limitations should not be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Examiner respectfully disagrees. To avoid interpretation under 35 U.S.C. 112(f), the structure that is described in the specification must be explicitly recited in the claim. See MPEP section 2181, which states: 
Accordingly, examiners will apply 35 U.S.C. 112(f) to a claim limitation if it meets the following 3-prong analysis:
(A) the claim limitation uses the term "means" or "step" or a term used as a substitute for "means" that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function;
(B) the term "means" or "step" or the generic placeholder is modified by functional language, typically, but not always linked by the transition word "for" (e.g., "means for") or another linking word or phrase, such as "configured to" or "so that"; and
(C) the term "means" or "step" or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function.

Here, the generic placeholder “unit” is not modified by sufficient structure, material, or acts for performing the claimed function. The required structure must be explicitly recited in the claim, and not merely mentioned in the specification as components that can be included. 

Independent claims 1 and 7 
Applicant argues (see Remarks, bottom half of page 11 to the 2nd paragraph of page 12, and page 12, 5th and 6th paragraphs) that the references cited in the previous rejection fail to disclose the newly amended claim features.  This argument is persuasive with respect to claims 1-13. Therefore, the rejections are withdrawn with respect to claims 1-13. However, upon further consideration, a new ground of rejection is made in view of Hendler et al., Detecting Malicious PowerShell Commands using Deep Neural Networks, ASIACCS '18: Proceedings of the 2018 on Asia Conference on Computer and Communications Security, 2018 (hereinafter “Hendler”) in view of Xu et al. U.S. Publication 20190166141 (hereinafter “Xu”), further in view of Rei et al. U.S. Publication 20180204120 (hereinafter “Rei”).
Rei teaches that an artificial neural network (ANN) can predict the next character in a sequence of characters (para. 56, 58). Rei in combination with the teachings of the Hendler and Xu references discloses the amended limitations of claim 1.
Regarding independent claim 7, claim 7 is rejected in view of Hendler, in view of Xu, further in view of Rei. Claim 7 uses different terminology from claim 1 but is otherwise similar to claim 1 and claim 7 is rejected for the same reasons as claim 1. The sections of the references cited for claim 1 are also cited for rejecting claim 7 below.
Independent claim 14
Applicant argues (see Remarks, page 16, entire page, and page 17, 1st and 2nd paragraphs) that:
The Office relies on Oliner to teach the feature of "determining a trained representation mapping at least partly by adjusting parameters of a first model structure so that the trained representation mapping predicts, based on a character of training command-line text, the immediately following character in the training command-line text, within a predetermined accuracy." Office Action, p. 44-49. Applicant respectfully disagrees. Nevertheless, solely in the interest of expediting allowance, Applicant herein amends claim 14 as shown above. 
Oliner describes a technology that facilitates the production of and the use of automated datagens for event-based systems. Oliner, Abstract. Oliner describes that "At block 2014, the computer system produces new events by the trained datagen without additional input.... The new event production includes the generation of a sequence of characters of the new events based upon the calculated statistical predictions of the sequence of the textual characters of the training corpus." Id., para. [0279]. However, Oliner does not teach or suggest that the trained datagen, based on a sequence of the textual characters of the training corpus, generates a representation vector including a predicted next character for each of the textual character of the training corpus. Therefore, Oliner fails to teach or suggest the feature of "the trained representation mapping is configured to receive a sequence of characters and for an individual character, output a respective representation vector including a predicted character following the individual character," as recited in claim 14. 
 Kuperman describes a system for protecting web applications at a host by analyzing web application behavior to detect malicious client requests. Kuperman, Abstract. However, Kuperman also does not teach or suggest the feature of "the trained representation mapping is configured to receive a sequence of characters and for individual character, output a respective representation vector including a predicted character following the individual character," as recited in claim 14, and does not remedy the deficiency of Oliner, set forth above. 
For at least the reasons presented herein, the combination of Oliner and Kuperman does not teach or suggest all of the features of claim 14. Accordingly, Applicant respectfully requests that the Office withdraw the § 103 rejection of claim 14.

Examiner respectfully disagrees. 
Regarding independent claim 14, claim 14 is rejected in view of Oliner et al. U.S. Publication 20200090027 (hereinafter “Oliner”), further in view of Kuperman et al. U.S. Publication 20170244737 (hereinafter “Kuperman”). Oliner discloses output vectors corresponding to characters (para. 263) and a trained deep-learning engine predicting the next character in the sequence of input characters (para. 263, 266-267, 272, 279, 284). Kuperman discloses generating a model for classifying unknown requests such as requests received from clients that are not known to be malicious or non-malicious. Kuperman also discloses a command-line interface for clients accessing online data (Kuperman para. 28, 49, 54, 57). 
Furthermore, Applicant argues limitations that are not in the claim. Nowhere does the claim recite “generates a representation vector including a predicted next character for each of the textual character of the training corpus”. At most, claim 14 recites outputting the vector and the predicted character for one individual character. The features relied upon for Applicant’s argument are not recited in the rejected claim(s).  Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.  See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993).
	Therefore, the rejection is maintained with respect to claim 14.
Regarding applicant’s arguments (page 12, 3rd paragraph, page 13, 3rd paragraph to page 15, 2nd paragraph from bottom, and page 17, 3rd paragraph to page 18 bottom paragraph) with respect to dependent claims 2-6, 8-13, and 15-20, the dependent claims inherit their respective limitations from the respective independent claims, and are therefore rejected for the same reasons as the respective independent claims.
	Applicant's arguments and amendments with respect to claims 1-13 have been fully considered. The arguments are persuasive and the previous rejections of claims 1-13 are withdrawn; however, the arguments are moot in view of the new ground(s) of rejection presented herein. Note that this action is made FINAL. See MPEP § 706.07(a).

	
	Claim Interpretation
The following is a quotation of 35 U.S.C. 112(f):
(f) Element in Claim for a Combination. – An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof. 

The following is a quotation of pre-AIA  35 U.S.C. 112, sixth paragraph:
An element in a claim for a combination may be expressed as a means or step for performing a specified function without the recital of structure, material, or acts in support thereof, and such claim shall be construed to cover the corresponding structure, material, or acts described in the specification and equivalents thereof.

The claims in this application are given their broadest reasonable interpretation using the plain meaning of the claim language in light of the specification as it would be understood by one of ordinary skill in the art.  The broadest reasonable interpretation of a claim element (also commonly referred to as a claim limitation) is limited by the description in the specification when 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is invoked. 
As explained in MPEP § 2181, subsection I, claim limitations that meet the following three-prong test will be interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph:
(A)	the claim limitation uses the term “means” or “step” or a term used as a substitute for “means” that is a generic placeholder (also called a nonce term or a non-structural term having no specific structural meaning) for performing the claimed function; 
(B)	the term “means” or “step” or the generic placeholder is modified by functional language, typically, but not always linked by the transition word “for” (e.g., “means for”) or another linking word or phrase, such as “configured to” or “so that”; and 
(C)	the term “means” or “step” or the generic placeholder is not modified by sufficient structure, material, or acts for performing the claimed function. 
Use of the word “means” (or “step”) in a claim with functional language creates a rebuttable presumption that the claim limitation is to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites sufficient structure, material, or acts to entirely perform the recited function. 
Absence of the word “means” (or “step”) in a claim creates a rebuttable presumption that the claim limitation is not to be treated in accordance with 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph. The presumption that the claim limitation is not interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, is rebutted when the claim limitation recites function without reciting sufficient structure, material or acts to entirely perform the recited function. 
Claim limitations in this application that use the word “means” (or “step”) are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action. Conversely, claim limitations in this application that do not use the word “means” (or “step”) are not being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, except as otherwise indicated in an Office action.
This application includes one or more claim limitations that do not use the word “means,” but are nonetheless being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, because the claim limitation(s) uses a generic placeholder that is coupled with functional language without reciting sufficient structure to perform the recited function and the generic placeholder is not preceded by a structural modifier.  Such claim limitation(s) is/are: 


a control unit configured to perform operations comprising: 

sequentially providing event-data values of the ordered sequence of event-data values to the trained representation mapping to determine respective representation vectors, wherein a first representation vector of the representation vectors is associated with a first event-data value of the ordered sequence of event-data values; 

determining a first indicator at least partly by applying the first representation vector to the trained classifier; and 

determining that the first event-data value is associated with a security violation based at least in part on the first indicator satisfying the indicator-security criterion.  

in claim 7.
Because this/these claim limitation(s) is/are being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, it/they is/are being interpreted to cover the corresponding structure described in the specification as performing the claimed function, and equivalents thereof.
If applicant does not intend to have this/these limitation(s) interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph, applicant may:  (1) amend the claim limitation(s) to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph (e.g., by reciting sufficient structure to perform the claimed function); or (2) present a sufficient showing that the claim limitation(s) recite(s) sufficient structure to perform the claimed function so as to avoid it/them being interpreted under 35 U.S.C. 112(f) or pre-AIA  35 U.S.C. 112, sixth paragraph.

Claim Rejections - 35 USC § 103
	The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

	The factual inquiries set forth in Graham v. John Deere Co., 383 U.S. 1, 148 USPQ 459 (1966), that are applied for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
	
	This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1-2, 6-9, and 11 are rejected under 35 U.S.C. 103 as being unpatentable over Hendler et al., Detecting Malicious PowerShell Commands using Deep Neural Networks, ASIACCS '18: Proceedings of the 2018 on Asia Conference on Computer and Communications Security, 2018 (hereinafter “Hendler”) (submitted in IDS) in view of Xu et al. U.S. Publication 20190166141 (hereinafter “Xu”), further in view of Rei et al. U.S. Publication 20180204120 (hereinafter “Rei”).
As per claim 1, Hendler discloses 
A method of determining that an event is associated with a security violation, wherein: 
(See Hendler 
[method described in page 12, the 3rd to 5th paragraph, to detect commands generated by malware ]
page 8, left column, 3rd paragraph: Using these networks for text classification requires to encode the text so that the network can process it. Zhang et al. [46] explored treating text as a “raw signal at character level” and applying to it a one-dimensional CNN for text classification. We take a similar approach [a method] for classifying PowerShell commands [each command received is an event] as either malicious or benign.

Hendler page 12, left column, 4th paragraph Each of these names appears only once and they are most probably generated by a domain generation algorithm (DGA) [49] used by the malware[event is associated with a security violation] for communicating with its command and control center.
)

the event is associated with a monitored computing device; 
the event is associated with event data comprising command-line text; 
the command-line text comprises a plurality of characters; 
and the method comprises: 
(See Hendler 

Hendler Page 15, right column, 3rd paragraph we targeted the detection of
individual PowerShell commands that are executed
via the command-line. 

Hendler Page 12, right column, 2nd paragraph We note that in both the above cases, the PowerShell commands may include additional indications of maliciousness such as the web client [the event is associated with a monitored computing device; the monitored computing device can be disclosed by the web client or by users in the Microsoft corporate network (page 15, right column, bottom paragraph)]

Hendler page 8, right hand column, 3rd paragraph 
converting each character of the (possibly truncated) command[the event is associated with event data comprising command-line text; the command-line text comprises a plurality of characters; ]
to a vector all of whose first 61 entries are 0 except
for the single entry corresponding to the character’s code.
[method described in page 12, the 3rd to 5th paragraph, to detect commands generated by malware ] 
Hendler page 8, left column, 3rd paragraph
We take a similar approach [a method] for classifying PowerShell commands
as either malicious or benign.
)

sequentially providing characters of the plurality of characters to a representation mapping to determine respective representation vectors; 
(See Hendler figure 4(a) [each row is a vector for the character shown on the left column]
Hendler page 8, right hand column, 3rd paragraph 
converting
each character of the (possibly truncated) command
to a vector all of whose first 61 entries are 0 except
for the single entry corresponding to the character’s code.

Hendler Page 9, left side column, top paragraph 
the input we provide to our RNN classifier
is a vector of numbers 
…….i’th element is the code (as described above) of the
i’th command character
)
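For illustration only, the character-to-vector conversion quoted above can be sketched as follows. The alphabet, the case folding, and the function names are hypothetical stand-ins and are not Hendler's 61-entry code table; the mapping of digits to '*' follows the Hendler passage stating that numbers are "converted to the '*' sign".

```python
import string

# Hypothetical alphabet for illustration only; Hendler's actual
# 61-entry code table is not reproduced here.
ALPHABET = string.ascii_lowercase + "*" + "./:-_'()<> "
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def normalize(ch):
    """Map digits to '*' (per the quoted passage) and fold case (an assumption)."""
    return "*" if ch.isdigit() else ch.lower()


def one_hot(ch):
    """Return a vector that is all zeros except at the entry for the character's code."""
    vec = [0] * len(ALPHABET)
    idx = INDEX.get(normalize(ch))
    if idx is not None:  # characters outside the assumed alphabet stay all-zero
        vec[idx] = 1
    return vec


def encode_command(command, max_len=1024):
    """Sequentially encode each character of the (possibly truncated) command."""
    return [one_hot(ch) for ch in command[:max_len]]
```

Under these assumptions, each character of a command yields one vector, so a command of n characters yields n representation vectors in order.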

determining, for each of the characters, a respective indicator at least partly by applying the respective representation vector to a trained classifier;
locating at least one token in the command-line text based at least in part on the respective indicators of the characters in the command-line text; and 
(See Hendler
page 10, left column, 3rd paragraph 
As for the usage of random names[ at least one token= random names]  (obfuscation[obfuscation means that the malware is trying to obfuscate, see page 4, left column, 4th paragraph  “numerous ways of obfuscating Power-Shell commands,”] method 11), these typically include numbers (converted to the ‘*’ sign) or alternating casing, and can therefore be learnt by our classifiers as well.[ locating at least one token; learns the random names and then detects random names, which are tokens, based on the learning] (As we describe later, our deep learning classifiers do a better job in learning such patterns.) 

Hendler Page 12, left column, bottom paragraph
Hendler Figure 4a depicts an example of how such a host name is encoded in the input to the neural network.[ Figure 4a Each row is a vector in the figure]
Note the pattern of alternating zeros and ones
[respective indicators of the characters can be disclosed by the pattern of each column vector (e.g., where each 1 is located in each column vector) corresponding to the characters of the command in Figure 4a;
a respective indicator can be disclosed by the position of the 1 in the column vector corresponding to a character in Figure 4a
(or a respective indicator can be the result of applying the neural network filter of Figure 4b to the respective column vector of Figure 4a); each character of the command has a corresponding column vector, and the pattern of the column vectors is detected by the neural network using the neural network filter; when the neural network detects a pattern of alternating 0 and 1 (page 12, bottom paragraph, “alternating zeros and ones in the row”), that indicates there are multiple digits in the command; the asterisks represent digits, and the digits may form part of a hostname generated by malware (page 12, left hand column, 3rd paragraph, “sequence of alternating digits and characters”, “most probably malicious”)]
in the row corresponding to the symbol ‘*’. Figure 4b depicts a neural network filter [trained classifier ]of size 3 that is able to detect occurrences of this pattern. [ applying the respective representation vector to a trained classifier; The neural network filter is applied to the vectors to detect patterns in the command, these patterns indicate malicious commands, as indicated by the pattern of alternating asterisks at the bottom of each column of figure 4a]
Hendler Page 12, right hand column, 2nd paragraph We note that in both the above cases, the PowerShell commands may include additional indications of maliciousness such as the web client or the
cmdlets they use. Nevertheless, it is the ability to detect patterns that incorporate random characters[locating at least one token in the command-line text; detected uppercase or lowercase patterns or detected random character patterns are part of random names that are used to obfuscate by malware, the names may disclose at least one token]
and/or casing [casing means uppercase/lowercase patterns] that causes 4-CNN to assign these commands a score above the threshold,[indicates a malicious command] unlike the 3-gram detector.
Hendler page 12, left hand column, 3rd paragraph 
Out of the new 42 detected commands, 15 commands contain a sequence of alternating digits and characters[locating at least one token; the token can be any interesting sequence of characters in the command that indicates malicious activity]. In most cases, this sequence represented the name of the host or domain from which the command downloaded (most probably malicious[security violation; maliciousness is a probability determination]) content.
)
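As a rough sketch of the reasoning above, a filter of size 3 sliding over the '*' row of the encoded command responds where that row alternates 1, 0, 1. The weights and the threshold below are hand-picked assumptions for illustration; the filter of Hendler Figure 4b is learned, and its values are not reproduced here.

```python
def star_row(command):
    """Row of the encoding for '*': 1 where the character is a digit or '*', else 0."""
    return [1 if (ch.isdigit() or ch == "*") else 0 for ch in command]


# Hand-picked weights (assumption): reward a 1-0-1 window, penalize a 1 in the middle.
FILTER = (1.0, -1.0, 1.0)


def filter_responses(row, weights=FILTER):
    """Slide the size-3 filter over the row (valid 1-D cross-correlation)."""
    k = len(weights)
    return [
        sum(w * row[i + j] for j, w in enumerate(weights))
        for i in range(len(row) - k + 1)
    ]


def alternating_positions(command, threshold=2.0):
    """Start indices of windows whose filter response reaches the threshold."""
    responses = filter_responses(star_row(command))
    return [i for i, r in enumerate(responses) if r >= threshold]
```

With these assumed weights, only a window of the form 1, 0, 1 reaches the threshold, so a run of letters or a run of consecutive digits produces no response.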

determining that the event is associated with a security violation based at least in part on the at least one token satisfying a stored token-security criterion.  
(See Hendler 
page 12, left hand column, 3rd paragraph 
Out of the new 42 detected commands, 15 commands contain a sequence of alternating digits and characters[based at least in part on the at least one token satisfying a stored token-security criterion]. In most cases, this sequence represented the name [token] of the host or domain from which the command [event] downloaded (most probably malicious[security violation; maliciousness is a probability determination]) content.
[See also the example command on page 12 left column below 3rd paragraph “..DownloadFile (’http://d*c*a*ci*x*.<domain>’)..”. Each of these names appears only once and they are most probably generated 
[most probably generated means that whether it is malware or not is a probability-based assessment and it is more likely than not that the domain name was generated by malware ]  by a domain generation algorithm (DGA) [49] used by the malware 
[stored token-security criterion = a sequence of alternating digits and
characters. Page 12, left column, 3rd paragraph
the at least one token= d*c*a*ci*x* ]

Hendler Page 12, right hand column, 2nd paragraph We note that in both the above cases, the PowerShell commands may include additional indications of maliciousness such as the web client or the cmdlets they use. Nevertheless, it is the ability to detect patterns that incorporate random characters and/or casing that causes 4-CNN to assign these commands a score above the threshold,[determining that the event is associated with a security violation] unlike the 3-gram detector.
)
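The criterion discussed above (a sequence of alternating digits and characters, with digits shown as '*') could be expressed, for example, as a pattern match against a located token. The regular expression, the minimum of three repetitions, and the function names are assumptions chosen for illustration and do not appear in Hendler.

```python
import re

# Assumed form of the stored criterion: one or two letters followed by a
# digit placeholder '*', repeated at least three times in a row.
ALTERNATING = re.compile(r"(?:[a-z]{1,2}\*){3,}")


def normalize_digits(text):
    """Lower-case the text and replace every digit with '*'."""
    return re.sub(r"\d", "*", text.lower())


def satisfies_criterion(token):
    """True if the token contains the assumed alternating letter/digit pattern."""
    return ALTERNATING.search(normalize_digits(token)) is not None
```

Under these assumptions, the token d*c*a*ci*x* from the example command satisfies the criterion, while an ordinary host name without alternating digits does not.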

	However, Hendler does not expressly disclose 
sequentially providing characters of the plurality of characters to a trained representation mapping to determine respective representation vectors; 
Xu discloses a technique for training an encoder to encode event data as a vector
(See Xu Para. [0152]
In subsequent examples, this trained behavior model is referred to as “Behavior2Vec model”. For example, an autoencoder may be trained that includes a Behavior2Vec model as an encoder and a Vec2Behavior model as a decoder. In some embodiments, the autoencoder architecture is based on recurrent neural networks (RNN) or convolutional neural networks (CNN) or other deep learning architecture. The trained Behavior2Vec model can then be used to generate a latent feature vector of fixed length for any sequence of input events, including behavior data that includes any sequence of input events generated at a client device in association with a request. 
See also para. 132 sequence keyboard event data
).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified Hendler with the technique for training an encoder to encode event data as a vector of Xu to include 
sequentially providing characters of the plurality of characters to a trained representation mapping to determine respective representation vectors; 
One of ordinary skill in the art would have made this modification to improve the ability of the system to use machine learning to generate the vectors from characters, so that the system can automatically determine the appropriate vectors to generate. The system of the primary reference (e.g., a computer implementing the classifier and vector generator, Hendler page 8, right column, 2nd paragraph and page 9, left column, top paragraph) can be modified to train a recurrent neural network or other deep learning architecture, as taught in the Xu reference, to generate vectors based on event data. The event data may be the commands of the Hendler base reference (e.g., a matrix of column vectors can be generated corresponding to a command).

	However, the combination of Hendler and Xu does not expressly disclose 
wherein the trained representation mapping is trained to predict, based on an individual character, at least one predicted character following the individual character;
Rei discloses 
wherein the trained representation mapping is trained to predict, based on an individual character, at least one predicted character following the individual character;
(See Rei Para. [0056]
The ANN 400 generates one or more predicted next items in a sequence of items based on an input sequence item, for example predicting the next word that a user may wish to include in a sentence based on the previous word that the user has input to the system. The following description is presented with respect to the specific embodiment of predicting the next word in a sequence of words, but it will be appreciated that the disclosure can be readily generalised to other sequences of items with no changes to the architecture of the ANN 400 by training the ANN 400 on different sets of data. For example, the same ANN 400 could be used to predict the next item in a sequence of items, for example: words, characters, logogram character strokes, e.g. Hanzi, morphemes, word segments, punctuation, emoticons, emoji, stickers, and hashtags, or optical character recognition or user intention prediction. For example, if the input to the ANN 400 is an operating system or software application event, the ANN 400 may generate a prediction of the next action that a user might wish to carry out
Rei Para.  [0058]
If the ANN 400 is used for a purpose other than predicting the next word in a sequence, the appropriate input to the ANN 400 will be represented by the 1-of-N vector 402 instead. For example, if the ANN 400 is used to predict the next morpheme, the 1-of-N vector 402 will represent the input morpheme. Similarly, if the ANN 400 is used to predict the next character, the 1-of-N vector 402 will represent the input character.
)
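For illustration only, next-character prediction from a 1-of-N input vector can be sketched as below. A simple table of follower counts stands in for the ANN 400 of Rei, which is a neural network and is not reproduced here; the function names and the training text are hypothetical.

```python
from collections import Counter, defaultdict


def one_of_n(ch, vocab):
    """1-of-N vector representing the input character."""
    return [1 if v == ch else 0 for v in vocab]


def train_followers(text):
    """Count, for each character, which characters immediately follow it."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(text, text[1:]):
        counts[cur][nxt] += 1
    return counts


def predict_next(vec, vocab, counts):
    """Decode the 1-of-N input and return its most frequent follower, or None."""
    ch = vocab[vec.index(1)]
    followers = counts.get(ch)
    return followers.most_common(1)[0][0] if followers else None
```

A short usage example under the same assumptions:

```python
text = "abababac"
vocab = sorted(set(text))
counts = train_followers(text)
prediction = predict_next(one_of_n("a", vocab), vocab, counts)
```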
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Hendler and Xu with the technique for predicting the next character using the trained model of Rei to include 
wherein the trained representation mapping is trained to predict, based on an individual character, at least one predicted character following the individual character;
One of ordinary skill in the art would have made this modification to improve the ability of the system to predict the next character in order to detect malicious activity sooner. The system of the primary reference (e.g., classifier) can be modified to predict the next character and use that prediction to detect malicious activity. 
As per claim 2, the rejection of claim 1 is incorporated herein. 
The combined teaching of Hendler, Xu, and Rei discloses
wherein: the plurality of characters comprises at least one special character and at least one non-special character;
the locating the at least one token comprises identifying a first sequence of adjacent characters of the plurality of characters beginning from a starting character of the plurality of characters until reaching a special character preceded by a first character, wherein: 
the respective indicator of the special character indicates that the special character is not associated with a security violation; and the 
respective indicator of the first character indicates that the first character is not associated with a security violation.  
 (See Hendler
[applying the neural network filter of figure 4b to the command “..DownloadFile (’http://d*c*a*ci*x*.<domain>’)..”. , as illustrated in figure 4 on page 13]
(See Hendler [See the example command on page 12 left column below 3rd paragraph “..DownloadFile (’http://d*c*a*ci*x*.<domain>’)..”. Each of these names appears only once and they are most probably generated 
[most probably generated means that whether it is malware or not is a probability-based assessment and it is more likely than not that  the domain name was generated by malware ]  
by a domain generation algorithm (DGA) [49] used by the malware 
[the plurality of characters = DownloadFile (’http://d*c*a*ci*x*.<domain>’)
at least one special character = :
at least one non-special character = t
a first sequence of adjacent characters = http://
a starting character = h

a special character preceded by a first character = p:
a special character = :
first character = p
Hendler Page 12, left column, 3rd paragraph
the at least one token= d*c*a*ci*x* ]
the respective indicator of the special character indicates that the special character is not associated with a security violation; and the 
respective indicator of the first character indicates that the first character is not associated with a security violation.  
[ can be disclosed by the application of the neural network filter of figure 4b to the command DownloadFile (’http://d*c*a*ci*x*.<domain>’), which would indicate that http:// does not have a pattern of alternating ones and zeros from the respective column vectors and therefore does not include any alternating asterisks that may be indicative of a hostname generated by malware.]
)

As per claim 6, the rejection of claim 1 is incorporated herein. 
The combined teaching of Hendler, Xu, and Rei discloses wherein: 
the trained representation mapping comprises at least two recurrent neural network (RNN) layers; and 
the trained classifier comprises at least one logistic-classification unit.  
(See Hendler page 6 2.2.2 Recurrent Neural Networks (RNNs)
RNNs are neural networks able to process sequences
of data representing, e.g., text [30, 31], speech [32, 33,
34], handwriting [35] or video [36] in a recurrent manner,
that is, by repeatedly using the input seen so far
in order to process new input. We use an RNN network
…..
In the context of text analysis, a common practice
is to add an embedding layer before the LSTM layer
[38, 39]. Embedding layers serve two purposes.[ at least two recurrent neural network (RNN) layers; and]

Hendler Page 6 4.2 Traditional NLP-based detectors
We used two types of NLP feature extraction methods
– a character level 3-gram and a bag of words
(BoW). In both we evaluated both tf and tf-idf and
then applied a logistic regression classifier on extracted
features.
The embedding layer converts each input
token (typically a word or a character, depending
on the problem at hand) to a vector representation.[The classifier
 of claim 6 can read on the combination of the vector generator and a vector classifier of the Hendler reference, in which case the vector generator is just a preprocessing portion of the classifier]
)

As per claim 7, Hendler discloses 
a representation mapping, a trained classifier, and an indicator-security criterion; 
(See Hendler page 8, right hand column, 3rd paragraph 
The input to the CNN network is then prepared by using “one-hot” encoding
of command characters, that is, by converting
each character of the (possibly truncated) command
to a vector [representation mapping ]

Hendler Page 9, left side column, top paragraph 
the input we provide to our RNN classifier[trained classifier]
is a vector of numbers of size at most 1,024, whose
i’th element is the code (as described above) of the
i’th command character (characters that were not assigned
a code are skipped)

Hendler Page 12, left column, bottom paragraph 
Figure 4b depicts a neural network filter [trained classifier ]of size 3 that is able to
detect occurrences of this pattern.

Hendler page 12, left hand column, 3rd paragraph 
Out of the new 42 detected commands, 15 commands contain a sequence of alternating digits and characters [indicator-security criterion is disclosed by detecting a sequence of alternating digits and characters which indicates malicious activity]. In most cases, this sequence represented the name of the host or domain from which the command downloaded (most probably malicious[note maliciousness is a probability determination]) content.
 )

receive an event record, wherein the event record: is associated with an event of a plurality of events; 
is associated with a monitored computing device of a plurality of monitored computing devices; and 
comprises an ordered sequence of event-data values; and 
(See Hendler 
[an event record = received data regarding 1 of the commands; each command is an event; ordered sequence of event-data values is all the letters that make up the command

Hendler page 12, left column, 3rd paragraph 
Out of the new 42 detected commands,[ a plurality of events] 15 commands
contain a sequence of alternating digits and characters.

Hendler Page 12, left column, 3rd paragraph 
example of the usage of such a name that appeared
in one of the newly detected commands is:[ an event of a plurality of events; ]
“..DownloadFile (’http://d*c*a*ci*x*.<domain>’)..”.

Hendler page 12, right column, 2nd paragraph 
We note that in both the above cases, the
PowerShell commands may include additional indications
of maliciousness such as the web client[ a monitored computing device ] or the
cmdlets they use.
)

sequentially providing event-data values of the ordered sequence of event-data values to the representation mapping to determine respective representation vectors, wherein a first representation vector of the representation vectors is associated with a first event-data value of the ordered sequence of event-data values; 
(See Hendler 
Hendler page 8, right hand column, 3rd paragraph 
The input to the CNN network is then prepared by using “one-hot” encoding
of command characters, that is, by converting each character of the (possibly truncated) command to a vector 
 [sequentially providing event-data values of the ordered sequence of event-data values to the representation mapping to determine respective representation vectors
ordered sequence of event-data values = each of the characters of a command  as they are converted to a vector; see figure 4 a, where each character is shown with its corresponding vector
first representation vector of the representation vectors is associated with a first event-data value of the ordered sequence of event-data values can be disclosed by, for example, vector generated for the letter ‘D’ in DownloadFile (’http://d*c*a*ci*x*.<domain>]
)
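For illustration only, the character-to-vector conversion relied upon above can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Hendler: the character set `ALPHABET` and the helper names `normalize`, `one_hot`, and `encode` are hypothetical, the replacement of digits with '*' follows the Hendler passage cited with respect to claim 8, and the length limit of 1,024 is borrowed from the cited RNN input passage.

```python
import string

# Hypothetical character set; Hendler's actual code table is not reproduced here.
ALPHABET = string.ascii_letters + "*" + ":/.()'<>-"


def normalize(command: str) -> str:
    """Replace every digit with '*' (assumed preprocessing step)."""
    return "".join("*" if ch.isdigit() else ch for ch in command)


def one_hot(ch: str) -> list[int]:
    """Return a 1-of-N vector for one character (all zeros if it has no code)."""
    return [1 if ch == a else 0 for a in ALPHABET]


def encode(command: str, max_len: int = 1024) -> list[list[int]]:
    """Encode the (possibly truncated) command one character at a time, in order."""
    return [one_hot(ch) for ch in normalize(command)[:max_len]]


# The first vector corresponds to the first character, 'D'.
vectors = encode("DownloadFile('http://d1c2a3')")
```

In this sketch, `vectors[0]` plays the role of the first representation vector and the character 'D' plays the role of the first event-data value.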

determining a first indicator at least partly by applying the first representation vector to the trained classifier; and 
(See Hendler Page 12, left column, bottom paragraph
Hendler Figure 4a depicts an example of how such a host name is encoded in the input to the neural network.[ Figure 4a Each row is a vector in the figure]
Note the pattern of alternating zeros and ones
[first indicator can be the result of the application of the neural network  filter to the respective column vector of Figure 4a; each character of the command has a corresponding column vector, the pattern of the column vector is detected by the neural network using the respective row filter ; 
]  
in the row corresponding to the symbol ‘*’. Figure 4b depicts a neural network filter [trained classifier ]of size 3 that is able to detect occurrences of this pattern. [ applying the first representation vector to the trained classifier; The filter is applied to the vectors as part of the neural network application to detect patterns in the command, these patterns indicate malicious commands]
)
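For illustration only, the application of a size-3 filter to the row corresponding to the symbol '*' can be sketched as follows. This is a minimal sketch under stated assumptions: the filter weights `(1, -1, 1)`, the rectification step, and the value of `THRESHOLD` are hypothetical and are not taken from Hendler Figure 4b.

```python
def star_row(text: str) -> list[int]:
    """Row of the one-hot matrix for '*': 1 where the character is '*', else 0."""
    return [1 if ch == "*" else 0 for ch in text]


def apply_filter(row: list[int], weights=(1, -1, 1)) -> list[int]:
    """Slide a size-3 filter over the row; negative responses are clipped to 0."""
    k = len(weights)
    return [
        max(0, sum(w * x for w, x in zip(weights, row[i:i + k])))
        for i in range(len(row) - k + 1)
    ]


THRESHOLD = 2  # hypothetical value standing in for an indicator-security criterion

host_response = apply_filter(star_row("d*c*a*ci*x*"))
scheme_response = apply_filter(star_row("http://"))

host_flagged = max(host_response) >= THRESHOLD      # alternating '*' pattern present
scheme_flagged = max(scheme_response) >= THRESHOLD  # no '*' in "http://"
```

Under these assumed weights, the host name produces a strong response wherever the pattern '*', non-'*', '*' occurs, while "http://" produces no response.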

determining that the first event-data value is associated with a security violation based at least in part on the first indicator satisfying the indicator-security criterion.  
(See Hendler 
page 12, left column, bottom paragraph 
When this filter is
applied to the characters sequence depicted in Figure
4a, it creates a relatively strong signal. 
[first indicator can be the result of the application of the Figure 4b  neural network  filter to the respective column vector of Figure 4a; e.g., neural network filter applied to a column vector in the vector matrix for http://d*c*a*ci*x*.<domain>]
page 12, left hand column, 3rd paragraph 
Out of the new 42 detected commands, 15 commands contain a sequence of alternating digits and characters[indicator-security criterion is disclosed by detecting a sequence of alternating digits and characters which indicates malicious activity]. In most cases, this sequence represented the name of the host or domain from which the command
 [the first event-data value can be disclosed by, for example, the letter ‘D’ in DownloadFile (’http://d*c*a*ci*x*.<domain>’). The system applying the filter will detect the alternating asterisks (representing digits) in the command and determine that this is a malicious command; the command can also be detected as a pattern that incorporates “random characters” (page 12, right column, 2nd paragraph) and is therefore scored above a threshold, thereby indicating malicious activity, and in this case the indicator-security criterion is detecting random characters]
downloaded (most probably malicious[security violation; maliciousness is a probability determination]) content.
[See the example command on page 12 left column below 3rd paragraph “..DownloadFile (’http://d*c*a*ci*x*.<domain>’)..”. Each of these names appears only once and they are most probably generated by a domain generation algorithm (DGA) [49] used by the malware
[most probably generated means that whether it is malware or not is a probability-based assessment and it is more likely than not that the domain name was generated by malware ]  
]

Hendler Page 12, right hand column, 2nd paragraph We note that in both the above cases, the
PowerShell commands may include additional indications
of maliciousness such as the web client or the
cmdlets they use. Nevertheless, it is the ability to
detect patterns that incorporate random characters
and/or casing that causes 4-CNN to assign these command a score above the threshold,[ determining that the first event-data value is associated with a security violation] unlike the 3-gram detector.

 	However, Hendler does not expressly disclose 
A system comprising: 
at least one computer-readable memory storing a trained representation mapping, a trained classifier, and an indicator-security criterion; 
a communications interface configured to receive an event record, wherein the event record: 
a control unit configured to perform operations comprising: 
[Hendler describes processes performed by computers but does not describe details of the computer system architecture such as processors, memory etc.]
a trained representation mapping
wherein the trained representation mapping is trained to predict, based on an individual event-data value, at least one predicted event- data value following the individual event-data value

Xu discloses 
A system comprising: 
(See Xu Para. [0161]
the security server 204 [A system = security server 204 ] processes the request for the webpage. At step 214, the security server 204 forwards valid requests for the webpage to the website server 206. 
)
computer-readable memory for storing data and code
a trained representation mapping
a communications interface configured to receive an event record, wherein the event record: 
a control unit configured to perform operations comprising:
(See Xu Para. [0152]
In subsequent examples, this trained behavior model is referred to as “Behavior2Vec model”. For example, an autoencoder may be trained[trained representation mapping] that includes a Behavior2Vec model as an encoder and a Vec2Behavior model as a decoder. In some embodiments, the autoencoder architecture is based on recurrent neural networks (RNN) or convolutional neural networks (CNN) or other deep learning architecture. The trained Behavior2Vec model can then be used to generate a latent feature vector of fixed length for any sequence of input events, including behavior data that includes any sequence of input events generated at a client device in association with a request. 
Xu [0190]
Various forms of media may be involved in carrying one or more sequences of one or more instructions[at least one computer-readable memory storing] to processor 504 for execution. 
Xu Para. [0187]
Computer system 500 also includes one or more communication interfaces 518 coupled to bus 502.
[0173]
At block 404, the network security system 110 receives data describing a particular request from a particular client device to a server system hosting a website, [the request is an event record] the data including particular behavior data generated at the particular client device in association with the particular request. The behavior data may be generated by behavior collection instructions executing at the particular client device.[ a communications interface configured to receive an event record, wherein the event record: 
]
Xu figure 5, processor 504, which discloses a control unit
Xu [0178]
special-purpose computing devices may be hard-wired to perform one or more techniques described herein, ….., the one or more special-purpose computing devices may include one or more general purpose hardware processors programmed to perform the techniques described herein[a control unit configured to perform operations comprising: ]
).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified Hendler with the technique for setting and applying computer architecture and technique for training an encoder to generate vectors from event data of Xu to include 
A system comprising: 
at least one computer-readable memory storing a trained representation mapping, a trained classifier, and an indicator-security criterion; 
a communications interface configured to receive an event record, wherein the event record: 
a control unit configured to perform operations comprising:
sequentially providing event-data values of the ordered sequence of event-data values to the trained representation mapping to determine respective representation vectors, wherein a first representation vector of the representation vectors is associated with a first event-data value of the ordered sequence of event-data values; 
One of ordinary skill in the art would have made this modification to improve the ability of the system to train a component to generate vectors from event data. The system (e.g., a computer implementing the classifier and vector generator, page 8, right column, 2nd paragraph and page 9, left column top paragraph) of the primary reference can be modified to use the hardware and software and use the computer components as taught in the Xu reference to train a vector generator to generate vectors from the event data. By training the system to generate a vector, the system can adapt as input data changes.

	However, the combination of Hendler and Xu does not expressly disclose 
wherein the trained representation mapping is trained to predict, based on an individual event-data value, at least one predicted event- data value following the individual event-data value
Rei discloses 
wherein the trained representation mapping is trained to predict, based on an individual event-data value, at least one predicted event- data value following the individual event-data value
 (See Rei Para. [0056]
[event-data value = character]
The ANN 400 generates one or more predicted next items in a sequence of items based on an input sequence item, for example predicting the next word that a user may wish to include in a sentence based on the previous word that the user has input to the system. The following description is presented with respect to the specific embodiment of predicting the next word in a sequence of words, but it will be appreciated that the disclosure can be readily generalised to other sequences of items with no changes to the architecture of the ANN 400 by training the ANN 400 on different sets of data. For example, the same ANN 400 could be used to predict the next item in a sequence of items, for example: words, characters, logogram character strokes, e.g. Hanzi, morphemes, word segments, punctuation, emoticons, emoji, stickers, and hashtags, or optical character recognition or user intention prediction. For example, if the input to the ANN 400 is an operating system or software application event, the ANN 400 may generate a prediction of the next action that a user might wish to carry out
Rei Para.  [0058]
If the ANN 400 is used for a purpose other than predicting the next word in a sequence, the appropriate input to the ANN 400 will be represented by the 1-of-N vector 402 instead. For example, if the ANN 400 is used to predict the next morpheme, the 1-of-N vector 402 will represent the input morpheme. Similarly, if the ANN 400 is used to predict the next character, the 1-of-N vector 402 will represent the input character.
)
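For illustration only, the notion of predicting a next character from an individual input character can be sketched as follows. This is a minimal sketch under stated assumptions: a simple bigram frequency table is used here as a stand-in for the trained ANN 400 of Rei, and the function names and the training strings are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(corpus: list[str]) -> dict:
    """Count, for each character, which characters follow it in the corpus."""
    counts = defaultdict(Counter)
    for text in corpus:
        for cur, nxt in zip(text, text[1:]):
            counts[cur][nxt] += 1
    return counts


def predict_next(model: dict, ch: str):
    """Return the most frequent follower of ch, or None if ch was never seen."""
    if ch not in model:
        return None
    return model[ch].most_common(1)[0][0]


# Hypothetical training strings, chosen only to make the example concrete.
model = train_bigram(["get-process", "get-item"])
after_g = predict_next(model, "g")
after_e = predict_next(model, "e")
```

A neural predictor such as the one described in Rei would replace the frequency table with learned weights, but the input and output of the prediction step have the same shape: one item in, one or more predicted following items out.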
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Hendler and Xu with the technique for predicting the next character using the trained model of Rei to include 
wherein the trained representation mapping is trained to predict, based on an individual event-data value, at least one predicted event- data value following the individual event-data value
One of ordinary skill in the art would have made this modification to improve the ability of the system to predict the next character in order to detect malicious activity sooner. The system of the primary reference (e.g., classifier) can be modified to predict the next character and use that prediction to detect malicious activity.

As per claim 8, the rejection of claim 7 is incorporated herein. 
The combined teaching of Hendler, Xu, and Rei discloses 
the at least one computer-readable memory stores a tokenization criterion; and 
the operations further comprise: 
determining a second indicator at least partly by applying a second representation vector of the representation vectors to the trained classifier, 
wherein the second indicator is associated with a second event-data value that immediately follows the first event-data value in the ordered sequence of event-data values; and 
determining a token in the ordered sequence of event-data values based at least in part on the first indicator and the second indicator satisfying the tokenization criterion.  
 (See Hendler 
[second indicator is a result of applying the neural network filter to the 2nd character of the  command from claim 7
second event-data value is the 2nd character of the command
a second representation vector is a vector generated from the 2nd character of the command
ordered sequence of event-data values are the letters of the command

Page 10, left column, 3rd paragraph 
As for the usage of random names[a token= random names]
 (obfuscation method 11), these typically include numbers (converted
to the ‘*’ sign) [the first indicator and the second indicator satisfying the tokenization criterion; using the example from figure 4a, the first indicator is a result of applying the neural network filter to the column vector corresponding to the d (or the first indicator is simply the pattern of the column vector, such a pattern being determined when applying the classifier), 
second indicator is a result of applying the neural network filter to the * character (or simply the pattern of the column corresponding to the * next to the letter d)
the tokenization criterion can be disclosed by detecting a random name by detecting the numbers, such as detecting the multiple asterisks in the command “..DownloadFile (’http://d*c*a*ci*x*.<domain>’)..”.
]
or alternating casing, and can
therefore be learnt by our classifiers as well. (As we
describe later, our deep learning classifiers do a better
job in learning such patterns.) The usage of special
strings such as ”[char]”, ”UTF8”, ”Base64” or the
character ’‘’ is also covered by both models as they
are retained in the input
)
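For illustration only, one way of determining a token from the indicators of two adjacent characters can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from Hendler: the per-character indicator (1 for '*', 0 otherwise), the rule that a letter immediately followed by '*' satisfies the tokenization criterion, and the function name `find_tokens` are all hypothetical.

```python
import re


def find_tokens(command: str) -> list[str]:
    """Return runs of letters and '*' in which some letter is directly followed by '*'."""
    tokens = []
    for match in re.finditer(r"[A-Za-z*]+", command):
        run = match.group()
        indicators = [1 if ch == "*" else 0 for ch in run]
        # Assumed tokenization criterion: first indicator 0, second indicator 1.
        if any(a == 0 and b == 1 for a, b in zip(indicators, indicators[1:])):
            tokens.append(run)
    return tokens


tokens = find_tokens("DownloadFile('http://d*c*a*ci*x*.<domain>')")
```

In this sketch, the runs "DownloadFile", "http", and "domain" contain no '*' and are not returned, while the run containing the alternating asterisks is returned as the token.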

As per claim 9, the rejection of claim 8 is incorporated herein. 
Hendler discloses
a token-security criterion; and 
the operations further comprise determining that the event is associated with a security violation based at least in part on the token satisfying the token-security criterion.   
(See Hendler page 12, left hand column, 3rd paragraph 
Out of the new 42 detected commands, 15 commands contain a sequence of alternating digits and characters. In most cases, this sequence represented the name of the host or domain from which the command downloaded (most probably malicious[security violation; maliciousness is a probability determination]) content.
[See also the example command on page 12 left column below 3rd paragraph “..DownloadFile (’http://d*c*a*ci*x*.<domain>’)..”. Each of these names appears only once and they are most probably generated 
[most probably generated means that whether it is malware or not is a probability-based assessment and it is more likely than not that the domain name was generated by malware ]  by a domain generation algorithm (DGA) [49] used by the malware 
[token-security criterion= a sequence of alternating digits and
characters. Page 12, left column, 3rd paragraph
token= d*c*a*ci*x* ]

Hendler Page 12, right hand column, 2nd paragraph We note that in both the above cases, the
PowerShell commands may include additional indications
of maliciousness such as the web client or the
cmdlets they use. Nevertheless, it is the ability to
detect patterns that incorporate random characters
and/or casing that causes 4-CNN to assign these command a score above the threshold,[ determining that the event is associated with a security violation] unlike the 3-gram
detector.
) 
	However, Hendler does not expressly disclose 
the at least one computer-readable memory stores
Xu discloses
the at least one computer-readable memory stores (see claim 7)
For the reasons discussed with respect to claim 7, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified Hendler with the computer-readable memory of Xu to include the at least one computer-readable memory stores

As per claim 11, the claim(s) is/are directed to a system with limitations which correspond to limitations of claim 6, and is/are rejected for the reasons detailed with respect to claim 6. 

Claim 3 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hendler in view of Xu, in view of Rei, further in view of Xie et al. China publication CN108809727A (hereinafter “Xie”) (machine translation).
As per claim 3, the rejection of claim 1 is incorporated herein. 
Hendler discloses receiving the event data from the monitored computing device via a network; 
(See Hendler commands sent by a web client (page 12, right column, 2nd paragraph) and commands collected from users in a corporate network (page 15, right column, bottom paragraph))
However, the combination of Hendler, Xu, and Rei does not expressly disclose 
in response to the determining that the event is associated with a security violation, 
transmitting a security command to the monitored computing device to cause the monitored computing device to perform a mitigation action.  
Xie discloses 
in response to the determining that the event is associated with a security violation, 
transmitting a security command to the monitored computing device to cause the monitored computing device to perform a mitigation action.  
 (See Xie page 4, paragraph 7 In the above scheme, builds multi-level defense system to the monitoring layer from the device layer to the transport layer and setting corresponding safety protection mechanism for each layer, so as to ensure the safe and stable operation of the direct current motor control system, specifically: a collecting unit collecting motor running data in the direct current motor control system, and sends it to the intrusion detection unit, encrypting the received motor operation data through the intrusion detection unit, and the motor operating data after encrypting to the monitoring unit. preventing transmission process device layer information be quickly decrypted, further used for network communication data received by the direct current motor control system for intrusion detection, if there is intrusion behaviour, [determining that the event is associated with a security violation, ]then alarming information is sent, [transmitting a security command to the monitored computing device ]such that by intrusion detecting unit mounted between the monitoring unit and the device can effectively prevent industrial bus communication network communication data interception and malicious tampering [cause the monitored computing device to perform a mitigation action]by monitoring unit receives the motor operation data after encrypting and decrypting it further used for sending control instruction to the DC motor control system for changing the operation state of the motor.[ cause the monitored computing device to perform a mitigation action.  ] so as to form a complete motor operation data and secure transmission link of the network communication data, can furthest ensure motor operation data and network communication data in the transmission process of the security, 
)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Hendler, Xu, and Rei with the technique for receiving system operation information, performing intrusion detection, and sending a mitigation alarm and DC motor control instructions of Xie to include 
in response to the determining that the event is associated with a security violation, 
transmitting a security command to the monitored computing device to cause the monitored computing device to perform a mitigation action.  
One of ordinary skill in the art would have made this modification to improve the ability of the system to mitigate detected malicious activity. The system (e.g., a computer implementing the classifier and vector generator, page 8, right column, 2nd paragraph and page 9, left column top paragraph) of the primary reference can be modified to send instructions to mitigate the malicious activity when detected.

Claims 4 and 13 is/are rejected under 35 U.S.C. 103 as being unpatentable over Hendler in view of Xu, in view of Rei, further in view of Mamtani et al. U.S. Publication 20120088584 (hereinafter “Mamtani”).
As per claim 4, the rejection of claim 1 is incorporated herein. 
Hendler discloses
at least some text of a command line of a first process, the first process being a process that triggered the event; 
(See Hendler 
Hendler Page 15, right column, 3rd paragraph we targeted the detection of
individual PowerShell commands that are executed
via the command-line. 
Hendler page 12, left column, 3rd paragraph 
Out of the new 42 detected commands, 15 commands
contain a sequence of alternating digits and characters.[ at least one character]

Page 3, right column, 1st paragraph 
The Get-Process
cmdlet, for instance, when given a name of a machine
which can be accessed in the context in which Power-
Shell is executed, returns the list of processes that are
running on that machine.
	).

	However, the combination of Hendler, Xu, and Rei does not expressly disclose 
at least some text of a command line of a second process that is a parent process of the first process; and 
at least some text of a command line of a third process that is a parent process of the second process.  
Mamtani discloses processing command-line text fields of a process, its parent process, and its grandparent processes to detect malicious activity.
(See Mamtani Para. [0044] In one embodiment, the virtualization and encryption service 412 includes a process and thread tracking module 405 for tracking all active processes/threads associated with an application 401. In particular, in one embodiment, when an application is initially loaded on the system, the process and thread tracking module 405 generates a map of the processes and threads and the hierarchy between the processes and threads which it stores as metadata. This includes, for each process, all of its child processes and its parent and grandparent processes. In one embodiment, a process is identified by the full path to the executable and the pre-computed SHA-256 hashing value. Subsequently, when an application is executed and attempts to load a series of processes/threads into memory, the process and thread tracking module 405 reads the metadata via the metadata manager 406 and compares the map with the requested processes/threads. If a particular process or thread is not found in the map, or is found at a different level of the process/thread hierarchy than that stored in the map or doesn't match the hash or the command line arguments do not match, then the process and thread tracking module 405 may prevent the process/thread from loading, may trigger an alert to be addressed by a system administrator, and/or may take automated corrective action. Generating a process/thread map for each application and comparing the map to requested processes/threads in this manner provides for additional security and makes it difficult for a hacker to compromise the system using an unauthorized processes or threads.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Hendler, Xu, and Rei with the technique for processing command-line text fields of a process, its parent process, and its grandparent processes to detect malicious activity of Mamtani to include 
wherein the command-line text comprises: 
at least some text of a command line of a second process that is a parent process of the first process; and 
at least some text of a command line of a third process that is a parent process of the second process.  
One of ordinary skill in the art would have made this modification to improve the ability of the system to detect malicious activity. The system (e.g., a computer implementing the classifier and vector generator, page 8, right column, 2nd paragraph and page 9, left column top paragraph) of the primary reference can be modified to examine the text of command lines of parent/grandparent processes.
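For illustration only, gathering the command-line text of a process, its parent, and its grandparent can be sketched as follows. This is a minimal sketch under stated assumptions and is not code from Mamtani or Hendler: the `Proc` record, the in-memory process table, and the example command lines are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Proc:
    pid: int
    ppid: Optional[int]  # None when the parent is unknown
    cmdline: str


def lineage_cmdlines(table: dict, pid: int, depth: int = 3) -> list[str]:
    """Collect command lines for the process, then its parent, then its grandparent."""
    out = []
    cur = table.get(pid)
    while cur is not None and len(out) < depth:
        out.append(cur.cmdline)
        cur = table.get(cur.ppid) if cur.ppid is not None else None
    return out


# Hypothetical process table: explorer.exe -> cmd.exe -> powershell.exe
table = {
    1: Proc(1, None, "explorer.exe"),
    2: Proc(2, 1, "cmd.exe /c run.bat"),
    3: Proc(3, 2, "powershell.exe -enc AAAA"),
}
lines = lineage_cmdlines(table, 3)
```

The resulting list could then be concatenated and supplied, character by character, to a classifier of the kind described in the primary reference.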

As per claim 13, the rejection of claim 7 is incorporated herein. 
Hendler discloses
wherein the ordered sequence of event-data values comprises: 
at least one character of a command line of a first process, the first process being a process that triggered the event; 
 (See
Hendler Page 15, right column, 3rd paragraph we targeted the detection of
individual PowerShell commands that are executed
via the command-line. 
Hendler page 12, left column, 3rd paragraph 
Out of the new 42 detected commands, 15 commands
contain a sequence of alternating digits and characters.[ at least one character]

Hendler Page 3, right column, 1st paragraph 
The Get-Process
cmdlet, for instance, when given a name of a machine
which can be accessed in the context in which Power-
Shell is executed, returns the list of processes that are
running on that machine.
	).

	However, the combination of Hendler, Xu, and Rei does not expressly disclose 
at least one character of a command line of a second process that is a parent process of the first process; and 
at least one character of a command line of a third process that is a parent process of the second process.  
Mamtani discloses processing command-line text fields of a process, its parent process, and its grandparent processes to detect malicious activity.
(See Mamtani Para. [0044] In one embodiment, the virtualization and encryption service 412 includes a process and thread tracking module 405 for tracking all active processes/threads associated with an application 401. In particular, in one embodiment, when an application is initially loaded on the system, the process and thread tracking module 405 generates a map of the processes and threads and the hierarchy between the processes and threads which it stores as metadata. This includes, for each process, all of its child processes and its parent and grandparent processes. In one embodiment, a process is identified by the full path to the executable and the pre-computed SHA-256 hashing value. Subsequently, when an application is executed and attempts to load a series of processes/threads into memory, the process and thread tracking module 405 reads the metadata via the metadata manager 406 and compares the map with the requested processes/threads. If a particular process or thread is not found in the map, or is found at a different level of the process/thread hierarchy than that stored in the map or doesn't match the hash or the command line arguments do not match, then the process and thread tracking module 405 may prevent the process/thread from loading, may trigger an alert to be addressed by a system administrator, and/or may take automated corrective action. Generating a process/thread map for each application and comparing the map to requested processes/threads in this manner provides for additional security and makes it difficult for a hacker to compromise the system using an unauthorized processes or threads.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Hendler, Xu, and Rei with the technique for processing command-line text fields of a process, its parent process, and its grandparent processes to detect malicious activity of Mamtani to include 
at least one character of a command line of a second process that is a parent process of the first process; and 
at least one character of a command line of a third process that is a parent process of the second process.  
One of ordinary skill in the art would have made this modification to improve the ability of the system to detect malicious activity. The system (e.g., a computer implementing the classifier and vector generator, page 8, right column, 2nd paragraph and page 9, left column top paragraph) of the primary reference can be modified to examine the text of command lines of parent/grandparent processes.
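For illustration only, the kind of comparison described in the cited Mamtani Para. [0044] (checking a requested process against a stored map by path, hash, hierarchy level, and command-line arguments) can be sketched as follows. This is a minimal sketch and is not taken from Mamtani: the names ProcessRecord, sha256_of, and check_process, and the layout of the map, are hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class ProcessRecord:
    """Hypothetical metadata stored for one process in the map."""
    path: str          # full path to the executable
    sha256: str        # pre-computed SHA-256 of the executable image
    command_line: str  # expected command-line arguments
    depth: int         # level in the process hierarchy (0 = root)


def sha256_of(image: bytes) -> str:
    """Return the hex SHA-256 digest of an executable image."""
    return hashlib.sha256(image).hexdigest()


def check_process(
    process_map: Dict[str, ProcessRecord],
    path: str,
    image: bytes,
    command_line: str,
    depth: int,
) -> Optional[str]:
    """Return None if the requested process matches the map,
    otherwise a short reason string describing the mismatch."""
    expected = process_map.get(path)
    if expected is None:
        return "not in map"
    if expected.depth != depth:
        return "wrong hierarchy level"
    if expected.sha256 != sha256_of(image):
        return "hash mismatch"
    if expected.command_line != command_line:
        return "command line mismatch"
    return None
```

In this sketch any non-None result would correspond to the point at which the reference describes preventing the load, triggering an alert, or taking corrective action.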

Claim 5 is rejected under 35 U.S.C. 103 as being unpatentable over Hendler in view of Xu, in view of Rei, further in view of Oliner et al. U.S. Publication 20200090027 (hereinafter “Oliner”), further in view of Kuperman et al. U.S. Publication 20170244737 (hereinafter “Kuperman”).
As per claim 5, the rejection of claim 1 is incorporated herein. 
Hendler discloses training command-line text
(See Hendler Page 15, right column, 3rd paragraph: we targeted the detection of individual PowerShell commands that are executed via the command-line.
Page 10, left column, 3rd paragraph:
As for the usage of random names (obfuscation [obfuscation means that the malware is trying to obfuscate, see page 4, left column, 4th paragraph “numerous ways of obfuscating PowerShell commands,”] method 11), these typically include numbers (converted to the ‘*’ sign) or alternating casing, and can therefore be learnt by our classifiers as well (As we describe later, our deep learning classifiers do a better job in learning such patterns.)
).

However, Hendler does not expressly disclose 
determining the trained representation mapping at least partly by adjusting parameters of a first model structure so that the trained representation mapping predicts, based on a character of training command-line text, an immediately following character in the training command-line text, within a predetermined accuracy; and 
determining the classifier at least partly by adjusting parameters of a second model structure using a supervised learning technique based on classification training data comprising training indicators associated with respective portions of the training command-line text.  
Xu discloses determining the classifier at least partly by adjusting parameters of a second model structure using a supervised learning technique based on classification training data comprising training indicators, namely supervised learning with labels
(See Xu Para. [0078]
Machine learning techniques may be supervised or unsupervised. Some machine learning techniques fall in between strictly supervised learning and strictly unsupervised learning. In supervised learning, a computer is presented with example inputs that are labeled with additional information. For example, the inputs may be classified or labeled with desired outputs. The computer is trained to create a mapping from inputs to outputs. Typically, the additional information allows the system to compute errors produced by the model. The errors are used as feedback in the learning process to improve the model. Typically, human effort is involved in labeling the data, such as by collecting or generating the example inputs and outputs. Classification is an example of a supervised learning task. The training data may be labeled with classifications (e.g. behavior data generated by a “Human” user, behavior data generated by an “Automated” user). One or more supervised learning techniques can be used to generate a model that can be used to classify new data).
[but Xu does not describe the limitations not disclosed in Hendler. ]

	The combination of Hendler, Xu, and Rei does not expressly disclose 
determining the trained representation mapping at least partly by adjusting parameters of a first model structure so that the trained representation mapping predicts, based on a character of training command-line text, an immediately following character in the training command-line text, within a predetermined accuracy; and 
determining the classifier at least partly by adjusting parameters of a second model structure using a supervised learning technique based on classification training data comprising training indicators associated with respective portions of the training command-line text.  
Oliner discloses 
determining the trained representation mapping at least partly by adjusting parameters of a first model structure so that the trained representation mapping predicts, based on a character of training text, an immediately following character in the training command-line text, within a predetermined accuracy; and 
(See Oliner
[Oliner's deep-learning engine 1830 performs as both a predictor and a classifier. Engine 1830 predicts the next character in a sequence of characters (paras. 266, 267) and also classifies a received character as anomalous or not (paras. 268, 272) by comparing the received character to the predicted character or to the expectation value calculated for the character (para. 267).]
Oliner figure 21, element 2110 “calculates predictions about the next character in the sequence”
Oliner Para. [0284]
At block 2110, the computer system calculates a statistical prediction of an incoming group of yet-to-be-processed one or more textual characters.
Oliner Para. [0279]
generation includes a selection of the next character of a new event based upon the greatest likelihood of the selected character appearing next in the generated sequence as determined by the calculated statistical predictions of the sequence of the textual characters of the training corpus.
Oliner [0266]
During training, the untrained engine 1832 receives a sequence of textual characters[determining a trained representation mapping] of the example structured events of the training corpus 1840. While training, the untrained engine 1832 uses statistical predictions on what the next character in the sequence is most likely to occur[so that the trained representation mapping predicts, based on a character of training text, the immediately following character in the training text]. In other words, it learns to expect particular characters with a quantifiable statistical prediction value.
Oliner [0267]
While the sequence of input example events is received, the untrained engine 1832 makes a prediction about what the next character in the sequence will be. Using the previous sequence to train the untrained engine 1832, the untrained engine calculates a so-called expectation value for each possible or likely character to be the next character in the sequence. 
Oliner [0262]
Given a training corpus of a sequence of text (such as natural language or events), the RNN may be trained to produce a character-level textual model. Once trained, the RNN produces a probability distribution of the next character in a new sequence given a sequence of previous characters [based on a character of training text ](such as the training corpus of example events). This allows a trained RNN to produce a new sequence of text one character at a time. The training process continues with new example events until the network converges and its predictions are eventually consistent with the training data in that next correct characters are predicted at a maximum level achievable[predicts, based on a character of training text, the immediately following character in the training text, within a predetermined accuracy; within a predetermined accuracy =  maximum level achievable]
)
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Hendler, Xu, and Rei with the technique for training a character predictor of Oliner to include 
determining the trained representation mapping at least partly by adjusting parameters of a first model structure so that the trained representation mapping predicts, based on a character of training command-line text, an immediately following character in the training command-line text, within a predetermined accuracy.
One of ordinary skill in the art would have made this modification to improve the ability of the system to train a predictor so that the predictor can predict the next character, in order to detect anomalies. The system (e.g., a computer implementing the classifier and vector generator, page 8, right column, 2nd paragraph and page 9, left column top paragraph) of the primary reference can be modified to train a character predictor according to the technique of the Oliner reference.
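For illustration only, next-character prediction and the flagging of a “surprising” character, as characterized in the cited Oliner paragraphs, can be sketched as follows. Oliner describes an RNN; this minimal sketch substitutes a much simpler character-bigram frequency model purely to show the predict-then-compare idea. The function names and the 0.1 threshold are hypothetical and are not taken from Oliner.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def train_bigram(corpus: List[str]) -> Dict[str, Counter]:
    """Count, for each character, which characters immediately follow it."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for text in corpus:
        for cur, nxt in zip(text, text[1:]):
            counts[cur][nxt] += 1
    return dict(counts)


def next_char_probability(model: Dict[str, Counter], cur: str, nxt: str) -> float:
    """Estimated probability that `nxt` immediately follows `cur`."""
    followers = model.get(cur)
    if not followers:
        return 0.0
    return followers[nxt] / sum(followers.values())


def surprising_positions(
    model: Dict[str, Counter], text: str, threshold: float = 0.1
) -> List[int]:
    """Indices of characters whose probability, given the preceding
    character, falls below the (assumed) acceptance threshold."""
    return [
        i + 1
        for i, (cur, nxt) in enumerate(zip(text, text[1:]))
        if next_char_probability(model, cur, nxt) < threshold
    ]
```

A character whose index is returned by surprising_positions plays the role of the character that the reference declares surprising and reports as an anomaly.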

However, the combination of Hendler, Xu, Rei, and Oliner does not expressly disclose 
determining the classifier at least partly by adjusting parameters of a second model structure using a supervised learning technique based on classification training data comprising training indicators associated with respective portions of the training command-line text.  
Kuperman discloses 
determining the classifier at least partly by adjusting parameters of a second model structure using a supervised learning technique based on classification training data comprising training indicators associated with respective portions of the training command-line text.  
 (See Kuperman 
Para. [0049]
The model generator 209 processes the requests to generate (e.g., train) a model 205 for classifying unknown requests (e.g., the majority of requests) based on their attributes. For example, once the model generator 209 generates a model 205, the model may be implemented as a component of a WAF 110 at the proxy 205 for classifying requests received from clients that are not known to be malicious or non-malicious[supervised learning technique] (e.g., traffic sources not identified to the proxy 205 for training data collection by the attribute collector 207).[The training will adjust the parameters of the model]
Kuperman [0054] Profile/Anomaly detection WAFs differ from this approach in that they are unsupervised and the number of labeled positive examples is zero. Positive examples (e.g., malicious requests) may be utilized to verify a profile/anomaly detection WAF but are not considered in generating profiles themselves. In contrast, the model generator 209 ingests both positively labeled (e.g., known malicious requests) and negatively labeled (e.g., known non-malicious requests) training examples. [classification training data comprising training indicators; training indicators disclosed by the labels indicating malicious or not] In addition, the requests collected by the attribute collector 207 may be specific to the web application 120 to which the requests are directed. Hence, the model generator 209 may train a model 205 for any number of web applications. [These web applications may include applications involving command-line, such as command-line interface]
Kuperman [0057]
The WAF 110, which includes the model 205, may be configured to receive requests from clients 101, 105 and classify the requests. Specifically, the model 205 may classify a request based on its associated attributes as either malicious (e.g., “1”) 
[when the requests come from an application that is a command line interface the request is a command ]
or non-malicious (e.g., “0”). The model 205 takes as input the attributes of the request, which may include the static attributes and/or one or more derived attributes, and predicts whether the request attributes when processed as a function of the attribute parameters (theta) implicates a malicious or non-malicious client. The model 205 may label the request based on the prediction, e.g., 1 for malicious or 0 for non-malicious, and the WAF 110 takes appropriate action in response.
Kuperman [0028]
a client …. include applications or frameworks for accessing online data. Example applications for accessing online data include applications such ….. command-line interfaces, widgets, mobile applications, utilities, etc. that request, send, and/or receive data over the network and may present received data to the user (e.g., as rendered web content, command-line text, ……., a user of a malicious client 105 may utilize an application 109 such as a browser and/or command-line interface for malicious purposes to access a web application 120 on a host 145 in order to exploit a vulnerability 121 of the web application
).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Hendler, Xu, Rei, and Oliner with the technique for supervised learning to detect malicious command text of Kuperman to include 
determining the classifier at least partly by adjusting parameters of a second model structure using a supervised learning technique based on classification training data comprising training indicators associated with respective portions of the training command-line text.  
One of ordinary skill in the art would have made this modification to improve the ability of the system to train the system to detect malicious command text. The system (e.g., a computer implementing the classifier and vector generator, page 8, right column, 2nd paragraph and page 9, left column top paragraph) of the primary reference can be modified to use a supervised training technique to train the learning model for classifying the command line. 
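For illustration only, supervised training on examples labeled malicious (1) or non-malicious (0), in which training adjusts the parameters of a model that then labels new inputs, can be sketched as follows. This is a minimal logistic-regression sketch and is not taken from Kuperman: the feature encoding, function names, learning rate, and epoch count are all assumptions.

```python
import math
from typing import List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (attribute vector, label 1 or 0)


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def score(theta: Sequence[float], bias: float, x: Sequence[float]) -> float:
    """Predicted probability that attribute vector x is malicious."""
    return sigmoid(bias + sum(t * v for t, v in zip(theta, x)))


def train(
    examples: List[Example], epochs: int = 200, lr: float = 0.5
) -> Tuple[List[float], float]:
    """Adjust parameters (theta, bias) by stochastic gradient descent
    so that predictions agree with the labeled training examples."""
    theta = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for x, label in examples:
            error = score(theta, bias, x) - label  # feedback from the label
            theta = [t - lr * error * v for t, v in zip(theta, x)]
            bias -= lr * error
    return theta, bias


def classify(theta: Sequence[float], bias: float, x: Sequence[float]) -> int:
    """Label a new input: 1 for malicious, 0 for non-malicious."""
    return 1 if score(theta, bias, x) >= 0.5 else 0
```

Here the labels serve as the training indicators, and the loop in train is the step that adjusts the parameters of the model structure.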

Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Hendler in view of Xu, in view of Rei, further in view of Zhang et al. U.S. Publication 20180300623 (hereinafter “Zhang”).
As per claim 10, the rejection of claim 7 is incorporated herein. 
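	However, the combination of Hendler, Xu, and Rei does not expressly disclose 
after providing a predetermined number of the event-data values to the trained representation mapping, resetting a state value of the trained representation mapping.  
Zhang discloses retraining a classifier after the classifier has performed a predetermined number of classification operations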
(See Zhang Para. [0055] determines whether C is greater than M, that is to say whether the classifier has performed more than M classification operations. When C is greater than M, block 746 retrains the classifier and resets the variable C to zero. 
).
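For illustration only, the counter logic described in the cited Zhang Para. [0055] (retrain and reset the count once more than M classification operations have been performed) can be sketched as follows. The class name CountingClassifier and the callables passed to it are hypothetical and are not taken from Zhang.

```python
from typing import Any, Callable


class CountingClassifier:
    """Wraps a classifier and retrains it after more than
    `max_count` (M) classification operations."""

    def __init__(
        self,
        classify: Callable[[Any], Any],
        retrain: Callable[[], None],
        max_count: int,
    ) -> None:
        self._classify = classify
        self._retrain = retrain
        self.max_count = max_count  # M in the cited passage
        self.count = 0              # C in the cited passage

    def __call__(self, item: Any) -> Any:
        result = self._classify(item)
        self.count += 1
        # When C is greater than M, retrain and reset C to zero.
        if self.count > self.max_count:
            self._retrain()
            self.count = 0
        return result
```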
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Hendler, Xu, and Rei with the technique for retraining a classifier after processing a predetermined number of inputs of Zhang to include 
after providing a predetermined number of the event-data values to the trained representation mapping, resetting a state value of the trained representation mapping.  
One of ordinary skill in the art would have made this modification to improve the ability of the system to ensure that the classifier is kept up-to-date and properly retrained as needed. The system (e.g., a computer implementing the classifier and vector generator, page 8, right column, 2nd paragraph and page 9, left column top paragraph) of the primary reference can be modified to retrain the vector generator model after a predetermined number of characters has been processed.


Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Hendler in view of Xu, in view of Rei, further in view of Koval et al. U.S. Publication 20190260204 (hereinafter “Koval”).
As per claim 12, Hendler discloses that commands classified as malicious will generate alerts sent to a monitoring and management system (page 15, left column, bottom paragraph and right column, top paragraph).
	However, the combination of Hendler, Xu, and Rei does not expressly disclose, but 
Koval discloses at least partly in response to the determining that the first event-data value is associated with a security violation, transmitting a security command to the monitored computing device to cause the monitored computing device to perform a mitigation action.   
(See Koval Para. [0318] module 1104 detects a malicious user by analyzing images transmitted from the IED or meter using facial and/or Iris recognition to determine if the user is authorized or not. ……. action module 1106 may use the detected securing threat or detected probing to send one or more alerts or notifications to one or more clients and/or send one or more control signals to IEDs, facilities (e.g., to shut off a compromised port, network, IED, etc.) to prevent the threat or probe detected.
[0050] intelligent electronic devices (“IEDs”) can be any device 
).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Hendler, Xu, and Rei with the technique for sending a command to shut off a compromised intelligent electronic device of Koval to include 
at least partly in response to the determining that the first event-data value is associated with a security violation, transmitting a security command to the monitored computing device to cause the monitored computing device to perform a mitigation action.   
One of ordinary skill in the art would have made this modification to improve the ability of the system to mitigate the malicious activity detected at a device. The system (e.g., a computer implementing the classifier and vector generator, page 8, right column, 2nd paragraph and page 9, left column top paragraph) of the primary reference can be modified to send a command to shut off a compromised electronic device, as taught in the reference.

Claims 14-16 are rejected under 35 U.S.C. 103 as being unpatentable over Oliner in view of Kuperman.
As per claim 14, Oliner discloses 
At least one tangible, non-transitory computer-readable medium having stored thereon instructions executable by at least one processor to cause the at least one processor to perform operations comprising: 
(See Oliner Para. [0233] As depicted, the AutoDataGen 1820 includes one or more processors 1822, one or more secondary storage systems 1824 (e.g., hard drives, flash memory), and one or more primary memories 1826 …….. storage systems 1824 and memories 1826 are examples of non-transitory computer-readable media that are capable of storing processor-executable (or computer-executable) instructions thereon.
Oliner [0304] The term “computer-readable media” is non-transitory computer-storage media. For example, computer-storage media may include, but are not limited to, magnetic storage devices (e.g., hard disk, floppy disk, and magnetic strips)
Oliner [0066] The networked computer system 100 comprises one or more computing devices. These one or more computing devices comprise any combination of hardware and software configured to implement the various logical components described herein. For example, the one or more computing devices may include one or more memories that store instructions for implementing the various components described herein, one or more hardware processors configured to execute the instructions stored in the one or more memories, and various data repositories in the one or more memories for storing data structures utilized and manipulated by the various components.
Oliner [0302] In the context of software/firmware, the blocks represent instructions stored on one or more computer-readable storage media that, when executed by one or more processors, perform the recited operations.
)

determining a trained representation mapping at least partly by adjusting parameters of a first model structure so that the trained representation mapping predicts, based on a character of training text, the immediately following character in the training text, within a predetermined accuracy; and
(See Oliner
[Oliner's deep-learning engine 1830 performs as both a predictor and a classifier. Engine 1830 predicts the next character in a sequence of characters (paras. 266, 267) and also classifies a received character as anomalous or not (paras. 268, 272) by comparing the received character to the predicted character or to the expectation value calculated for the character (para. 267).]
Oliner figure 21, element 2110 “calculates predictions about the next character in the sequence”
Oliner Para. [0284]
At block 2110, the computer system calculates a statistical prediction of an incoming group of yet-to-be-processed one or more textual characters.
Oliner Para. [0279]
generation includes a selection of the next character of a new event based upon the greatest likelihood of the selected character appearing next in the generated sequence as determined by the calculated statistical predictions of the sequence of the textual characters of the training corpus.
Oliner [0266]
During training, the untrained engine 1832 receives a sequence of textual characters[determining a trained representation mapping] of the example structured events of the training corpus 1840. While training, the untrained engine 1832 uses statistical predictions on what the next character in the sequence is most likely to occur[so that the trained representation mapping predicts, based on a character of training text, the immediately following character in the training text]. In other words, it learns to expect particular characters with a quantifiable statistical prediction value.
Oliner [0267]
While the sequence of input example events is received, the untrained engine 1832 makes a prediction about what the next character in the sequence will be. Using the previous sequence to train the untrained engine 1832, the untrained engine calculates a so-called expectation value for each possible or likely character to be the next character in the sequence. 
Oliner [0262]
Given a training corpus of a sequence of text (such as natural language or events), the RNN may be trained to produce a character-level textual model. Once trained, the RNN produces a probability distribution of the next character in a new sequence given a sequence of previous characters [based on a character of training text ](such as the training corpus of example events). This allows a trained RNN to produce a new sequence of text one character at a time. The training process continues with new example events until the network converges and its predictions are eventually consistent with the training data in that next correct characters are predicted at a maximum level achievable[predicts, based on a character of training text, the immediately following character in the training text, within a predetermined accuracy; within a predetermined accuracy =  maximum level achievable]
)

 determining a classifier based at least in part on the trained representation mapping and at least partly by adjusting parameters of a second model structure
(See Oliner 
[based at least in part on the trained representation mapping is disclosed because the classification of a character as anomalous (para. 272) is based on the prediction of the next character from the trained predictor. the trained representation mapping = trained datagen 1834]
	Oliner [0272]
The same may occur after training. In that case, the trained datagen 1834[the trained representation mapping] notes when the next character in the incoming machine data 1850 has an anomaly and generates a notification accordingly. [Declaring the next character as anomalous is classifying the next character, which means that the engine 1830 is now performing as a classifier]
Oliner [0273]
The AutoDataGen 1820 may be designed to have manual or automatic adjustments [adjusting parameters of a second model structure ]concerning the likelihood that the trained datagen 1834 will generate wholly new events that are consistent with the content and structure of example events of the training corpus. This is called a “temperature” adjustment. The greater the temperature, the greater the likelihood of generating an example event that closely resembles content/structure of the example events of the training corpus. The lesser the temperature, the lower the likelihood of generating an example event that closely resembles content/structure of the example events of the training corpus.
Oliner [0267]
While the sequence of input example events is received, the untrained engine 1832 makes a prediction about what the next character in the sequence will be. Using the previous sequence to train the untrained engine 1832, the untrained engine calculates a so-called expectation value for each possible or likely character to be the next character in the sequence. Once the next character arrives, the untrained engine 1832 compares/contrasts the expectation value of the character that actually arrived next and the expectation value that the engine calculated for that character before it arrived. When the compare is great (e.g., falls outside an acceptable expectation range or above/below an acceptable expectation threshold), that character is declared as “surprising.”[When the engine is trained, the trained engine is represented by  trained datagen 1834 in figure 18; Declaring the character as surprising is classifying the character]
Oliner [0265]
The existing data-generation systems are not capable of detecting an anomaly in ostensibly unstructured dataset of machine data. An anomaly is a variance in the input data stream that exceeds some acceptable amount of deviation from the norm (i.e., standard, expectation, etc.). There are patterns in the content of machine data. This is true even if those patterns are not readily apparent to a human. One or more embodiments of the deep-learning engine 1830, described herein, is designed to detect anomalies in the input data sequence.
Oliner [0268]
The system generates an alert or notification to highlight the surprising character (or group of characters). This is classified as an anomaly. A human should examine the anomaly and determine if any corrective action needs to be taken. A notification may be a visual highlighting on a computer screen, an audible indicator, an automated electronic message sent to another device or account, or some combination thereof.
Oliner [0269]
In addition or in the alternative, the system may report the anomaly by generating what it expected to see along with an associated confidence or probability factor. Consider this example:
)
wherein the trained representation mapping is configured to receive a sequence of characters and for an individual character, output a respective representation vector including a predicted character following the individual character.
(See Oliner Para. [0263]
In some implementations, the RNN may use a standard softmax classifier (also commonly referred to as the cross-entropy loss) on every output vector (e.g., character) simultaneously. The RNN is trained with mini-batch Stochastic Gradient Descent.
Oliner Para. [0284]
At block 2110, the computer system calculates a statistical prediction of an incoming group of yet-to-be-processed one or more textual characters.
Oliner Para. [0279]
generation includes a selection of the next character of a new event based upon the greatest likelihood of the selected character appearing next in the generated sequence as determined by the calculated statistical predictions of the sequence of the textual characters of the training corpus.
Oliner [0266]
During training, the untrained engine 1832 receives a sequence of textual characters of the example structured events of the training corpus 1840. While training, the untrained engine 1832 uses statistical predictions on what the next character in the sequence is most likely to occur[so that the trained representation mapping predicts, based on a character of training text, the immediately following character in the training text]. In other words, it learns to expect particular characters with a quantifiable statistical prediction value.
Oliner [0267]
While the sequence of input example events is received, the untrained engine 1832 makes a prediction about what the next character in the sequence will be. Using the previous sequence to train the untrained engine 1832, the untrained engine calculates a so-called expectation value for each possible or likely character to be the next character in the sequence. Once the next character arrives, the untrained engine 1832 compares/contrasts the expectation value of the character that actually arrived next and the expectation value that the engine calculated for that character before it arrived. When the compare is great (e.g., falls outside an acceptable expectation range or above/below an acceptable expectation threshold), that character is declared as “surprising.”[When the engine is trained, the trained engine is represented by  trained datagen 1834 in figure 18; Declaring the character as surprising is classifying the character]
[0268]
The system generates an alert or notification to highlight the surprising character (or group of characters). This is classified as an anomaly. 
[0271]
In Table 2, the system reports what it expected to see, which is “hi moM!” Indeed, it expected the M after “mo” with a 98% probability (or confidence factor). Instead, the system actually saw “RTY” following “mo.” 
Oliner [0272]
The same may occur after training. In that case, the trained datagen 1834[the trained representation mapping] notes when the next character in the incoming machine data 1850 has an anomaly and generates a notification accordingly. [Declaring the next character as anomalous means that the trained model is predicting characters]
)

	However, Oliner does not expressly disclose 
command-line text 
determining a classifier based at least in part on the trained representation mapping and at least partly by adjusting parameters of a second model structure using a supervised learning technique based on classification training data comprising training indicators indicating whether respective portions of the training command-line text are associated with security violations.  

Kuperman discloses  
command-line text 
determining a classifier based at least in part on the trained representation mapping and at least partly by adjusting parameters of a second model structure using a supervised learning technique based on classification training data comprising training indicators indicating whether respective portions of the training command-line text are associated with security violations.  
(See Kuperman 
Para. [0049]
The model generator 209 processes the requests to generate (e.g., train) a model 205 for classifying unknown requests (e.g., the majority of requests) based on their attributes. For example, once the model generator 209 generates a model 205, the model may be implemented as a component of a WAF 110 at the proxy 205 for classifying requests received from clients that are not known to be malicious or non-malicious (e.g., traffic sources not identified to the proxy 205 for training data collection by the attribute collector 207).
Kuperman [0054] Profile/Anomaly detection WAFs differ from this approach in that they are unsupervised and the number of labeled positive examples is zero. Positive examples (e.g., malicious requests) may be utilized to verify a profile/anomaly detection WAF but are not considered in generating profiles themselves. In contrast, the model generator 209 ingests both positively labeled (e.g., known malicious requests) and negatively labeled (e.g., known non-malicious requests) training examples. [classification training data comprising training indicators; training indicators disclosed by the labels indicating malicious or not] In addition, the requests collected by the attribute collector 207 may be specific to the web application 120 to which the requests are directed. Hence, the model generator 209 may train a model 205 for any number of web applications. [These web applications may include applications involving command-line, such as command-line interface]
Kuperman [0057]
The WAF 110, which includes the model 205, may be configured to receive requests from clients 101, 105 and classify the requests. Specifically, the model 205 may classify a request based on its associated attributes as either malicious (e.g., “1”) 
[when the requests come from an application that is a command line interface the request is a command ]
or non-malicious (e.g., “0”). The model 205 takes as input the attributes of the request, which may include the static attributes and/or one or more derived attributes, and predicts whether the request attributes when processed as a function of the attribute parameters (theta) implicates a malicious or non-malicious client. The model 205 may label the request based on the prediction, e.g., 1 for malicious or 0 for non-malicious, and the WAF 110 takes appropriate action in response.
Kuperman [0028]
a client …. include applications or frameworks for accessing online data. Example applications for accessing online data include applications such ….. command-line interfaces, widgets, mobile applications, utilities, etc. that request, send, and/or receive data over the network and may present received data to the user (e.g., as rendered web content, command-line text, ……., a user of a malicious client 105 may utilize an application 109 such as a browser and/or command-line interface for malicious purposes to access a web application 120 on a host 145 in order to exploit a vulnerability 121 of the web application
).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified Oliner with the technique for supervised learning using labels indicating malicious requests of Kuperman to include 
determining a trained representation mapping at least partly by adjusting parameters of a first model structure so that the trained representation mapping predicts, based on a character of training command-line text, the immediately following character in the training command-line text, within a predetermined accuracy; and
determining a classifier based at least in part on the trained representation mapping and at least partly by adjusting parameters of a second model structure using a supervised learning technique based on classification training data comprising training indicators indicating whether respective portions of the training command-line text are associated with security violations.  
One of ordinary skill in the art would have made this modification to enable the system to use supervised training to define the range/threshold (para. 288) of the Oliner base reference. That is, through training, the learning model can determine the optimal range/threshold by which the system classifies whether actually received characters are anomalous, based on their difference from the expected character.
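For illustration only, the rationale above (a range/threshold learned from labeled examples) can be sketched as follows. The scores, labels, and the function name best_threshold are hypothetical and appear in neither reference; the sketch simply picks the cut-off that best separates labeled anomalous examples from labeled normal ones.

```python
def best_threshold(scores, labels):
    """Pick the cut-off that maximizes accuracy on labeled examples.

    scores: hypothetical per-example surprise values (higher = more unexpected)
    labels: 1 for anomalous, 0 for normal
    """
    candidates = sorted(set(scores))
    best_t, best_acc = candidates[0], -1.0
    for t in candidates:
        # Predict "anomalous" when the score is at or above the candidate cut-off.
        correct = sum((s >= t) == bool(y) for s, y in zip(scores, labels))
        acc = correct / len(scores)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc


# Hypothetical labeled data, assumed for this sketch only.
scores = [0.1, 0.2, 0.35, 0.8, 0.9, 0.4]
labels = [0, 0, 0, 1, 1, 1]
threshold, accuracy = best_threshold(scores, labels)
```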
	


As per claim 15, the rejection of claim 14 is incorporated herein. 
The combined teaching of Oliner and Kuperman discloses 
the first model structure comprises:
 an encoder that receives a character of training command-line text; 
a first recurrent neural network (RNN) layer fed by the encoder; 
a second RNN layer fed by the first RNN layer; and 
a decoder that outputs a predicted character; and 
the trained representation mapping comprises the encoder, the first RNN layer, and the second RNN layer.  
(See Oliner 
Para. [0261]
A character-level RNN operates over a sequence of characters,[ receives a character of training command-line text; ] such as the sequence of characters that make up the example structured events of the training corpus 1840. Sequences of textual characters are the input and the output of the character-level RNN.
Oliner [0262]
Given a training corpus of a sequence of text (such as natural language or events), the RNN may be trained to produce a character-level textual model. Once trained, the RNN produces a probability distribution of the next character in a new sequence given a sequence of previous characters (such as the training corpus of example events). This allows a trained RNN to produce a new sequence of text one character at a time[decoder that outputs a predicted character]. The training process continues with new example events until the network converges and its predictions are eventually consistent with the training data in that next correct characters are predicted at a maximum level achievable. 
Oliner [0236]
The deep-learning engine 1830 is a component or module that is programmed to implement one of or a combination of deep-learning architectures. The topic of deep-learning and the various implementations of such architectures are discussed in greater detail later. However, in short, “deep learning” is a branch of machine learning based on a set of approaches that model high-level abstractions in data by using a deep graph with multiple processing layers[a first recurrent neural network (RNN) layer fed by the encoder; 
a second RNN layer fed by the first RNN layer], composed of multiple non-linear transformations
Oliner [0258]
Deep-learning architectures are based on distributed representations. The underlying assumption behind distributed representations is that observed data are generated by the interactions of factors organized in layers. Deep learning adds the assumption that these layers of factors correspond to levels of abstraction or composition. Varying numbers of layers and layer sizes can be used to provide different amounts of abstraction.
).
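For illustration only, the claimed arrangement (encoder, first RNN layer, second RNN layer, decoder) can be sketched as a forward pass in plain Python. This is a minimal sketch under stated assumptions: the toy alphabet, layer sizes, and random untrained weights are hypothetical, and it is not the implementation of Oliner or of the application.

```python
import math
import random

random.seed(0)
VOCAB = sorted(set("ls -la /tmp"))   # hypothetical toy alphabet
EMBED, HIDDEN = 4, 5                 # hypothetical layer sizes


def matrix(rows, cols):
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


embedding = matrix(len(VOCAB), EMBED)                            # encoder
w_in1, w_rec1 = matrix(HIDDEN, EMBED), matrix(HIDDEN, HIDDEN)    # first RNN layer
w_in2, w_rec2 = matrix(HIDDEN, HIDDEN), matrix(HIDDEN, HIDDEN)   # second RNN layer
w_out = matrix(len(VOCAB), HIDDEN)                               # decoder


def rnn_step(w_in, w_rec, x, h):
    # New hidden state from the current input and the previous hidden state.
    a = matvec(w_in, x)
    b = matvec(w_rec, h)
    return [math.tanh(p + q) for p, q in zip(a, b)]


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def forward(text):
    h1, h2 = [0.0] * HIDDEN, [0.0] * HIDDEN
    representations, predictions = [], []
    for ch in text:
        x = embedding[VOCAB.index(ch)]         # encoder receives one character
        h1 = rnn_step(w_in1, w_rec1, x, h1)    # first RNN layer fed by the encoder
        h2 = rnn_step(w_in2, w_rec2, h1, h2)   # second RNN layer fed by the first
        representations.append(h2)             # output of encoder + both RNN layers
        predictions.append(softmax(matvec(w_out, h2)))  # decoder: next-char distribution
    return representations, predictions


reps, preds = forward("ls -la")
```

In this sketch, the list reps plays the role of the representation vectors (encoder plus both RNN layers), while preds holds the decoder's per-character probability distributions. The weights are untrained, so the predictions carry no meaning beyond showing the data flow.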

As per claim 16, the rejection of claim 14 is incorporated herein. 
Oliner discloses 
the operations for determining the classifier comprise, for at least one portion of a plurality of the portions of the training text: 
sequentially providing characters of that portion to the trained representation mapping; 
the sequentially providing the characters by the trained representation mapping; and (See Oliner Para. [0262]
Given a training corpus of a sequence of text [sequentially providing characters of that portion to the trained representation mapping ](such as natural language or events), the RNN may be trained to produce a character-level textual model. Once trained, the RNN produces a probability distribution of the next character in a new sequence given a sequence of previous characters (such as the training corpus of example events). This allows a trained RNN to produce a new sequence of text one character at a time[decoder that outputs a predicted character]. The training process continues with new example events until the network converges and its predictions are eventually consistent with the training data in that next correct characters are predicted at a maximum level achievable. 
[0263]
In some implementations, the RNN may use a standard softmax classifier (also commonly referred to as the cross-entropy loss) on every output vector (e.g., character) [and output respective representation vectors ]simultaneously. The RNN is trained with mini-batch Stochastic Gradient Descent.
Oliner [0266]
During training, the untrained engine 1832 receives a sequence of textual characters [sequentially providing characters of that portion to the trained representation mapping; sequentially providing the characters by the trained representation mapping;  ] of the example structured events of the training corpus 1840. While training, the untrained engine 1832 uses statistical predictions on what the next character in the sequence is most likely to occur. In other words, it learns to expect particular characters with a quantifiable statistical prediction value.
Oliner Para. [0284]
At block 2110, the computer system calculates a statistical prediction of an incoming group of yet-to-be-processed one or more textual characters.
).

	However, Oliner does not expressly disclose 
training command-line text 
selecting one representation vector of the respective representation vectors provided in response to the sequentially providing the characters by the trained representation mapping; and 
adjusting the parameters of the second model structure based on the selected one representation vector and on the training indicator associated with that portion.  

Kuperman discloses training command-line text
(See Kuperman Para. [0028]
a client …. include applications or frameworks for accessing online data. Example applications for accessing online data include applications such ….. command-line interfaces, widgets, mobile applications, utilities, etc. that request, send, and/or receive data over the network and may present received data to the user (e.g., as rendered web content, command-line text, [training command-line text ]……., a user of a malicious client 105 may utilize an application 109 such as a browser and/or command-line interface for malicious purposes to access a web application 120 on a host 145 in order to exploit a vulnerability 121 of the web application
 ).
 
selecting one representation vector of the respective representation vectors provided in response to the input 
adjusting the parameters of the second model structure based on the selected one representation vector and on the training indicator associated with that portion.  
(See Kuperman Para.  [0050] In various example embodiments described herein, the model generator 209 utilizes supervised learning for generating a model 205. Supervised Learning relates to a branch of Machine Learning and Artificial Intelligence that utilizes a set of data with labels associated to each data point. For example, in the case of ingested requests, a set of data with values may correspond to the attributes of a request,…. Data points, and thus the requests, may be represented as n-dimensional vectors of values (e.g., 1, 0 for True, False, real numbers for whole values, another vector etc.) where n corresponds to the number of values (e.g., attributes) being learned. In the above example, request 1 may be represented by a vector of [0, 5, 1, 0, etc.] and request 2 by [1, 0, 0, 1, etc.] where each request vector [representation vectors provided ]includes the values for all the attributes collected for each request. ……, request 1 may be represented by [0, 5, X1, etc.] ……. Model generator 209 takes these request vectors and generates a model 205 for classifying incoming requests, which may similarly be represented as vectors of their attribute values, as “malicious” or “non-malicious.”
Kuperman [0053]
In some example embodiments, the model generator 209 may utilize active learning to improve upon and update generated models 205 through additional training iterations. Active learning may begin as a supervised learning problem as described above. Active learning recognizes that once the model generator 209 trains a model 205, this initial model 205 may not initially achieve every desired prediction. An active learning process of the model generator 209 allows human domain experts to inspect ambiguous unlabeled data points (e.g., requests) to make a decision as to the classification (e.g., malicious or non-malicious) the model 205 should have determined for the data points. As an example, in active learning, the model 205 may output requests that it does not unambiguously classify for administrator review. The administrator may be prompted to select a classification for a request [selecting one representation vector of the respective representation vectors provided in response to the input; the requests are each represented as a vector according to para. 50; here they are retraining the trained classifier using a vector corresponding to a request which would adjust the parameters of the classifier; the training indicator is disclosed by the labeled classification provided by the administrator for request ]the model 205 did not unambiguously classify. In turn, the model generator 209 may ingest the now classified request along with the collection of requests collected by the attribute collector 207 to incrementally update the model 205. Depending on the embodiment, the model generator 209 may retrain on the newly classified and previous known requests and replace the initial model with the newly trained model 205. 
Embodiments utilizing active learning may incrementally raise accuracy to reduce false positives and false negatives by retraining models with the model generator 209 in instances where ambiguous classifications and/or false positives/false negatives are identified from the classifications provided by the current model 205.
).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified Oliner with the technique for incrementally improving a classifier model by additional training using labeled data of Kuperman to include 
the operations for determining the classifier comprise, for at least one portion of a plurality of the portions of the training command-line text: 
selecting one representation vector of the respective representation vectors provided in response to the sequentially providing the characters by the trained representation mapping; and 
adjusting the parameters of the second model structure based on the selected one representation vector and on the training indicator associated with that portion.  
One of ordinary skill in the art would have made this modification to improve the accuracy of the classifier model. The system of the primary reference can be modified to improve the accuracy of the classifier model by using labeled data to train the classifier, as taught in the Kuperman reference.
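For illustration only, the supervised and active-learning steps quoted from Kuperman [0050] and [0053] can be sketched as follows. Kuperman does not name a particular model; logistic regression is swapped in here as an assumption, and the attribute vectors, labels, and function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(theta, x):
    # theta[0] is a bias term; the remaining entries weight each attribute.
    return sigmoid(theta[0] + sum(t * v for t, v in zip(theta[1:], x)))


def train(vectors, labels, theta=None, epochs=500, lr=0.1):
    theta = list(theta) if theta else [0.0] * (len(vectors[0]) + 1)
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            err = predict(theta, x) - y          # prediction minus label
            theta[0] -= lr * err
            for i, v in enumerate(x):
                theta[i + 1] -= lr * err * v     # adjust parameters using the label
    return theta


# Hypothetical request vectors; 1 = malicious, 0 = non-malicious.
vectors = [[0, 5, 1, 0], [1, 0, 0, 1], [0, 4, 1, 0], [1, 1, 0, 1]]
labels = [1, 0, 1, 0]
theta = train(vectors, labels)

# Active-learning step: an administrator supplies a label for one selected
# request vector, and the existing model is incrementally updated with it.
ambiguous = [1, 3, 0, 0]
admin_label = 1
theta = train(vectors + [ambiguous], labels + [admin_label], theta=theta)
```

The second call to train starts from the already learned parameters, which mirrors the incremental update described in the quoted passage rather than a full replacement of the model.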
Claims 17-18 is/are rejected under 35 U.S.C. 103 as being unpatentable over Oliner in view of Kuperman, further in view of Strom et al. U.S. Publication 20180309636 (hereinafter “Strom”).
As per claim 17, the rejection of claim 14 is incorporated herein. 
Oliner discloses 
the operations for determining the trained representation mapping comprising: 
sequentially providing characters of the training text to the first model structure to cause the first model structure to output respective predicted characters; 
(See Oliner [0262]
Given a training corpus of a sequence of text 
(such as natural language or events), the RNN may be trained to produce a character-level textual model. Once trained, the RNN produces a probability distribution of the next character in a new sequence given a sequence of previous characters [based on a character of training text ](such as the training corpus of example events). This allows a trained RNN to produce a new sequence of text one character at a time[cause the first model structure to output respective predicted characters;]. The training process continues with new example events until the network converges and its predictions are eventually consistent with the training data in that next correct characters are predicted at a maximum level achievable
Oliner Para. [0267]
While the sequence of input example events is received, the untrained engine 1832 makes a prediction about what the next character in the sequence will be. Using the previous sequence to train the untrained engine 1832, the untrained engine calculates a so-called expectation value for each possible or likely character to be the next character in the sequence. Once the next character arrives, the untrained engine 1832 compares/contrasts the expectation value of the character that actually arrived next and the expectation value that the engine calculated for that character before it arrived. When the compare is great (e.g., falls outside an acceptable expectation range or above/below an acceptable expectation threshold), that character is declared as “surprising.”[When the engine is trained, the trained engine is represented by  trained datagen 1834 in figure 18]
)
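For illustration only, the "surprising" character test quoted from Oliner [0267] can be sketched with a much simpler stand-in for the RNN: a table of next-character counts. The bigram model, the example events, and the 0.1 threshold are assumptions made for this sketch and are not taken from Oliner.

```python
from collections import Counter, defaultdict


def train_bigram(events):
    # Count how often each character follows each other character.
    counts = defaultdict(Counter)
    for event in events:
        for prev, nxt in zip(event, event[1:]):
            counts[prev][nxt] += 1
    return counts


def surprising_positions(counts, text, threshold=0.1):
    flagged = []
    for i in range(1, len(text)):
        seen = counts[text[i - 1]]
        total = sum(seen.values())
        # Expectation value of the character that actually arrived.
        p = seen[text[i]] / total if total else 0.0
        if p < threshold:
            flagged.append((i, text[i]))
    return flagged


# Hypothetical training corpus of example events.
corpus = ["GET /index.html", "GET /about.html", "GET /index.html"]
model = train_bigram(corpus)
flags = surprising_positions(model, "GET /index.htmX")
```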

adjusting the trained generator
(See Oliner Para. [0273]
The AutoDataGen 1820 may be designed to have manual or automatic adjustments concerning the likelihood that the trained datagen 1834 will generate wholly new events that are consistent with the content and structure of example events of the training corpus. This is called a “temperature” adjustment. The greater the temperature, the greater the likelihood of generating an example event that closely resembles content/structure of the example events of the training corpus. The lesser the temperature, the lower the likelihood of generating an example event that closely resembles content/structure of the example events of the training corpus.
) 

However, Oliner does not expressly disclose that the training text is training command-line text
Kuperman discloses training command-line text
(See Kuperman [0028]
a client …. include applications or frameworks for accessing online data. Example applications for accessing online data include applications such ….. command-line interfaces, widgets, mobile applications, utilities, etc. that request, send, and/or receive data over the network and may present received data to the user (e.g., as rendered web content, command-line text, ……., a user of a malicious client 105 may utilize an application 109 such as a browser and/or command-line interface for malicious purposes to access a web application 120 on a host 145 in order to exploit a vulnerability 121 of the web application
Para. [0049]
The model generator 209 may be configured to ingest a collection of known malicious requests and their associated attributes and known non-malicious requests and their associated attributes. The model generator 209 processes the requests to generate (e.g., train) a model 205 for classifying unknown requests (e.g., the majority of requests) based on their attributes. For example, once the model generator 209 generates a model 205, the model may be implemented as a component of a WAF 110 at the proxy 205 for classifying requests received from clients that are not known to be malicious or non-malicious (e.g., traffic sources not identified to the proxy 205 for training data collection by the attribute collector 207).
Kuperman [0054]  Profile/Anomaly detection WAFs differ from this approach in that they are unsupervised and the number of labeled positive examples is zero. Positive examples (e.g., malicious requests) may be utilized to verify a profile/anomaly detection WAF but are not considered in generating profiles themselves. In contrast, the model generator 209 ingests both positively labeled (e.g., known malicious requests) and negatively labeled (e.g., known non-malicious requests) training examples. [  classification training data comprising training indicators; training indicators disclosed by the labels indicating malicious or not          ]In addition, the requests collected by the attribute collector 207 may be specific to the web application 120 to which the requests are directed. Hence, the model generator 209 may train a model 205 for any number of web applications. [These web applications may include applications involving command-line, such as command-line interface]
).
For the reasons discussed with respect to claim 14, it would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified Oliner with the technique for training using command-line text in application requests of Kuperman to include 
sequentially providing characters of the training command-line text to the first model structure to cause the first model structure to output respective predicted characters; 

	However, the combination of Oliner and Kuperman does not expressly disclose 
determining an error between one of the characters of the training command-line text and the corresponding predicted character; and 
updating the parameters of the first model structure based at least in part on the error.  
Strom discloses 
determining an error in the prediction and
updating the parameters of the first model structure based at least in part on the error.  
 (See Strom Para. [0104] As described below, the node-relaying classifier is trained by comparing the observed outputs of each selected training sample to the predicted node-relaying capacity generated by the node-relaying classifier. The “error” between these predicted and observed values are used to adjust weighted parameters over time to facilitate increasingly more accurate predictions—as the node-relaying classifier learns the relationships between the node metrics of parent nodes and their node-relaying performance with respect to their child nodes. In this manner, the node-relaying classifier can predict the node-relaying capacity of a specified prospective parent node even if that parent node does not currently have, or perhaps never had, any child nodes.
).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Oliner and Kuperman with the technique for adjusting the parameters of a predictive model in response to detecting errors in prediction of Strom to include 
determining an error between one of the characters of the training command-line text and the corresponding predicted character; and 
updating the parameters of the first model structure based at least in part on the error.  
One of ordinary skill in the art would have made this modification to improve the accuracy of the predictive model. The system of the primary reference can be modified to adjust the parameters of the predictive model in response to detecting errors in the predictions, as taught in the Strom reference.
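For illustration only, determining an error between a training character and the predicted character, and updating parameters based on that error, can be sketched as follows. The three-letter alphabet, the single weight table, the cross-entropy error, and the learning rate are hypothetical choices for this sketch; none is taken from Oliner, Kuperman, or Strom.

```python
import math

VOCAB = ["a", "b", "c"]   # hypothetical toy alphabet


def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def step(weights, current, target, lr=0.5):
    """One update: predict the next character, measure the error, adjust weights."""
    row = weights[VOCAB.index(current)]   # scores for each possible next character
    probs = softmax(row)
    t = VOCAB.index(target)
    loss = -math.log(probs[t])            # cross-entropy error for this character
    for j in range(len(VOCAB)):
        # Error signal: predicted probability minus the one-hot target.
        grad = probs[j] - (1.0 if j == t else 0.0)
        row[j] -= lr * grad
    return loss


weights = [[0.0] * len(VOCAB) for _ in VOCAB]
text = "abcabcabc"
losses = []
for _ in range(20):
    total = 0.0
    for cur, nxt in zip(text, text[1:]):
        total += step(weights, cur, nxt)
    losses.append(total)
```

Each pass over the text records the summed error in losses, so the effect of the error-driven updates can be inspected by comparing early and late entries.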

As per claim 18, the rejection of claim 17 is incorporated herein. 
Oliner discloses 
the trained representation mapping is configured to receive a sequence of characters and output respective representation vectors;
and the operations for determining the classifier comprise, for at least one portion of a plurality of the portions of the training text: 
sequentially providing characters of that portion to the trained representation mapping; 
the sequentially providing the characters by the trained representation mapping; and (See Oliner Para. [0262]
Given a training corpus of a sequence of text [sequentially providing characters of that portion to the trained representation mapping ](such as natural language or events), the RNN may be trained to produce a character-level textual model. Once trained, the RNN produces a probability distribution of the next character in a new sequence given a sequence of previous characters (such as the training corpus of example events). This allows a trained RNN to produce a new sequence of text one character at a time[decoder that outputs a predicted character]. The training process continues with new example events until the network converges and its predictions are eventually consistent with the training data in that next correct characters are predicted at a maximum level achievable. 
[0263]
RNN may use a standard softmax classifier (also commonly referred to as the cross-entropy loss) on every output vector (e.g., character) [and output respective representation vectors ]simultaneously. The RNN is trained with mini-batch Stochastic Gradient Descent.
Oliner [0266]
During training, the untrained engine 1832 receives a sequence of textual characters [sequentially providing characters of that portion to the trained representation mapping; sequentially providing the characters by the trained representation mapping;  ] of the example structured events of the training corpus 1840. While training, the untrained engine 1832 uses statistical predictions on what the next character in the sequence is most likely to occur. In other words, it learns to expect particular characters with a quantifiable statistical prediction value.
Oliner Para. [0284]
At block 2110, the computer system calculates a statistical prediction of an incoming group of yet-to-be-processed one or more textual characters.
).

	However, Oliner does not expressly disclose 
training command-line text 
selecting one representation vector of the respective representation vectors provided in response to the sequentially providing the characters by the trained representation mapping; and 
adjusting the parameters of the second model structure based on the selected one representation vector and on the training indicator associated with that portion.  

Kuperman discloses training command-line text
(See Kuperman Para. [0028]
a client …. include applications or frameworks for accessing online data. Example applications for accessing online data include applications such ….. command-line interfaces, widgets, mobile applications, utilities, etc. that request, send, and/or receive data over the network and may present received data to the user (e.g., as rendered web content, command-line text, [training command-line text ]……., a user of a malicious client 105 may utilize an application 109 such as a browser and/or command-line interface for malicious purposes to access a web application 120 on a host 145 in order to exploit a vulnerability 121 of the web application
 ).
 
selecting one representation vector of the respective representation vectors provided in response to the input 
adjusting the parameters of the second model structure based on the selected one representation vector and on the training indicator associated with that portion.  
(See Kuperman Para.  [0050] data points, and thus the requests, may be represented as n-dimensional vectors of values (e.g., 1, 0 for True, False, real numbers for whole values, another vector etc.) where n corresponds to the number of values (e.g., attributes) being learned. In the above example, request 1 may be represented by a vector of [0, 5, 1, 0, etc.] and request 2 by [1, 0, 0, 1, etc.] where each request vector [representation vectors provided ]includes the values for all the attributes collected for each request. ……, request 1 may be represented by [0, 5, X1, etc.] ……. Model generator 209 takes these request vectors and generates a model 205 for classifying incoming requests, which may similarly be represented as vectors of their attribute values, as “malicious” or “non-malicious.”
Kuperman [0053]
…… model generator 209 may utilize active learning to improve upon and update generated models 205 through additional training iterations. ……... The administrator may be prompted to select a classification for a request [selecting one representation vector of the respective representation vectors provided in response to the input; the requests are each represented as a vector according to para. 50; here they are retraining the trained classifier using a vector corresponding to a request which would adjust the parameters of the classifier; the training indicator is disclosed by the labeled classification provided by the administrator for request ]the model 205 did not unambiguously classify. In turn, the model generator 209 may ingest the now classified request along with the collection of requests collected by the attribute collector 207 to incrementally update the model 205.[ adjusting the parameters] Depending on the embodiment, the model generator 209 may retrain on the newly classified and previous known requests and replace the initial model with the newly trained model 205. Embodiments utilizing active learning may incrementally raise accuracy to reduce false positives and false negatives by retraining models with the model generator 209 in instances where ambiguous classifications and/or false positives/false negatives are identified from the classifications provided by the current model 205.
).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified Oliner with the technique for incrementally improving a classifier model by additional training using labeled data of Kuperman to include 
the trained representation mapping is configured to receive a sequence of characters and output respective representation vectors; and the operations for determining the classifier comprise, for at least one portion of a plurality of the portions of the training command-line text: 
selecting one representation vector of the respective representation vectors provided in response to the sequentially providing the characters by the trained representation mapping; and 
adjusting the parameters of the second model structure based on the selected one representation vector and on the training indicator associated with that portion.  
One of ordinary skill in the art would have made this modification to improve the accuracy of the classifier model. The system of the primary reference can be modified to improve the accuracy of the classifier model by using labeled data to train the classifier, as taught in the Kuperman reference.

Claims 19-20 is/are rejected under 35 U.S.C. 103 as being unpatentable over Oliner in view of Kuperman, further in view of Hendler.
As per claim 19, the rejection of claim 14 is incorporated herein. 
Oliner discloses 
receiving event data associated with an event at a monitored computing device, the event data comprising trial text and the trial text comprising a plurality of trial characters; and 
sequentially providing trial characters of the plurality of trial characters to the trained representation mapping to determine respective trial representation vectors; 
(See Oliner 
[the trained representation mapping, as inherited from claim 14, is a predictor of the next character, and therefore the respective trial representation vectors means vectors for predicted characters, and is not simply a transformation of input character to output vector.]
Oliner [0065] FIG. 1 illustrates a networked computer system 100 in which an embodiment may be implemented.
Oliner Para. [0262]
Given a training corpus of a sequence of text (such as natural language or events), the RNN may be trained to produce a character-level textual model. Once trained, the RNN produces a probability distribution of the next character in a new sequence given a sequence of previous characters (such as the training corpus of example events).[ receiving event data associated with an event at a monitored computing device, the event data comprising trial text and the trial text comprising a plurality of trial characters; and ] This allows a trained RNN to produce a new sequence of text one character at a time. The training process continues with new example events until the network converges and its predictions are eventually consistent with the training data in that next correct characters are predicted at a maximum level achievable
Oliner [0263]
In some implementations, the RNN may use a standard softmax classifier (also commonly referred to as the cross-entropy loss) on every output vector (e.g., character) simultaneously[determine respective trial representation vectors;]. The RNN is trained with mini-batch Stochastic Gradient Descent.
).
	However, Oliner does not expressly disclose 
trial command-line text 
determining, for each of the trial characters, a respective trial indicator at least partly by applying the respective trial representation vector to the trained classifier; 
locating at least one token in the trial command-line text based at least in part on the respective trial indicators of the trial characters in the trial command-line text; and 
determining that the event is associated with a security violation based at least in part on the at least one trial token satisfying a stored token-security criterion.  

Kuperman discloses trial command-line text 
(See Kuperman [0028]
a client …. include applications or frameworks for accessing online data. Example applications for accessing online data include applications such ….. command-line interfaces, widgets, mobile applications, utilities, etc. that request, send, and/or receive data over the network and may present received data to the user (e.g., as rendered web content, command-line text, ……., a user of a malicious client 105 may utilize an application 109 such as a browser and/or command-line interface for malicious purposes to access a web application 120 on a host 145 in order to exploit a vulnerability 121 of the web application
).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified Oliner with the command-line text of Kuperman to include 
receiving event data associated with an event at a monitored computing device, the event data comprising trial command-line text and the trial command-line text comprising a plurality of trial characters; and 
One of ordinary skill in the art would have made this modification to improve the ability of the system to receive a command-line text, such as, for example, via the command-line interface. The system of the primary reference can be modified to receive command-line text as taught in the Kuperman reference. This would allow the system to receive command-line text, so that the system may detect malicious activity from the command-line text.


	However, the combination of Oliner and Kuperman does not expressly disclose 
determining, for each of the trial characters, a respective trial indicator at least partly by applying the respective trial representation vector to the trained classifier; 
locating at least one token in the trial command-line text based at least in part on the respective trial indicators of the trial characters in the trial command-line text; and 
determining that the event is associated with a security violation based at least in part on the at least one trial token satisfying a stored token-security criterion.  

Hendler discloses 
determining, for each of the trial characters, a respective trial indicator at least partly by applying the respective trial representation vector to the trained classifier; 
locating at least one token in the trial command-line text based at least in part on the respective trial indicators of the trial characters in the trial command-line text; and 
 (See Hendler
page 10, left column, 3rd paragraph 
As for the usage of random names[ at least one token= random names]  (obfuscation[obfuscation means that the malware is trying to obfuscate, see page 4, left column, 4th paragraph  “numerous ways of obfuscating Power-Shell commands,”]
method 11), these typically include numbers (converted to the ‘*’ sign) or alternating casing, and can therefore be learnt by our classifiers as well.[ locating at least one token] (As we describe later, our deep learning classifiers do a better job in learning such patterns.) 

Hendler Page 12, left column, bottom paragraph
Hendler Figure 4a depicts an example of how such a host name is encoded in the input to the neural network.[ Figure 4a Each row is a vector in the figure]
Note the pattern of alternating zeros and ones
[respective trial indicators of the characters can be disclosed by the pattern of each column vector (e.g., where each 1 is located in each of the column vector) corresponding to the characters of the command, in figure 4a; 
respective trial indicator can be disclosed by the pattern of where the 1 is located in a column vector corresponding to a character of Figure 4a;
(or the respective trial indicator can be the result of applying the Figure 4b neural network filter to the respective column vector of Figure 4a); each character of the command has a corresponding column vector, and the pattern of the column vector is detected by the neural network using the neural network filter; when the neural network detects a pattern of alternating 0 and 1 (page 12, bottom paragraph “alternating zeros and ones in the row”), that indicates there are multiple digits in the command; the asterisks represent digits, and the digits may form part of a hostname generated by malware (page 12, left hand column, 3rd paragraph “sequence of alternating digits and characters” “most probably malicious”)]  
in the row corresponding to the symbol ‘*’. Figure 4b depicts
a neural network filter [trained classifier ]of size 3 that is able to
detect occurrences of this pattern. [ applying the respective representation vector to a trained classifier; The filter is applied to the vectors as part of the neural network application to detect patterns in the command, these patterns indicate malicious commands]
Hendler Page 12, right hand column, 2nd paragraph We note that in both the above cases, the
PowerShell commands may include additional indications
of maliciousness such as the web client or the
cmdlets they use. Nevertheless, it is the ability to detect patterns that incorporate random characters[locating at least one token in the command-line text; detected uppercase or lowercase patterns or detected random character patterns are part of random names that are used to obfuscate by malware, the names may disclose at least one token]
and/or casing [casing means uppercase/lowercase patterns] that causes 4-CNN to assign these command a score above the threshold,[ determining that the event is associated with a security violation] unlike the 3-gram detector.
)
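For illustration only, the Figure 4a/4b mapping discussed above (a per-character column vector whose ‘*’ row alternates between zero and one, scanned by a filter of size 3) can be sketched as follows. This is a minimal sketch: the filter weights, the threshold, and the function names are hypothetical and are not taken from Hendler, whose filters are learned by the network.

```python
def star_row(command):
    """Replace digits with '*' and return the '*' row of the
    one-hot encoding: 1 where the character is '*', else 0."""
    normalized = "".join("*" if ch.isdigit() else ch for ch in command)
    return [1 if ch == "*" else 0 for ch in normalized]

def apply_filter(row, weights=(1, -1, 1), threshold=2):
    """Slide a size-3 filter along the row and return the start
    positions where the response reaches the threshold.
    With these assumed weights only the window 1,0,1 scores 2."""
    hits = []
    for start in range(len(row) - len(weights) + 1):
        response = sum(w * row[start + i] for i, w in enumerate(weights))
        if response >= threshold:
            hits.append(start)
    return hits

# Hypothetical host name with alternating letters and digits.
row = star_row("d1c2a3ci4x5")
hits = apply_filter(row)

# A string without digits has an all-zero '*' row, so nothing fires.
no_hits = apply_filter(star_row("http://"))
```

Under this reading, each entry of `row` plays the role of a per-character indicator, and the positions in `hits` mark where the alternating pattern is located.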

determining that the event is associated with a security violation based at least in part on the at least one trial token satisfying a stored token-security criterion.  
(See Hendler page 12, left hand column, 3rd paragraph 
Out of the new 42 detected commands, 15 commands contain a sequence of alternating digits and characters[locating at least one token; the token can be any interesting sequence of characters in the command that indicates malicious activity]. In most cases, this sequence represented the name of the host or domain from which the command downloaded (most probably malicious[security violation; maliciousness is a probability determination]) content.
[See also the example command on page 12, left column, below the 3rd paragraph: “..DownloadFile (’http://d*c*a*ci*x*.<domain>’)..”.] Each of these names appears only once and they are most probably generated 
[most probably generated means that whether it is malware or not is a probability-based assessment and it is more likely than not that the domain name was generated by malware] by a domain generation algorithm (DGA) [49] used by the malware 
[stored token-security criterion= a sequence of alternating digits and
characters. Page 12, left column, 3rd paragraph
the at least one token= d*c*a*ci*x* ]

Hendler Page 12, right hand column, 2nd paragraph We note that in both the above cases, the
PowerShell commands may include additional indications
of maliciousness such as the web client or the
cmdlets they use. Nevertheless, it is the ability to
detect patterns that incorporate random characters
and/or casing that causes 4-CNN to assign these command a score above the threshold,[ determining that the event is associated with a security violation] unlike the 3-gram
detector.
)
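For illustration only, the mapping above (a located token such as d*c*a*ci*x* compared against a stored criterion of alternating digits and characters) can be sketched as follows. This is a minimal sketch: the regular expressions, the minimum of three alternations, and the function names are hypothetical and are not taken from Hendler, which scores whole commands with a neural network rather than with a fixed rule.

```python
import re

# Assumed stored criterion: three or more repetitions of one or two
# letters followed by '*' (the symbol that digits are converted to).
TOKEN_CRITERION = re.compile(r"(?:[a-z]{1,2}\*){3,}", re.IGNORECASE)

# Assumed way of locating the token: the first host-name label of a URL.
HOST_LABEL = re.compile(r"https?://([^./'\s]+)")

def locate_token(command):
    """Return the first host-name label found in the command, or None."""
    match = HOST_LABEL.search(command)
    return match.group(1) if match else None

def is_security_violation(command):
    """Flag the command when the located token satisfies the criterion."""
    token = locate_token(command)
    return token is not None and TOKEN_CRITERION.search(token) is not None

# Example command from the cited passage, written with ASCII quotes.
command = "..DownloadFile ('http://d*c*a*ci*x*.<domain>').."
token = locate_token(command)
flagged = is_security_violation(command)

# A hypothetical command whose host name contains no digit symbols.
benign = is_security_violation("..DownloadFile ('http://example.<domain>')..")
```

The threshold of three alternations is an arbitrary choice for the sketch; a stricter or looser stored criterion would change which tokens are flagged.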
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the claimed invention to have modified the combination of Oliner and Kuperman with the technique for determining that an event is associated with malicious activity based on tokens including sequences of digits and characters of Hendler to include 
determining, for each of the trial characters, a respective trial indicator at least partly by applying the respective trial representation vector to the trained classifier; 
locating at least one token in the trial command-line text based at least in part on the respective trial indicators of the trial characters in the trial command-line text; and 
determining that the event is associated with a security violation based at least in part on the at least one trial token satisfying a stored token-security criterion.  
One of ordinary skill in the art would have made this modification to improve the ability of the system to identify malicious events. The system of the primary reference can be modified to determine characteristics of character/digit patterns in sequences of digits and characters and determine that an event is malicious based on associated sequences of characters and digits, as taught in the Hendler reference.

As per claim 20, the rejection of claim 19 is incorporated herein. 
However, the combination of Oliner and Kuperman does not expressly disclose, but Hendler discloses
wherein: the plurality of trial characters comprises at least one special trial character and at least one non-special trial character;
the operations for locating the at least one trial token comprise identifying a first sequence of adjacent trial characters of the plurality of trial characters beginning from a starting trial character of the plurality of trial characters until reaching a special trial character preceded by a first trial character,
(See Hendler
[applying the neural network filter of Figure 4b to the command “..DownloadFile (’http://d*c*a*ci*x*.<domain>’)..”, as illustrated in Figure 4 on page 13]
)
 wherein: the respective trial indicator of the special trial character indicates that the special trial character is not associated with a security violation; and
the respective trial indicator of the first trial character indicates that the first trial character is not associated with a security violation.
 (See Hendler [See the example command on page 12, left column, below the 3rd paragraph: “..DownloadFile (’http://d*c*a*ci*x*.<domain>’)..”.] Each of these names appears only once and they are most probably generated [most probably generated means that whether it is malware or not is a probability-based assessment and it is more likely than not that the domain name was generated by malware] by a domain generation algorithm (DGA) [49] used by the malware 
[the plurality of characters = DownloadFile (’http://d*c*a*ci*x*.<domain>’)
at least one special trial character = :
at least one non-special trial character = t
a first sequence of adjacent trial characters = http://
a starting trial character = h
a special character preceded by a first character = p:
a special trial character = :
first trial character = p
the at least one trial token = d*c*a*ci*x* (Hendler page 12, left column, 3rd paragraph)]
wherein: the respective trial indicator of the special trial character indicates that the special trial character is not associated with a security violation; and
the respective trial indicator of the first trial character indicates that the first trial character is not associated with a security violation.
 [can be disclosed by the application of the neural network filter of Hendler Figure 4b to the command DownloadFile (’http://d*c*a*ci*x*.<domain>’), which would indicate that http:// does not have a pattern of alternating ones and zeros from the respective column vectors and therefore does not include any alternating asterisks that may be indicative of a hostname generated by malware.]
)


 
Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to HOWARD H LOUIE whose telephone number is (571)272-0036.  The examiner can normally be reached on Monday-Friday 9 AM-5 PM EST.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jung W. Kim, can be reached on 571-272-3804.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.






/HOWARD H. LOUIE/Examiner, Art Unit 2494
/THEODORE C PARSONS/Primary Examiner, Art Unit 2494