DETAILED ACTION
This action is in response to claims filed 06/23/2021 for application 15/909372 filed 03/01/2018. Claims 1-2, 5, 8, 12, 14-15, and 17-18 are amended. Claims 1-20 are pending. 
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 06/23/2021 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-20 are rejected under 35 U.S.C. 103 as being unpatentable over Hardy et al. (DL4MD: A Deep Learning Framework for Intelligent Malware Detection, hereinafter "Hardy") in view of Tobiyama et al. (Malware Detection with Deep Neural Network Using Process Behavior, hereinafter "Tobiyama") and further in view of Saxe et al. (US 9690938 B1, hereinafter "Saxe").

Regarding claim 1, Hardy teaches A method for generating a classification of variable length source data [Abstract], the method comprising: 
receiving source data having a first variable length (“Resting on the analysis of Windows API calls, which can reflect the behavior of program code pieces [32] (e.g., the API “GetModuleFileNameA” in “Kernel32.DLL” can be used to retrieve the complete path of the file that contains the specified module of current process)” [pg. 63, § 4.1 Problem definition, ¶1]);
extracting feature information to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length (“[excerpt reproduced as image: media_image1.png]” [pg.63, § 4.1 Problem definition, ¶1; Hardy discloses extracting the files into a feature vector which corresponds to a sequence of extracted information having a second variable length.]);
processing the sequence of extracted information with an encoder neural network to generate an embedding of the source data (“[excerpt reproduced as image: media_image2.png]” [pg. 63, right column, ¶3; encoding the source data would be equivalent to embedding.]), wherein the embedding of the source data represents a transformation of the source data (“As a result, when passing data through such a network, it first compresses (encodes) input vector to “fit” in a smaller representation” [pg.63, right column, ¶5; Encoding the input vector would be equivalent to embedding the source data and compressing the input would correspond to a transformation]);
wherein the encoder neural network is configured by training the encoder neural network with a decoder neural network (Hardy discloses using encoder’s hidden vector yi and using the decoder to reconstruct vector zi. [See pg. 63, right column, ¶3-4]), the decoder neural network including an input for receiving the embedding of the source data and a second set of parameters, the decoder neural network generating an output that approximates at least one of (a) the sequence of extracted information, (b) a category associated with the source data, or (c) the source data (“[excerpt reproduced as image: media_image3.png]” [pg. 63, right column, ¶4; Input would be xi, and θ’ = {W’, b’} is a second set of parameters. The output zi would correspond to an approximation of the source data after being generated with an encoder neural network trained with a decoder neural network.]);
and processing at least the embedding of the source data with a classifier to generate a classification of the source data (“To use the SAEs for malware detection, a classifier needs to be added on the top layer. In our application, the SAEs and the classifier comprise the entire deep architecture model for malware detection, which is illustrated in Figure 4.” [pg. 64, § 4.3 Deep learning architecture with SAEs, ¶2; Malware detection would be a form of classification. The source data would be embedded before it reaches the classifier at the top layer.]).
However, Hardy fails to explicitly teach the encoder neural network including an input, an output, a recurrent neural network layer, and a first set of parameters.
Tobiyama teaches the encoder neural network including an input, an output, a recurrent neural network layer, and a first set of parameters (“Based on the Operations, we construct behavioral language model. We use RNN with LSTM units for the model. The RNN consists of an input layer x, a normal hidden layer h1, two LSTM layers h2 and h3, and an output layer y.” [pg. 579, § C. Training RNN, ¶1; See Table II for a first set of parameters.])
Hardy and Tobiyama are both in the same field of endeavor of malware detection using deep neural networks. Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Although Hardy discloses an encoder neural network, the reference fails to explicitly teach the encoder neural network including a recurrent network layer. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to substitute Tobiyama’s RNN with Hardy’s AutoEncoder to have an encoder neural network include a recurrent neural network layer. One would have been motivated to make this modification as deep learning models like AutoEncoders can achieve comparable or better performance than other learning architectures. [Hardy, pg. 63, § 4.1 Problem definition, ¶3]
However, the combination fails to explicitly teach the source data comprising a sequence of bytes;
dividing the sequence of bytes into a plurality of portions of bytes;
associated with each portion of the plurality of portions of bytes of the source data
Saxe teaches the source data comprising a sequence of bytes (“For each file window in the file sample 308, the informational entropy calculator 110 can calculate 310 a number of occurrences of each byte value and/or byte sequence observed in the file window” [col 7, lines 50-53; If a byte sequence is being observed, then it is implied that the file sample disclosed by Saxe comprises a sequence of bytes.]);
dividing the sequence of bytes into a plurality of portions of bytes (“In some implementations, file samples can be divided into 256-byte (and/or a similar size) windows of data within the file sample. Dividing the file sample into file windows can involve reading a number of bytes equal to the size of a file window. For example, if the file windows are 256-byte file windows, the informational entropy calculator 110 can read the next 256 bytes of the file sample and process as a file window.” [col 7, lines 27-34; file samples divided into 256-byte windows of data corresponds to dividing the sequence of bytes into a plurality of portions of bytes.]);
associated with each portion of the plurality of portions of bytes of the source data (“In some implementations, each input vector can be limited to 256-dimensions, and/or can be similarly limited to a predetermined dimension. Each input vector can be generated by a client device 202, and/or by a malware detection server 102 (e.g., via the informational entropy calculator 110, the threat model manager 118, and/or the threat analyzer 114). The deep neural network threat model can use any of the input vectors to determine whether or not the potentially-malicious sample file is malware (e.g., can combine each of the 256-dimension vectors into a 1024-dimension input vector, can use a portion of the 256-dimensional vectors as an input vector, and/or the like).” [col 22, lines 24-35; Saxe discloses determining potentially-malicious malware based off portions of the vectors.])
Hardy, Tobiyama and Saxe are all in the same field of endeavor of malware detection using deep neural networks. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Saxe discloses dividing a potentially-malicious file sample into file windows and determining if the file sample is malicious or not. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Hardy’s and Tobiyama’s teachings by dividing the log files of Hardy/Tobiyama into portions of bytes as taught by Saxe. One would have been motivated to make this modification in order to reduce the amount of time used to determine a malware threat. [col 1, § Background, Saxe]
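For illustration only, the windowing relied upon from Saxe (col 7, lines 27-34 and 50-53) can be sketched as follows. This is a minimal sketch and not code from any cited reference: the function names `split_into_windows` and `byte_value_counts` are hypothetical, and keeping a final window shorter than 256 bytes is an assumption, since the quoted passage does not say how a trailing partial window is handled.

```python
from collections import Counter


def split_into_windows(data: bytes, window_size: int = 256) -> list[bytes]:
    """Divide a byte sequence into consecutive, non-overlapping windows.

    Assumption: a trailing window shorter than window_size is kept as-is.
    """
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    return [data[i:i + window_size] for i in range(0, len(data), window_size)]


def byte_value_counts(window: bytes) -> Counter:
    """Count the number of occurrences of each byte value in one window."""
    # Iterating over a bytes object yields integers in the range 0..255.
    return Counter(window)


if __name__ == "__main__":
    # Hypothetical 612-byte sample: two full windows plus a 100-byte remainder.
    sample = bytes(range(256)) * 2 + b"\x00" * 100
    windows = split_into_windows(sample)
    print([len(w) for w in windows])
    print(byte_value_counts(windows[2])[0])
```

Each returned window is one "portion of bytes" in the sense of the claim language; the per-window counts correspond to the occurrence counts described in the quoted Saxe passage.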

Regarding claim 2, the combination of Hardy, Tobiyama, and Saxe teaches The method of claim 1, where Hardy further teaches wherein extracting feature information from the source data includes generating one or more intermediate sequences (“The PE parser is used to extract the Windows API calls from each PE file. Through the API query database, the Windows API calls can be converted to a set of 32-bit global IDs representing the corresponding API functions (e.g., the API of “MAPI32.MAPIReadMail” is encoded as 0x00600F12).” [pg. 62, § 3 System architecture, feature extractor; Examiner is interpreting the intermediate sequences as equivalent to the API calls converted to a set of 32-bit IDs.]).

Regarding claim 3, the combination of Hardy, Tobiyama, and Saxe teaches The method of claim 2, where Hardy further teaches wherein the sequence of extracted information is based, at least in part, on at least one of the one or more intermediate sequences (“[excerpt reproduced as image: media_image4.png]” [pg.63, § 4.1 Problem definition; Examiner is interpreting this as a feature extraction step. The features of the files would be extracted to a feature vector, which would be equivalent to the sequence of extracted information.])

Regarding claim 4, the combination of Hardy, Tobiyama, and Saxe teaches The method of claim 1, where Hardy further teaches wherein the encoder neural network further includes a fully connected layer, the fully connected layer having an input and an output (“Figure 2 illustrates a one-layer AutoEncoder model with one input layer, one hidden layer, and one output layer.” [pg. 63, top right column; See Fig. 2 for fully connected layer.]).

Regarding claim 5, the combination of Hardy, Tobiyama, and Saxe teaches The method of claim 1, where Hardy further teaches wherein the decoder neural network [See pg.63, ¶4] is configured by (i) receiving the embedding of the source data, (ii) adjusting, using machine learning, the first set of parameters and second set of parameters, and (iii) repeating (i) and (ii) until the output of the decoder neural network approximates to within a threshold of at least one of (a) the sequence of extracted information, (b) a category associated with the source data, (c) the source data, or (d) combinations thereof (“[excerpt reproduced as image: media_image5.png]” [pg. 64, top left column; Algorithm I shows the training of the AutoEncoder, which receives xi; this corresponds to receiving an embedding of source data. Backpropagation would correspond to adjusting the first set and second set of parameters θ = {W, b} and θ’ = {W’, b’} (disclosed on pg. 63, ¶4). The algorithm repeats until the while loop condition has been passed, which uses E(x,z), the training error between the input and output; this corresponds to when the output has reached an acceptable threshold. The output would implicitly include a sequence of extracted information and the source data since it is trained by using the input xi.]).
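For illustration only, the kind of training loop described above (encode xi to yi with θ = {W, b}, decode yi to zi with θ’ = {W’, b’}, adjust both parameter sets by backpropagation, and repeat until the error E(x, z) falls to a threshold) can be sketched as follows. This is a minimal sketch and not a reproduction of Hardy's Algorithm I: the sigmoid activation, squared-error loss, learning rate, threshold value, epoch cap, and all function names are assumptions introduced here.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine_sigmoid(W, b, x):
    """Return s(W x + b) for weight matrix W (list of rows) and bias b."""
    return [sigmoid(sum(w * xi for w, xi in zip(row, x)) + bi)
            for row, bi in zip(W, b)]


def reconstruction_error(params, data):
    """Mean squared reconstruction error E(x, z) over the data set."""
    W, b, W2, b2 = params
    total = 0.0
    for x in data:
        z = affine_sigmoid(W2, b2, affine_sigmoid(W, b, x))
        total += sum((zi - xi) ** 2 for zi, xi in zip(z, x))
    return total / len(data)


def train_autoencoder(data, hidden, lr=0.5, threshold=0.05,
                      max_epochs=2000, seed=0):
    """Train encoder (W, b) and decoder (W2, b2) jointly by backpropagation."""
    rng = random.Random(seed)
    d = len(data[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    b = [0.0] * hidden
    W2 = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(d)]
    b2 = [0.0] * d
    params = (W, b, W2, b2)
    for _ in range(max_epochs):
        # Stop once the output approximates the input to within the threshold.
        if reconstruction_error(params, data) <= threshold:
            break
        for x in data:
            y = affine_sigmoid(W, b, x)     # encoder: hidden representation
            z = affine_sigmoid(W2, b2, y)   # decoder: reconstruction
            # Gradients of the squared error at the decoder and encoder units.
            dz = [2 * (zi - xi) * zi * (1 - zi) for zi, xi in zip(z, x)]
            dy = [sum(dz[k] * W2[k][j] for k in range(d)) * y[j] * (1 - y[j])
                  for j in range(hidden)]
            for k in range(d):              # adjust second set of parameters
                for j in range(hidden):
                    W2[k][j] -= lr * dz[k] * y[j]
                b2[k] -= lr * dz[k]
            for j in range(hidden):         # adjust first set of parameters
                for i in range(d):
                    W[j][i] -= lr * dy[j] * x[i]
                b[j] -= lr * dy[j]
    return params


if __name__ == "__main__":
    # Hypothetical toy input: four one-hot vectors compressed to two units.
    data = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]]
    before = reconstruction_error(train_autoencoder(data, 2, max_epochs=0), data)
    after = reconstruction_error(train_autoencoder(data, 2), data)
    print(before, after)
```

The epoch cap is included so that the loop terminates even if the threshold is never reached; whether Hardy's algorithm uses such a cap is not stated in the passages cited above.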

Regarding claim 6, the combination of Hardy, Tobiyama, and Saxe teaches The method of claim 1, where Hardy further teaches wherein the source data comprises an executable, an executable file, executable code, object code, bytecode, source code, command line code, command line data, a registry key, a registry key value, a file name, a domain name, a Uniform Resource Identifier, interpretable code, script code, a document, an image, an image file, a portable document format file, a word processing file, or a spreadsheet (“Resting on the analysis of Windows API calls, which can reflect the behavior of program code pieces [32] (e.g., the API “GetModuleFileNameA” in “Kernel32.DLL” can be used to retrieve the complete path of the file that contains the specified module of current process)” [pg. 63, § 4.1 Problem definition, ¶1]).

Regarding claim 7, the combination of Hardy, Tobiyama, and Saxe teaches The method of claim 1, where Hardy further teaches wherein the classifier is a gradient-boosted tree, ensemble of gradient-boosted trees, random forest, support vector machine, fully connected multilayer perceptron, a partially connected multilayer perceptron, or general linear model (“In this section, using the same dataset described in Section 5.1, we conduct a comparison between our proposed deep learning framework (DL4MD) and other shallow learning based classification methods (i.e., Artificial Neural Network (ANN), Support Vector Machine (SVM), Naïve Bayes (NB), and Decision Tree (DT)) in malware detection. The results in Table IV, Figure 6 and Figure 7 show that our proposed deep learning framework (DL4MD) outperform ANN, SVM, NB, and DT in malware detection.” [pg. 65, § 5.3 Comparisons between deep learning and other shallow learning based classification methods, ¶1]).

Regarding claim 8, Hardy teaches A system for generating a classification of variable length source data [Abstract], the system comprising: 
one or more processors; and 
at least one non-transitory computer readable storage medium having instructions stored therein, which, when executed by the one or more processors, cause the one or more processors to perform actions (“All experiments are conducted in the environment: 64 Bit Windows 8.1 on an Intel (R) Core (TM) i7-4790 Processor (3.60GHz) with 16GB of RAM, using MySQL and C++.” [pg. 65, § 5.1 Experimental setup]) comprising:
receiving source data having a first variable length (“Resting on the analysis of Windows API calls, which can reflect the behavior of program code pieces [32] (e.g., the API “GetModuleFileNameA” in “Kernel32.DLL” can be used to retrieve the complete path of the file that contains the specified module of current process)” [pg. 63, § 4.1 Problem definition, ¶1]);
extracting information to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length (“[excerpt reproduced as image: media_image1.png]” [pg.63, § 4.1 Problem definition, ¶1; Hardy discloses extracting the files into a feature vector which corresponds to a sequence of extracted information having a second variable length.]);
processing the sequence of extracted information with an encoder neural network to generate an embedding of the source data (“[excerpt reproduced as image: media_image2.png]” [pg. 63, right column, ¶3; encoding the source data would be equivalent to embedding.]), wherein the embedding of the source data represents a transformation of the source data (“As a result, when passing data through such a network, it first compresses (encodes) input vector to “fit” in a smaller representation” [pg.63, right column, ¶5; Encoding the input vector would be equivalent to embedding the source data and compressing the input would correspond to a transformation]);
wherein the encoder neural network is configured by training the encoder neural network with a decoder neural network (Hardy discloses using encoder’s hidden vector yi and using the decoder to reconstruct vector zi. [See pg. 63, right column, ¶3-4]), the decoder neural network including an input for receiving the embedding of the source data and a second set of parameters, the decoder neural network generating an output that approximates at least one of (a) the sequence of extracted information, (b) a category associated with the source data, or (c) the source data (“[excerpt reproduced as image: media_image3.png]” [pg. 63, right column, ¶4; Input would be xi, and θ’ = {W’, b’} is a second set of parameters. The output zi would correspond to an approximation of the source data after being generated with an encoder neural network trained with a decoder neural network.]); and
processing at least the embedding of the source data with a classifier to generate a classification of the source data (“To use the SAEs for malware detection, a classifier needs to be added on the top layer. In our application, the SAEs and the classifier comprise the entire deep architecture model for malware detection, which is illustrated in Figure 4.” [pg. 64, § 4.3 Deep learning architecture with SAEs, ¶2; Malware detection would be a form of classification. The source data would be embedded before it reaches the classifier at the top layer.]).
However, Hardy fails to explicitly teach the encoder neural network including an input, an output, a recurrent neural network layer, and a first set of parameters.
Tobiyama teaches the encoder neural network including an input, an output, a recurrent neural network layer, and a first set of parameters (“Based on the Operations, we construct behavioral language model. We use RNN with LSTM units for the model. The RNN consists of an input layer x, a normal hidden layer h1, two LSTM layers h2 and h3, and an output layer y.” [pg. 579, § C. Training RNN, ¶1; See Table II for a first set of parameters.])
Hardy and Tobiyama are both in the same field of endeavor of malware detection using deep neural networks. Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Although Hardy discloses an encoder neural network, the reference fails to explicitly teach the encoder neural network including a recurrent network layer. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to substitute Tobiyama’s RNN with Hardy’s AutoEncoder to have an encoder neural network include a recurrent neural network layer. One would have been motivated to make this modification as deep learning models like AutoEncoders can achieve comparable or better performance than other learning architectures. [Hardy, pg. 63, § 4.1 Problem definition, ¶3]
However, the combination fails to explicitly teach the source data comprising a sequence of bytes;
dividing the sequence of bytes into a plurality of portions of bytes;
associated with each portion of the plurality of portions of bytes of the source data
Saxe teaches the source data comprising a sequence of bytes (“For each file window in the file sample 308, the informational entropy calculator 110 can calculate 310 a number of occurrences of each byte value and/or byte sequence observed in the file window” [col 7, lines 50-53; If a byte sequence is being observed, then it is implied that the file sample disclosed by Saxe comprises a sequence of bytes.]);
dividing the sequence of bytes into a plurality of portions of bytes (“In some implementations, file samples can be divided into 256-byte (and/or a similar size) windows of data within the file sample. Dividing the file sample into file windows can involve reading a number of bytes equal to the size of a file window. For example, if the file windows are 256-byte file windows, the informational entropy calculator 110 can read the next 256 bytes of the file sample and process as a file window.” [col 7, lines 27-34; file samples divided into 256-byte windows of data corresponds to dividing the sequence of bytes into a plurality of portions of bytes.]);
associated with each portion of the plurality of portions of bytes of the source data (“In some implementations, each input vector can be limited to 256-dimensions, and/or can be similarly limited to a predetermined dimension. Each input vector can be generated by a client device 202, and/or by a malware detection server 102 (e.g., via the informational entropy calculator 110, the threat model manager 118, and/or the threat analyzer 114). The deep neural network threat model can use any of the input vectors to determine whether or not the potentially-malicious sample file is malware (e.g., can combine each of the 256-dimension vectors into a 1024-dimension input vector, can use a portion of the 256-dimensional vectors as an input vector, and/or the like).” [col 22, lines 24-35; Saxe discloses determining potentially-malicious malware based off portions of the vectors.])
Hardy, Tobiyama and Saxe are all in the same field of endeavor of malware detection using deep neural networks. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Saxe discloses dividing a potentially-malicious file sample into file windows and determining if the file sample is malicious or not. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Hardy’s and Tobiyama’s teachings by dividing the log files of Hardy/Tobiyama into portions of bytes as taught by Saxe. One would have been motivated to make this modification in order to reduce the amount of time used to determine a malware threat. [col 1, § Background, Saxe]

Regarding claim 9, the combination of Hardy, Tobiyama, and Saxe teaches The system of claim 8, where Hardy further teaches wherein the encoder neural network further includes a fully connected layer, the fully connected layer having an input and an output (“Figure 2 illustrates a one-layer AutoEncoder model with one input layer, one hidden layer, and one output layer.” [pg. 63, top right column; See Fig. 2 for fully connected layer.]).

Regarding claim 10, the combination of Hardy, Tobiyama, and Saxe teaches The system of claim 9, where Hardy further teaches wherein the embedding of the source data is based, at least in part, on the output of the fully connected layer (“The goal of an AutoEncoder is to encode a representation of the input layer into the hidden layer, which is then decoded into the output layer, yielding the same (or as close as possible) value as the input layer [4].” [pg. 63, §4.2 AutoEncoder, ¶1; The output of the fully connected layer would implicitly embed (i.e. encode) the source data since it is an embedding layer of the encoder neural network.]).

Regarding claim 11, the combination of Hardy, Tobiyama, and Saxe teaches The system of claim 9, where Hardy further teaches wherein the output of the fully connected layer is provided as input to the decoder neural network (“The resulting hidden representation yi is then mapped back to a reconstructed d0-dimensional vector zi” [pg. 63, right column, ¶3; yi is equivalent to the output of the fully connected layer of the encoder neural network. Fig. 2 shows fully connected layer]).

Regarding claim 12, the combination of Hardy, Tobiyama, and Saxe teaches The system of claim 9, where Hardy further teaches the fully connected layer (“In order to transform an input vector xi into a hidden representation vector yi, the encoder, a deterministic mapping fθ, is utilized.” [pg. 63, right column, ¶3]), and the output of the fully connected layer is the embedding of the source data (“The goal of an AutoEncoder is to encode a representation of the input layer into the hidden layer, which is then decoded into the output layer, yielding the same (or as close as possible) value as the input layer [4].” [pg. 63, §4.2 AutoEncoder, ¶1; The output of the fully connected layer would implicitly embed (i.e. encode) the source data since it is an embedding layer of the encoder neural network.]).
However, Hardy fails to explicitly teach wherein an output of the recurrent neural network layer is provided as input to.
Tobiyama teaches wherein an output of the recurrent neural network layer is provided as input to (“Each 1-hot vector xt is sequentially inputted to the RNN and it outputs prediction yt.” [pg. 579, right column, ¶1; note: yt would be equivalent to a feature vector.]).
Hardy, Tobiyama and Saxe are all in the same field of endeavor of malware detection using deep neural networks. Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Saxe discloses dividing a potentially-malicious file sample into file windows and determining if the file sample is malicious or not. Hardy discloses an encoder neural network taking in a feature vector as an input xi, and Tobiyama discloses training a RNN for feature extraction which has an output layer that outputs a feature vector yt. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Hardy’s and Saxe’s teachings to take a feature vector from the output of Tobiyama’s RNN and input it to the fully connected encoder neural network of Hardy. One would have been motivated to use a RNN for feature extraction before inputting the output into an encoder neural network as it provides good training results in various fields that use sequential data. [Tobiyama, pg. 578, § B Deep Neural Network, ¶4]

Regarding claim 13, the combination of Hardy, Tobiyama, and Saxe teaches The system of claim 9, where Hardy further teaches the decoder neural network [See pg. 63, right column, ¶4; Fig. 2]. 
However, Hardy fails to explicitly teach that the decoder neural network includes a recurrent neural network layer.
Tobiyama teaches includes a recurrent neural network layer (“Based on the Operations, we construct behavioral language model. We use RNN with LSTM units for the model. The RNN consists of an input layer x, a normal hidden layer h1, two LSTM layers h2 and h3, and an output layer y.” [pg. 579, § C. Training RNN, ¶1]).
Hardy, Tobiyama, and Saxe are all in the same field of endeavor of malware detection using deep neural networks. Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Saxe discloses dividing a potentially-malicious file sample into file windows and determining if the file sample is malicious or not. Although Hardy discloses a decoder neural network, the reference fails to explicitly teach the decoder neural network including a recurrent network layer. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Hardy’s and Saxe’s teachings to substitute Tobiyama’s RNN with Hardy’s AutoEncoder to have a decoder neural network include a recurrent neural network layer. One would have been motivated to make this modification as deep learning models like AutoEncoders can achieve comparable or better performance than other learning architectures. [Hardy, pg. 63, § 4.1 Problem definition, ¶3]

Regarding claim 14, the combination of Hardy, Tobiyama and Saxe teaches The system of claim 8, where Tobiyama further teaches wherein extracting information further comprises performing a window operation on the source data, the window operation having a size and a stride (“Each pooling layer receives the output of the previous convolutional layer and reduced their size into 1/2 by Max-Pooling with stride of 2.” [pg. 580, § E. Training CNN and Perform Malware Process Detection, ¶2; This would implicitly be a window operation.]).
Saxe teaches to divide the sequence of bytes into the plurality of portions of bytes (“In some implementations, file samples can be divided into 256-byte (and/or a similar size) windows of data within the file sample. Dividing the file sample into file windows can involve reading a number of bytes equal to the size of a file window. For example, if the file windows are 256-byte file windows, the informational entropy calculator 110 can read the next 256 bytes of the file sample and process as a file window.” [col 7, lines 27-34; file samples divided into 256-byte windows of data corresponds to dividing the sequence of bytes into a plurality of portions of bytes.]).
Hardy, Tobiyama and Saxe are all in the same field of endeavor of malware detection using deep neural networks. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Saxe discloses dividing a potentially-malicious file sample into file windows and determining if the file sample is malicious or not. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Hardy’s and Tobiyama’s teachings by dividing the log files of Hardy/Tobiyama into portions of bytes as taught by Saxe. One would have been motivated to make this modification in order to reduce the amount of time used to determine a malware threat. [col 1, § Background, Saxe]
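For illustration only, a window operation "having a size and a stride" of the kind mapped to Tobiyama's Max-Pooling with stride of 2 can be sketched as follows. This is a minimal sketch under stated assumptions: the function names are hypothetical, the window size of 2 is assumed from the halving described in the quoted passage, and the one-dimensional form is a simplification chosen here rather than a description of the pooling layers in Tobiyama's CNN.

```python
def sliding_windows(seq, size, stride):
    """Return every full window of `size` items, advancing by `stride`."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    return [seq[i:i + size] for i in range(0, len(seq) - size + 1, stride)]


def max_pool(seq, size=2, stride=2):
    """One-dimensional max pooling: keep the maximum of each window."""
    return [max(window) for window in sliding_windows(seq, size, stride)]


if __name__ == "__main__":
    values = [3, 1, 4, 1, 5, 9]
    print(sliding_windows(values, 2, 2))
    print(max_pool(values))
```

With a size of 2 and a stride of 2 the windows do not overlap, so the pooled output has half as many elements as the input. Setting the stride smaller than the size produces overlapping windows, and the same helper applies to a `bytes` object, where each window is itself a `bytes` slice.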

Regarding claim 15, Hardy teaches A system for generating a classification of source data [Abstract] by a processor, the source data having a first variable length, the system comprising: 
one or more processors; and
a memory having instructions stored therein, which, when executed by the one or more processors, cause the one or more processors to perform actions (“All experiments are conducted in the environment: 64 Bit Windows 8.1 on an Intel (R) Core (TM) i7-4790 Processor (3.60GHz) with 16GB of RAM, using MySQL and C++.” [pg. 65, § 5.1 Experimental setup]) comprising:
extracting information to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length (“[excerpt reproduced as image: media_image1.png]” [pg.63, § 4.1 Problem definition, ¶1; Hardy discloses extracting the files into a feature vector which corresponds to a sequence of extracted information having a second variable length.]), wherein extracting information generates one or more intermediate sequences (“The PE parser is used to extract the Windows API calls from each PE file. Through the API query database, the Windows API calls can be converted to a set of 32-bit global IDs representing the corresponding API functions (e.g., the API of “MAPI32.MAPIReadMail” is encoded as 0x00600F12).” [pg. 62, § 3 System architecture, feature extractor; Examiner is interpreting the intermediate sequences as equivalent to the API calls converted to a set of 32-bit IDs.]);
processing the sequence of extracted information with an encoder neural network to generate an embedding of the source data (“[Image: media_image2.png]” [pg. 63, right column, ¶3; encoding the source data would be equivalent to embedding.]), 
wherein the encoder neural network is configured by training the encoder neural network with a decoder neural network (Hardy discloses using the encoder’s hidden vector yi and using the decoder to reconstruct vector zi. [See pg. 63, right column, ¶3-4]), the decoder neural network including an input for receiving the embedding of the source data and a second set of parameters, the decoder neural network generating an output that approximates at least one of (a) the sequence of extracted information, (b) at least one of the one or more intermediate sequences, (c) a category associated with the source data, or (d) the source data (“[Image: media_image3.png]” [pg. 63, right column, ¶4; Input would be xi, and θ’ = {W’, b’} is a second set of parameters. The output zi would correspond to an approximation of the source data after being generated with an encoder neural network trained with a decoder neural network.]); and 
processing at least the embedding of the source data with a classifier to generate a classification of the source data (“To use the SAEs for malware detection, a classifier needs to be added on the top layer. In our application, the SAEs and the classifier comprise the entire deep architecture model for malware detection, which is illustrated in Figure 4.” [pg. 64, § 4.3 Deep learning architecture with SAEs, ¶2; Malware detection would be a form of classification. The source data would be embedded before it reaches the classifier at the top layer.]).
However, Hardy fails to explicitly teach the encoder neural network including an input, an output, a recurrent neural network layer, and a first set of parameters.
Tobiyama teaches the encoder neural network including an input, an output, a recurrent neural network layer, and a first set of parameters (“Based on the Operations, we construct behavioral language model. We use RNN with LSTM units for the model. The RNN consists of an input layer x, a normal hidden layer h1, two LSTM layers h2 and h3, and an output layer y.” [pg. 579, § C. Training RNN, ¶1; See Table II for a first set of parameters.]).
Hardy and Tobiyama are both in the same field of endeavor of malware detection using deep neural networks. Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Hardy discloses a deep learning architecture using an AutoEncoder model for malware detection. Although Hardy discloses an encoder neural network, the reference fails to explicitly teach the encoder neural network including a recurrent neural network layer. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Hardy’s AutoEncoder with Tobiyama’s RNN so that the encoder neural network includes a recurrent neural network layer. One would have been motivated to make this modification as deep learning models like AutoEncoders can achieve comparable or better performance than other learning architectures. [Hardy, pg. 63, § 4.1 Problem definition, ¶3]
However, the combination fails to explicitly teach and comprising a sequence of bytes;
dividing the sequence of bytes into a plurality of portions of bytes;
associated with each portion of the plurality of portions of bytes of the source data
Saxe teaches and comprising a sequence of bytes (“For each file window in the file sample 308, the informational entropy calculator 110 can calculate 310 a number of occurrences of each byte value and/or byte sequence observed in the file window” [col 7, lines 50-53; If a byte sequence is being observed, then it is implied that the file sample disclosed by Saxe comprises a sequence of bytes.]);
dividing the sequence of bytes into a plurality of portions of bytes (“In some implementations, file samples can be divided into 256-byte (and/or a similar size) windows of data within the file sample. Dividing the file sample into file windows can involve reading a number of bytes equal to the size of a file window. For example, if the file windows are 256-byte file windows, the informational entropy calculator 110 can read the next 256 bytes of the file sample and process as a file window.” [col 7, lines 27-34; file samples divided into 256-byte windows of data corresponds to dividing sequence of bytes into a plurality of portions of bytes.]);
associated with each portion of the plurality of portions of bytes of the source data (“In some implementations, each input vector can be limited to 256-dimensions, and/or can be similarly limited to a predetermined dimension. Each input vector can be generated by a client device 202, and/or by a malware detection server 102 (e.g., via the informational entropy calculator 110, the threat model manager 118, and/or the threat analyzer 114). The deep neural network threat model can use any of the input vectors to determine whether or not the potentially-malicious sample file is malware (e.g., can combine each of the 256-dimension vectors into a 1024-dimension input vector, can use a portion of the 256-dimensional vectors as an input vector, and/or the like).” [col 22, lines 24-35; Saxe discloses determining potentially-malicious malware based on portions of the vectors.]).
Hardy, Tobiyama and Saxe are all in the same field of endeavor of malware detection using deep neural networks. Hardy discloses a deep learning architecture using an AutoEncoder model for malware detection. Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Saxe discloses dividing a potentially-malicious file sample into file windows and determining if the file sample is malicious or not. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Hardy’s and Tobiyama’s teachings by dividing the log files of Hardy/Tobiyama into portions of bytes as taught by Saxe. One would have been motivated to make this modification in order to reduce the amount of time used to determine a malware threat. [col 1, § Background, Saxe]
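
For illustration only, the windowing described in the Saxe excerpts cited for claim 15 can be sketched in Python as follows. This is a minimal sketch and is not code from Saxe or from the application: the function names are hypothetical, and the per-window feature is assumed to be a plain count of each byte value, which is one reading of the “number of occurrences of each byte value” language at col 7, lines 50-53.

```python
from collections import Counter
from typing import List

WINDOW_SIZE = 256  # window size named in the Saxe excerpt; other sizes are possible


def split_into_windows(data: bytes, window_size: int = WINDOW_SIZE) -> List[bytes]:
    """Divide a byte sequence into consecutive windows; the last one may be shorter."""
    return [data[i:i + window_size] for i in range(0, len(data), window_size)]


def byte_value_counts(window: bytes) -> List[int]:
    """Count occurrences of each of the 256 possible byte values in one window."""
    counts = Counter(window)  # iterating over bytes yields integers 0..255
    return [counts.get(value, 0) for value in range(256)]


def extract_sequence(data: bytes) -> List[List[int]]:
    """Hypothetical extractor: one count vector per window of the input."""
    return [byte_value_counts(window) for window in split_into_windows(data)]
```

Because one count vector is produced per window, a longer input yields a longer sequence. Under this reading, that is how the second variable length would depend on the first variable length.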

Regarding claim 16, the combination of Hardy, Tobiyama and Saxe teaches the system of claim 15, where Tobiyama further teaches wherein the embedding of the source data is combined with additional data processing before processing at least the embedding of the source data with the classifier to generate the classification (“In the training phase, we train the CNN by using feature images that are labeled malicious or benign. The structure of the CNN is shown in Fig. 4. The CNN consists of an input layer, two convolution-pooling layers, a fully-connected layer, and an output layer. The first convolutional layer filters the W0 × W0 × 1 input image with 10 kernels. The second convolutional layer filters the W1 × W1 × 10 output of previous layer with 20 kernels. Each pooling layer receives the output of the previous convolutional layer and reduced their size into 1/2 by Max-Pooling with stride of 2. The dimension of the output layer is two because the CNN is binary classifier.” [pg. 580, § E. Training CNN and Perform Malware Process Detection, ¶2; Tobiyama discloses training a DNN as a process where an RNN is trained first and then a CNN is trained using that feature image (i.e., embedded source data). The above citation shows that additional filtering is done after the source data has been embedded by the RNN.]).

Regarding claim 17, the combination of Hardy, Tobiyama and Saxe teaches the system of claim 15, where Hardy further teaches wherein the input of the decoder neural network includes at least one fully connected layer (“The resulting hidden representation yi is then mapped back to a reconstructed d0-dimensional vector zi in the input space, using the decoder gθ.” [pg. 63, right column, ¶4; See Fig. 2 for at least one fully connected layer. yi is the hidden-layer representation input into the decoder.]).
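
For illustration only, the encoder/decoder relationship cited from Hardy (hidden representation yi, reconstruction zi, decoder parameters θ’ = {W’, b’}) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the sigmoid activation, the squared-error measure, and the function names are hypothetical and are not taken from Hardy, whose cited passages appear in this action only as images.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(v: float) -> float:
    """Assumed activation; the excerpts in the record do not name one."""
    return 1.0 / (1.0 + math.exp(-v))


def fully_connected(weights: Matrix, bias: Vector, inputs: Vector) -> Vector:
    """One fully connected layer: s(W * inputs + b), computed per output unit."""
    return [
        sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, bias)
    ]


def encode(x: Vector, W: Matrix, b: Vector) -> Vector:
    """Map a d0-dimensional input x to a d1-dimensional hidden representation y."""
    return fully_connected(W, b, x)


def decode(y: Vector, W_prime: Matrix, b_prime: Vector) -> Vector:
    """Map the hidden representation y back to a d0-dimensional reconstruction z."""
    return fully_connected(W_prime, b_prime, y)


def reconstruction_error(x: Vector, z: Vector) -> float:
    """Assumed squared-error measure between the input and its reconstruction."""
    return sum((xi - zi) ** 2 for xi, zi in zip(x, z))
```

In this sketch, W has d1 rows of d0 weights and W_prime has d0 rows of d1 weights, so choosing d1 < d0 forces the hidden representation to be smaller than the input. Training would adjust both parameter sets to reduce the reconstruction error.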

Regarding claim 18, the combination of Hardy, Tobiyama and Saxe teaches the system of claim 15, where Hardy further teaches wherein extracting information comprises executing at least one of a convolution operation, a Shannon Entropy operation, a statistical operation, a wavelet transformation operation, a Fourier transformation operation, a compression operation, a disassembling operation, or a tokenization operation (“Typically, the number of hidden units is much less than number of visible (input/output) ones (d1 < d0). As a result, when passing data through such a network, it first compresses (encodes) input vector to “fit” in a smaller representation, and then tries to reconstruct (decode) it back. The task of training is to minimize an error or reconstruction (using Equation 5), i.e. find the most efficient compact representation (encoding) for input data (Equation 6).” [pg.63, right column, ¶5; corresponds to a compression operation]).
Saxe teaches associated with each portion of the plurality of portions of bytes of the source data (“In some implementations, each input vector can be limited to 256-dimensions, and/or can be similarly limited to a predetermined dimension. Each input vector can be generated by a client device 202, and/or by a malware detection server 102 (e.g., via the informational entropy calculator 110, the threat model manager 118, and/or the threat analyzer 114). The deep neural network threat model can use any of the input vectors to determine whether or not the potentially-malicious sample file is malware (e.g., can combine each of the 256-dimension vectors into a 1024-dimension input vector, can use a portion of the 256-dimensional vectors as an input vector, and/or the like).” [col 22, lines 24-35; Saxe discloses determining potentially-malicious malware based on portions of the vectors.]).
Hardy, Tobiyama and Saxe are all in the same field of endeavor of malware detection using deep neural networks. Hardy discloses a deep learning architecture using an AutoEncoder model for malware detection. Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Saxe discloses dividing a potentially-malicious file sample into file windows and determining if the file sample is malicious or not. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Hardy’s and Tobiyama’s teachings by dividing the log files of Hardy/Tobiyama into portions of bytes as taught by Saxe. One would have been motivated to make this modification in order to reduce the amount of time used to determine a malware threat. [col 1, § Background, Saxe]

Regarding claim 19, the combination of Hardy, Tobiyama and Saxe teaches the system of claim 15, where Hardy further teaches wherein the encoder neural network includes at least one of a plurality of recurrent neural network layers or a plurality of fully connected layers (“[Image: media_image6.png]” [pg. 64 Fig. 3 shows AutoEncoders model having a plurality of fully connected layers]).

Regarding claim 20, the combination of Hardy, Tobiyama and Saxe teaches the system of claim 15, where Hardy further teaches wherein the decoder neural network includes at least one of one or more recurrent neural network layers or one or more fully connected layers (“[Image: media_image7.png]” [pg. 63, top right column; Fig. 2 shows a decoder neural network with at least one fully connected layer. Additionally, Fig. 3 shows a stacked AutoEncoder model with a plurality of fully connected layers.]).

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA  as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA . A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA /25, or PTO/AIA /26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1-13, 15, 17-20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 3-5 and 8 of copending Application No. 15909442 in view of Hardy (DL4MD: A Deep Learning Framework for Intelligent Malware Detection). 
Claims 14 and 16 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of copending Application No. 15909442 in view of Tobiyama (Malware Detection with Deep Neural Network Using Process Behavior). 


Instant Application | App#15909442
Claim 1 | Claim 1
A method for generating a classification of variable length source data, the method comprising: | A method for embedding variable length source data by a processor, the method comprising:
receiving source data having a first variable length, the source data comprising a sequence of bytes; | receiving source data having a first variable length, the source data comprising a sequence of bytes;
dividing the sequence of bytes into a plurality of portions of bytes; | dividing the sequence of bytes into a plurality of portions of bytes;
extracting feature information associated with each portion of the plurality of portions of bytes of the source data to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length; | extracting information associated with each portion of the plurality of portions of bytes of the source data to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length;
processing the sequence of extracted information with an encoder neural network to generate an embedding of the source data, the encoder neural network including an input, an output, a recurrent neural network layer, and a first set of parameters, wherein the embedding of the source data represents a transformation of the source data; | and processing the sequence of extracted information with a recurrent neural network to generate an embedding of the source data, the recurrent neural network including an input, an output, and a first set of parameters; wherein the recurrent neural network is configured by adjusting the first set of parameters of the recurrent neural network based, at least in part, on a machine learning algorithm.
wherein the encoder neural network is configured by training the encoder neural network with a decoder neural network, the decoder neural network including an input for receiving the embedding of the source data and a second set of parameters, the decoder neural network generating an output that approximates at least one of (a) the sequence of extracted information, (b) a category associated with the source data, or (c) the source data; | the fully connected neural network including an input, an output, and a second set of parameters, wherein the recurrent neural network is configured by adjusting the first set of parameters of the recurrent neural network based, at least in part, on a machine learning algorithm
and processing at least the embedding of the source data with a classifier to generate a classification of the source data. | processing the embedding of the source data with a classifier, the classifier comprising a fully connected neural network to generate a classification of the source data
Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations:
Feature information [pg.63, § 4.1 Problem definition, ¶1]
Encoder neural network [pg. 63, right column, ¶3]
wherein the embedding of the source data represents a transformation of the source data (“As a result, when passing data through such a network, it first compresses (encodes) input vector to “fit” in a smaller representation” [pg.63, right column, ¶5])
wherein the encoder neural network is configured by training the encoder neural network with a decoder neural network, the decoder neural network [pg. 63, right column ¶3-4]
the decoder neural network generating an output that approximates at least one of (a) the sequence of extracted information, (b) a category associated with the source data, or (c) the source data; [pg. 63, right column, ¶4]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the encoder neural network as taught by Hardy as a substitute for the recurrent neural network.
Claim 2 | Claim 1
The method of claim 1, wherein extracting feature information from the source data includes generating one or more intermediate sequences. | extracting information associated with each portion of the plurality of portions of bytes of the source data to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length;
Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations:
Feature information [pg.63, § 4.1 Problem definition, ¶1]
generating one or more intermediate sequences. [pg. 62, § 3 System architecture, feature extractor]
It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating Hardy’s feature extraction (“The PE parser is used to extract the Windows API calls from each PE file. Through the API query database, the Windows API calls can be converted to a set of 32-bit global IDs representing the corresponding API functions (e.g., the API of “MAPI32.MAPIReadMail” is encoded as 0x00600F12).”) to add another method of extracting source data.

Claim 3 | Claim 1
The method of claim 2, wherein the sequence of extracted information is based, at least in part, on at least one of the one or more intermediate sequences. | extracting information associated with each portion of the plurality of portions of bytes of the source data to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length;
Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations:
based, at least in part, on at least one of the one or more intermediate sequences [pg.63, § 4.1 Problem definition]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating Hardy’s teaching that the feature of each file can be represented by a binary feature vector, thereby adding another step in the method of extracting source data.
Claim 4 | Claim 1
The method of claim 1, wherein the encoder neural network further includes a fully connected layer, the fully connected layer having an input and an output. | the fully connected neural network including an input, an output, and a second set of parameters, wherein the recurrent neural network is configured by adjusting the first set of parameters of the recurrent neural network based, at least in part, on a machine learning algorithm
Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations:
Encoder neural network [pg. 63, right column, ¶3]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the encoder neural network as taught by Hardy as a substitute for the recurrent neural network.
Claim 5 | Claim 1
The method of claim 1, wherein the decoder neural network is configured by (i) receiving the embedding of the source data, (ii) adjusting, using machine learning, the first set of parameters and second set of parameters, and (iii) repeating (i) and (ii) until the output of the decoder neural network approximates to within a threshold of at least one of (a) the sequence of extracted information, (b) a category associated with the source data, (c) the source data, or (d) combinations thereof. | wherein the recurrent neural network is configured by adjusting the first set of parameters of the recurrent neural network based, at least in part, on a machine learning algorithm. wherein the fully connected neural network is configured by adjusting the second set of parameters of the fully connected neural network based, at least in part, on a machine learning algorithm.
Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations:

decoder neural network is configured by (i) receiving the embedding of the source data [See pg. 63, ¶4] 

and (iii) repeating (i) and (ii) until the output of the decoder neural network approximates to within a threshold of at least one of (a) the sequence of extracted information, (b) a category associated with the source data, (c) the source data, or (d) combinations thereof.  [pg. 64, top left column; See Algorithm 1]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the decoder neural network as taught by Hardy as a substitute for the recurrent neural network.
Claim 6 | Claim 1
The method of claim 1, wherein the source data comprises an executable, an executable file, executable code, object code, bytecode, source code, command line code, command line data, a registry key, a registry key value, a file name, a domain name, a Uniform Resource Identifier, interpretable code, script code, a document, an image, an image file, a portable document format file, a word processing file, or a spreadsheet. | A method for embedding variable length source data by a processor, the method comprising: receiving source data having a first variable length, the source data comprising a sequence of bytes;
Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations in [pg. 63, § 4.1 problem definition, ¶1]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the types of files disclosed by Hardy as source data to detect malware.
Claim 7 | Claim 8
The method of claim 1, wherein the classifier is a gradient-boosted tree, ensemble of gradient-boosted trees, random forest, support vector machine, fully connected multilayer perceptron, a partially connected multilayer perceptron, or general linear model. | The method of claim 1, wherein the classifier is a gradient-boosted tree, ensemble of gradient-boosted trees, random forest, support vector machine, fully connected multilayer perceptron, a partially connected multilayer perceptron, or general linear model.
All limitations of claim 7 in the instant application are anticipated by claim 8 of the ‘442 application.
Claim 8 | Claim 1
A system for embedding variable length source data by a processor, the system comprising: one or more processors; and at least one non-transitory computer readable storage medium having instructions therein, which, when executed by the one or more processors, cause the one or more processors to perform actions comprising: | A method for embedding variable length source data by a processor, the method comprising:
receiving source data having a first variable length, the source data comprising a sequence of bytes; | receiving source data having a first variable length, the source data comprising a sequence of bytes;
dividing the sequence of bytes into a plurality of portions of bytes; | dividing the sequence of bytes into a plurality of portions of bytes;
extracting information associated with each portion of the plurality of portions of bytes of the source data to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length; | extracting information associated with each portion of the plurality of portions of bytes of the source data to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length;
processing the sequence of extracted information with an encoder neural network to generate an embedding of the source data, the encoder neural network including an input, an output, a recurrent neural network layer, and a first set of parameters; | and processing the sequence of extracted information with a recurrent neural network to generate an embedding of the source data, the recurrent neural network including an input, an output, and a first set of parameters; wherein the recurrent neural network is configured by adjusting the first set of parameters of the recurrent neural network based, at least in part, on a machine learning algorithm.
wherein the encoder neural network is configured by training the encoder neural network with a decoder neural network, the decoder neural network including an input for receiving the embedding of the source data and a second set of parameters, the decoder neural network generating an output that approximates at least one of (a) the sequence of extracted information, (b) a category associated with the source data, or (c) the source data; and | the fully connected neural network including an input, an output, and a second set of parameters; wherein the fully connected neural network is configured by adjusting the second set of parameters of the fully connected neural network based, at least in part, on a machine learning algorithm.
processing at least the embedding of the source data with a classifier to generate a classification of the source data | processing the embedding of the source data with a classifier, the classifier comprising a fully connected neural network to generate a classification of the source data
Copending application 15909442 fails to recite the limitations in bold above. Claim 8 recites the same limitations as method claim 1. However, Hardy teaches these limitations:
Encoder neural network [pg. 63, right column, ¶3]
wherein the encoder neural network is configured by training the encoder neural network with a decoder neural network, the decoder neural network [pg. 63, right column ¶3-4]
the decoder neural network generating an output that approximates at least one of (a) the sequence of extracted information, (b) a category associated with the source data, or (c) the source data; [pg. 63, right column, ¶4]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the encoder neural network as taught by Hardy as a substitute for the recurrent neural network.
Claim 9 | Claim 1
The system of claim 8, wherein the encoder neural network further includes a fully connected layer, the fully connected layer having an input and an output. | the fully connected neural network including an input, an output, and a second set of parameters; wherein the fully connected neural network is configured by adjusting the second set of parameters of the fully connected neural network based, at least in part, on a machine learning algorithm.
Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations:
Encoder neural network [pg. 63, right column, ¶3]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the encoder neural network as taught by Hardy as a substitute for the recurrent neural network.
Claim 10 | Claim 1
The system of claim 9, wherein the embedding of the source data is based, at least in part, on the output of the fully connected layer. | processing the embedding of the source data with a classifier, the classifier comprising a fully connected neural network to generate a classification of the source data, the fully connected neural network including an input, an output, and a second set of parameters, and wherein the fully connected neural network is configured by adjusting the second set of parameters of the fully connected neural network based, at least in part, on the machine learning algorithm.
Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations:
embedding of the source data is based, at least in part, on the output of the fully connected layer. (“The goal of an AutoEncoder is to encode a representation of the input layer into the hidden layer, which is then decoded into the output layer, yielding the same (or as close as possible) value as the input layer [4].” [pg. 63, § 4.2 AutoEncoder, ¶1])

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the AutoEncoder model as taught by Hardy as a substitute for the recurrent neural network to achieve better performance than other learning architectures. 
Claim 11 | Claim 1
The system of claim 9, wherein the output of the fully connected layer is provided as input to the decoder neural network. | processing the embedding of the source data with a classifier, the classifier comprising a fully connected neural network to generate a classification of the source data, the fully connected neural network including an input, an output, and a second set of parameters, and wherein the fully connected neural network is configured by adjusting the second set of parameters of the fully connected neural network based, at least in part, on the machine learning algorithm.
Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations:
the output of the fully connected layer is provided as input to the decoder neural network. [pg. 63, right column, ¶3]


It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the AutoEncoder model as taught by Hardy as a substitute for the recurrent neural network to achieve better performance than other learning architectures.
Claim 12
Claim 1
The system of claim 9, wherein an output of the recurrent neural network layer is provided as input to the fully connected layer, and the output of the fully connected layer is the embedding of the source data.
processing the embedding of the source data with a classifier, the classifier comprising a fully connected neural network to generate a classification of the source data, the fully connected neural network including an input, an output, and a second set of parameters, and wherein the fully connected neural network is configured by adjusting the second set of parameters of the fully connected neural network based, at least in part, on the machine learning algorithm.
Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations:
wherein the output of the recurrent neural network layer is provided as input to the fully connected layer, and the output of the fully connected layer is the embedding of the source data. [pg. 63, right column, ¶3]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the AutoEncoder model as taught by Hardy as a substitute for the recurrent neural network to achieve better performance than other learning architectures.
Claim 13
Claim 4
The system of claim 9, wherein the decoder neural network includes a recurrent neural network layer.
The method of claim 1, wherein the recurrent neural network includes one or more recurrent neural network layers.
Copending application 15909442 fails to recite the limitations in bold above. Claims 4 and 12 recite the same limitations. However, Hardy teaches these limitations:
Decoder neural network [pg. 63, right column, ¶4]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the decoder neural network as taught by Hardy as a substitute for the recurrent neural network.
Claim 14
Claim 1
The system of claim 8, wherein extracting information further comprises performing a window operation on the source data to divide the sequence of bytes into the plurality of portions of bytes, the window operation having a size and a stride.
dividing the sequence of bytes into a plurality of portions of bytes; 

extracting information associated with each portion of the plurality of portions of bytes of the source data to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length;
Copending application 15909442 fails to recite the limitations in bold above. However, Tobiyama teaches these limitations:
wherein extracting information further comprises performing a window operation on the source data, the window operation having a size and a stride. [pg. 580, § E. Training CNN and Perform Malware Process Detection, ¶2]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating an extraction method as taught by Tobiyama to extract information from image files. 
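For illustration only, the claimed window operation with a size and a stride can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the instant application, the copending application, or Tobiyama: the function name `window_bytes`, its parameters, and the handling of inputs shorter than one window are all hypothetical.

```python
def window_bytes(data: bytes, size: int, stride: int) -> list[bytes]:
    """Divide a byte sequence into portions using a sliding window.

    Hypothetical helper: `size` is the window length in bytes and
    `stride` is the step between consecutive window starts.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    if not data:
        return []
    # Last start position that still yields a full window
    # (0 when the input is shorter than one window).
    last_start = max(len(data) - size, 0)
    return [data[i:i + size] for i in range(0, last_start + 1, stride)]


# The number of portions depends on the input length, so a
# variable-length input yields a variable-length sequence of portions.
portions = window_bytes(bytes(range(10)), size=4, stride=2)
```

In this sketch a stride smaller than the size produces overlapping portions, and a trailing remainder shorter than one window is dropped; either choice is an assumption made for the example.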
Claim 15
Claim 1
A system for generating a classification of source data by a processor, the source data comprising a sequence of bytes and having a first variable length, the system comprising:
one or more processors; and a memory having instructions stored therein, which, when executed by the one or more processors, cause the one or more processors to perform actions comprising:
A method for embedding variable length source data by a processor, the method comprising:

receiving source data having a first variable length, the source data comprising a sequence of bytes;
dividing the sequence of bytes into a plurality of portions of bytes;
dividing the sequence of bytes into a plurality of portions of bytes;
extracting information associated with each portion of the plurality of portions of bytes of the source data to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length, wherein extracting information generates one or more intermediate sequences;
extracting information associated with each portion of the plurality of portions of bytes of the source data to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length;
processing the sequence of extracted information with an encoder neural network to generate an embedding of the source data, the encoder neural network including an input, an output, a recurrent neural network layer, and a first set of parameters;
and processing the sequence of extracted information with a recurrent neural network to generate an embedding of the source data, the recurrent neural network including an input, an output, and a first set of parameters;
wherein the encoder neural network is configured by training the encoder neural network with a decoder neural network, the decoder neural network including an input for receiving the embedding of the source data and a second set of parameters, the decoder neural network generating an output that approximates at least one of (a) the sequence of extracted information, (b) at least one of the one or more intermediate sequences, (c) a category associated with the source data, or (d) the source data;
the fully connected neural network including an input, an output, and a second set of parameters, and wherein the fully connected neural network is configured by adjusting the second set of parameters of the fully connected neural network based, at least in part, on the machine learning algorithm.
and processing at least the embedding of the source data with a classifier to generate a classification of the source data
processing the embedding of the source data with a classifier, the classifier comprising a fully connected neural network to generate a classification of the source data
Copending application 15909442 fails to recite the limitations in bold above. Claims 9 and 15 are system claims which recite the same limitations as method claim 1 in the copending application. However, Hardy teaches these limitations:
Encoder neural network [pg. 63, right column, ¶3]
wherein the encoder neural network is configured by training the encoder neural network with a decoder neural network, the decoder neural network [pg. 63, right column ¶3-4]
the decoder neural network generating an output that approximates at least one of (a) the sequence of extracted information, (b) at least one of the one or more intermediate sequences, (c) a category associated with the source data, or (d) the source data; [pg. 63, right column, ¶4]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the encoder neural network as taught by Hardy as a substitute for the recurrent neural network.
Claim 16
Claim 1
The system of claim 15, wherein the embedding of the source data is combined with additional data processing before processing at least the embedding of the source data with the classifier to generate the classification.
processing the embedding of the source data with a classifier, the classifier comprising a fully connected neural network to generate a classification of the source data
Copending application 15909442 fails to recite the limitations in bold above. Claim 15 is the system counterpart of claim 1 of the copending application and recites the same limitations. However, Tobiyama teaches these limitations:
embedding of the source data is combined with additional data processing before processing [pg. 580, § E. Training CNN and Perform Malware Process Detection, ¶2]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating an extraction method as taught by Tobiyama to extract information from image files which implicitly performs the step of additional data processing. 
Claim 17
Claim 5
The system of claim 15, wherein the input of the decoder neural network includes at least one fully connected layer.
The method of claim 1, wherein the fully connected neural network includes one or more fully connected layers.
Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations:
a decoder neural network with at least one fully connected layer at its input. [pg. 63, right column, ¶4]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the decoder neural network as taught by Hardy as a substitute for the recurrent neural network.
Claim 18
Claim 3
The system of claim 15, wherein extracting information associated with each portion of the plurality of portions of bytes of the source data comprises executing at least one of a convolution operation, a Shannon Entropy operation, a statistical operation, a wavelet transformation operation, a Fourier transformation operation, a compression operation, a disassembling operation, or a tokenization operation.
The method of claim 1, wherein extracting information comprises executing at least one of a convolution operation, a Shannon Entropy operation, a statistical operation, a wavelet transformation operation, a Fourier transformation operation, a compression operation, a disassembling operation, or a tokenization operation.
All limitations of claim 18 in the instant application are anticipated by claim 17 of the ‘442 application.
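As an illustrative aid, one of the recited alternatives, a Shannon Entropy operation applied to a single portion of bytes, can be sketched as below. This is a sketch under assumptions only: the function name `shannon_entropy` is hypothetical, and nothing here is drawn from the claims' specification or the cited references beyond the standard definition H = -Σ p(b) log2 p(b) over the byte values b occurring in the portion.

```python
import math
from collections import Counter


def shannon_entropy(portion: bytes) -> float:
    """Return the Shannon entropy, in bits per byte, of one portion.

    Hypothetical helper: the result ranges from 0.0 (every byte
    identical) to 8.0 (all 256 byte values equally frequent).
    """
    if not portion:
        return 0.0
    total = len(portion)
    counts = Counter(portion)  # occurrences of each byte value
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# One extracted value per portion gives a sequence whose length tracks
# the number of portions, and therefore the length of the source data.
extracted = [shannon_entropy(p) for p in (b"aaaa", b"abab", bytes(range(256)))]
```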
Claim 19
Claims 4, 5
The system of claim 15, wherein the encoder neural network includes at least one of a plurality of recurrent neural network layers or a plurality of fully connected layers.
Claim 4:
The method of claim 1, wherein the recurrent neural network includes one or more recurrent neural network layers.

Claim 5: The method of claim 1, wherein the fully connected neural network includes one or more fully connected layers.
Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations:
encoder neural network [pg. 63, right column, ¶3]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the encoder neural network as taught by Hardy as a substitute for the recurrent neural network.
Claim 20
Claims 4, 5
The system of claim 15, wherein the decoder neural network includes at least one of one or more recurrent neural network layers or one or more fully connected layers.
Claim 4:
The method of claim 1, wherein the recurrent neural network includes one or more recurrent neural network layers.

Claim 5: The method of claim 1, wherein the fully connected neural network includes one or more fully connected layers.
Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations:
a decoder neural network [pg. 63, right column, ¶4]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the decoder neural network as taught by Hardy as a substitute for the recurrent neural network.


This is a provisional nonstatutory double patenting rejection.



Response to Arguments
Applicant's arguments filed 06/23/2021 have been fully considered but they are not persuasive. 

Regarding the double patenting rejection, the rejection has been maintained. Applicant has not provided an argument with regard to the double patenting rejection. Thus, claims 1-13, 15, 17-20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 3- of copending Application No. 15909442 in view of Hardy (DL4MD: A Deep Learning Framework for Intelligent Malware Detection), and claims 14 and 16 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of copending Application No. 15909442 in view of Tobiyama (Malware Detection with Deep Neural Network Using Process Behavior). 

Applicant’s arguments on pp. 11-12 with respect to claim 1, that the combination of Hardy and Tobiyama fails to disclose “dividing the Operations in the log file into a plurality of portions of bytes and extracting information from each portion” and “extracting information associated with each portion of the plurality of portions of bytes of the source data to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length”, have been considered but are moot because the newly amended limitations are addressed by the new art presented by Saxe. The above reasoning applies to claims 8 and 15 as they recite similar limitations. 
Conclusion
Applicant's amendment necessitated the new grounds of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571) 272-8491.  The examiner can normally be reached on Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached on (571) 272-3719.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.H.H./Examiner, Art Unit 2122



/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122