DETAILED ACTION
This action is in response to the claims filed 02/22/2022 for application 15/909,372. Claims 1, 8, 15, and 18 are amended. Claims 1-20 are currently pending.
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 02/22/2022 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-12 and 15-20 are rejected under 35 U.S.C. 103 as being unpatentable over Hardy et al. (DL4MD: A Deep Learning Framework for Intelligent Malware Detection, hereinafter "Hardy") in view of Kolosnjaji et al. (Deep Learning for Classification of Malware System Call Sequences, hereinafter "Kolosnjaji") and further in view of Saxe et al. (US 9690938 B1, hereinafter "Saxe") and further in view of Sai (US 9864956 B1, cited by Applicant in the IDS filed 02/22/2022, hereinafter "Sai").

Regarding claim 1, Hardy teaches A method for generating a classification of variable length source data [Abstract], the method comprising: 
receiving source data having a first variable length (“Resting on the analysis of Windows API calls, which can reflect the behavior of program code pieces [32] (e.g., the API “GetModuleFileNameA” in “Kernel32.DLL” can be used to retrieve the complete path of the file that contains the specified module of current process)” [pg. 63, § 4.1 Problem definition, ¶1]);
extracting feature information to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length (“[image: media_image1.png]” [pg. 63, § 4.1 Problem definition, ¶1; Hardy discloses extracting the files into a feature vector which corresponds to a sequence of extracted information having a second variable length.]);
processing the sequence of extracted information with an encoder neural network that includes a first set of parameters, the processing comprising (“[image: media_image2.png]” [pg. 63, right column, ¶3; encoding the source data would be equivalent to embedding. θ = {W, b} would correspond to a first set of parameters.]):
wherein the encoder neural network is configured by training the encoder neural network with a decoder neural network (Hardy discloses using encoder’s hidden vector yi and using the decoder to reconstruct vector zi. [See pg. 63, right column, ¶3-4]), the decoder neural network including an input for receiving the embedding of the source data and a second set of parameters, the decoder neural network generating an output that approximates at least one of (a) the sequence of extracted information, (b) a category associated with the source data, or (c) the source data (“[image: media_image3.png]” [pg. 63, right column, ¶4; The input would be xi, and θ’ = {W’, b’} is a second set of parameters. The output zi would correspond to an approximation of the source data after being generated with an encoder neural network trained with a decoder neural network.]);
and processing at least the embedding of the source data with a classifier to generate a classification of the source data (“To use the SAEs for malware detection, a classifier needs to be added on the top layer. In our application, the SAEs and the classifier comprise the entire deep architecture model for malware detection, which is illustrated in Figure 4.” [pg. 64, § 4.3 Deep learning architecture with SAEs, ¶2; Malware detection would be a form of classification. The source data would be embedded before it reaches the classifier at the top layer.]).
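For illustration only, the encoder/decoder mapping relied on above (an input xi encoded to a hidden vector yi with θ = {W, b}, then reconstructed as zi with θ’ = {W’, b’}) can be sketched as follows. This is a minimal sketch and not Hardy's implementation: the sigmoid activation, the squared-error measure, the dimensions, the weight values, and all function names are assumptions made for the example.

```python
import math


def sigmoid(v):
    """Logistic activation (an assumed choice for this sketch)."""
    return 1.0 / (1.0 + math.exp(-v))


def affine(weights, bias, x):
    """Return weights @ x + bias for a list-of-rows weight matrix."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def encode(x, W, b):
    """Encoder with the first parameter set {W, b}: input x -> hidden y."""
    return [sigmoid(v) for v in affine(W, b, x)]


def decode(y, W_prime, b_prime):
    """Decoder with the second parameter set {W', b'}: hidden y -> output z."""
    return [sigmoid(v) for v in affine(W_prime, b_prime, y)]


def reconstruction_error(x, z):
    """Squared error between input and reconstruction (assumed error measure)."""
    return sum((xi - zi) ** 2 for xi, zi in zip(x, z))


# Hypothetical 4-dimensional feature vector and 2-dimensional hidden layer.
x = [1.0, 0.0, 1.0, 0.0]
W = [[0.5, -0.5, 0.5, -0.5], [0.1, 0.2, 0.3, 0.4]]
b = [0.0, 0.0]
W_prime = [[0.5, 0.1], [-0.5, 0.2], [0.5, 0.3], [-0.5, 0.4]]
b_prime = [0.0, 0.0, 0.0, 0.0]

y = encode(x, W, b)              # hidden representation (the "embedding")
z = decode(y, W_prime, b_prime)  # approximation of the input
```

In this sketch the hidden vector `y` plays the role of the embedding that is later passed to a classifier, and `reconstruction_error(x, z)` is the quantity a training procedure would try to reduce.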
However Hardy fails to explicitly teach providing the sequence of extracted information as input to a convolutional filter; and providing an output of the convolutional filter as input to a recurrent neural network layer to generate an embedding of the source data, that represents a transformation of the source data;
Kolosnjaji teaches providing the sequence of extracted information as input to a convolutional filter (“In order to maximize the utilization of the possibilities given by neural network methodology, we combine convolutional and recurrent layers in one neural network. Figure 2 depicts our neural network architecture. The convolutional part consists of convolution and a pooling layers. On the one hand, the convolutional layer serves for feature extraction out of raw one-hot vectors. Convolution captures the correlation between neighboring input vectors and produces new features. We use two convolution filters of size 3 × 60, which corresponds to 3-grams of instructions. As the results of convolution we take feature vectors of size 10 and 20 for the first and second convolution layer, for every input feature. After each convolutional layer we use max-pooling to reduce the dimensionality of data by a factor of two.” [pg. 6, § 2.5 Deep Neural Network, ¶1]); and providing an output of the convolutional filter as input to a recurrent neural network layer to generate an embedding of the source data, that represents a transformation of the source data (“Outputs of the convolutional part of our neural network are connected to the recurrent part. We forward each output of the convolutional filters as one vector. The resulting sequence is modeled using the LSTM cells. We use LSTM cells, as they are flexible in terms of training, even though the maximal sequence length was limited to 100 vectors. Using the recurrent layer we are able to explicitly model the sequential dependencies in the kernel API traces. Mean-pooling is used to extract features of highest importance from the LSTM output and reduce the complexity of further data processing.” [pg. 6, § 2.5 Deep Neural Network, ¶1; Examiner interprets the source data being input into a convolutional filter and being output to a recurrent layer implies a “transformation of the source data”. See further Figure 2.]);
Hardy and Kolosnjaji are both in the same field of endeavor of malware detection using deep neural networks. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Kolosnjaji discloses deep learning for malware analysis. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Hardy’s teachings to input a sequence of extracted information into a convolutional filter and outputting it to a recurrent neural network layer as taught by Kolosnjaji. One would have been motivated to make this modification in order to improve the classification of malware samples. [pg. 3, ¶1-2, Kolosnjaji]
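For illustration only, the data flow described in the quoted passage (convolution over neighboring input vectors, max-pooling by a factor of two, a recurrent layer over the resulting sequence, then mean-pooling) can be sketched as below. This is a simplified sketch under stated assumptions and not Kolosnjaji's network: a plain tanh recurrent layer stands in for the LSTM cells, the ReLU activation is assumed, and every name, dimension, and weight value is hypothetical.

```python
import math


def conv_layer(seq, filters):
    """Apply each (width x dim) filter at every position of a sequence of vectors."""
    width = len(filters[0])
    out = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        features = []
        for kernel in filters:
            total = sum(k * v
                        for k_row, vec in zip(kernel, window)
                        for k, v in zip(k_row, vec))
            features.append(max(0.0, total))  # ReLU is an assumed activation
        out.append(features)
    return out


def max_pool(seq, factor=2):
    """Shorten the sequence by an element-wise max over groups of adjacent steps."""
    pooled = []
    for start in range(0, len(seq) - factor + 1, factor):
        group = seq[start:start + factor]
        pooled.append([max(col) for col in zip(*group)])
    return pooled


def recurrent_layer(seq, w_in, w_rec):
    """Plain tanh recurrent layer (a stand-in for LSTM cells); returns all hidden states."""
    hidden = [0.0] * len(w_rec)
    states = []
    for vec in seq:
        hidden = [
            math.tanh(sum(a * v for a, v in zip(w_in[j], vec))
                      + sum(r * h for r, h in zip(w_rec[j], hidden)))
            for j in range(len(w_rec))
        ]
        states.append(hidden)
    return states


def mean_pool(states):
    """Average the hidden states over time to get one fixed-length vector."""
    return [sum(col) / len(states) for col in zip(*states)]


def embed(seq, filters, w_in, w_rec):
    """Convolution -> max-pooling -> recurrent layer -> mean-pooling."""
    return mean_pool(recurrent_layer(max_pool(conv_layer(seq, filters)), w_in, w_rec))


def one_hot(index, dim):
    return [1.0 if i == index else 0.0 for i in range(dim)]


# Hypothetical setup: 4 possible symbols, two 3-gram filters, 3 hidden units.
filters = [[[0.1 * (i + j + f) for j in range(4)] for i in range(3)] for f in range(2)]
w_in = [[0.1, -0.2], [0.3, 0.1], [-0.1, 0.2]]
w_rec = [[0.0, 0.1, 0.0], [0.1, 0.0, 0.1], [0.0, 0.1, 0.0]]

short_seq = [one_hot(i % 4, 4) for i in range(8)]
long_seq = [one_hot(i % 4, 4) for i in range(12)]
short_embedding = embed(short_seq, filters, w_in, w_rec)
long_embedding = embed(long_seq, filters, w_in, w_rec)
```

Both input sequences, although of different lengths, yield an embedding with one value per hidden unit, which is the property that makes such an arrangement usable with variable-length input.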
However Hardy/Kolosnjaji fails to explicitly teach the source data comprising a sequence of bytes;
dividing the sequence of bytes into a plurality of contiguous sections, and partially overlapping a respective adjacent contiguous section; extracting feature information associated with each contiguous section of the plurality of contiguous sections
Saxe teaches the source data comprising a sequence of bytes (“For each file window in the file sample 308, the informational entropy calculator 110 can calculate 310 a number of occurrences of each byte value and/or byte sequence observed in the file window” [col 7, lines 50-53; If a byte sequence is being observed, then it is implied that the file sample disclosed by Saxe comprises a sequence of bytes.]);
dividing the sequence of bytes into a plurality of contiguous sections, and partially overlapping a respective adjacent contiguous section; extracting feature information associated with each contiguous section of the plurality of contiguous sections (“As another example, a file window can contain 500 bytes, the informational entropy calculator 110 can read the next 500 bytes of the file sample, shift 250 bytes in the file sample, and read the next 500 bytes. In this manner, the informational entropy calculator 110 can generate overlapping file windows (e.g., where at least some file windows share bytes), and/or can generate file windows which contain mutually-exclusive bytes (e.g., where each file window contains bytes which are not in other file windows). For another example, each window can include 1000 bytes and the window can move 100 bytes to capture the next 1000 byte window. In some implementations, the informational entropy calculator 110 can divide the file sample into a predetermined and/or dynamically determined number of file windows of varying and/or equivalent sizes.” [col 7, lines 34-49]);
Hardy, Kolosnjaji and Saxe are all in the same field of endeavor of malware detection using deep neural networks. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Kolosnjaji discloses deep learning for malware analysis. Saxe discloses dividing a potentially-malicious file sample into file windows and determining if the file sample is malicious or not. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Hardy’s and Kolosnjaji’s teachings by dividing the log files of Hardy/Kolosnjaji into portions of bytes as taught by Saxe. One would have been motivated to make this modification in order to reduce the amount of time used to determine a malware threat. [col 1, § Background, Saxe]
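For illustration only, the windowing described in the quoted passage of Saxe (a 500-byte window advanced by 250 bytes, so that adjacent windows share bytes, with byte-value occurrences counted per window) can be sketched as follows. This is a minimal sketch and not Saxe's implementation: the function names are hypothetical, and ignoring a trailing partial window is an assumption of the example.

```python
def overlapping_windows(data: bytes, size: int = 500, step: int = 250):
    """Return contiguous windows of `size` bytes, each starting `step` bytes
    after the previous one. When step < size, adjacent windows overlap.
    A trailing partial window is ignored in this sketch (an assumption)."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    if not data:
        return []
    last_start = max(len(data) - size, 0)
    return [data[start:start + size] for start in range(0, last_start + 1, step)]


def byte_counts(window: bytes):
    """Count the occurrences of each of the 256 possible byte values in a window."""
    counts = [0] * 256
    for value in window:
        counts[value] += 1
    return counts


# Hypothetical 1024-byte sample.
sample = bytes(range(256)) * 4
windows = overlapping_windows(sample)
per_window_counts = [byte_counts(w) for w in windows]
```

With the default parameters, the second half of each window is identical to the first half of the next one, which is the partial overlap between adjacent contiguous sections.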
	Although Saxe teaches dividing the bytes into partially overlapping contiguous sections, Saxe does not detail dividing the bytes into contiguous sections of different sizes.
	Sai teaches dividing the sequence of bytes into a plurality of contiguous sections, each contiguous section being different in size from other contiguous sections (“For example, the binary file 310 may be divided into chunks of 256 bytes each. In other examples, different chunk sizes may be used. When the binary file 312 has a length that is not divisible by 256 bytes without a remainder, the remainder is may be dropped.” [col 15, lines 46-50]);
Hardy, Kolosnjaji, Saxe, and Sai are all in the same field of endeavor of malware detection using deep neural networks. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Kolosnjaji discloses deep learning for malware analysis. Saxe discloses dividing a potentially-malicious file sample into file windows and determining if the file sample is malicious or not. Sai discloses training a file classifier from a plurality of binary file inputs to detect malware. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Hardy’s/Kolosnjaji’s/Saxe’s teachings by dividing the bytes into different size sections as taught by Sai. One would have been motivated to make this modification in order to use the calculated entropy values from the binary file to detect if the file contains malware. [col 17, lines 45-49, Sai]
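For illustration only, the chunking described in the quoted passage of Sai (dividing a binary file into chunks of a chosen size, such as 256 bytes, and dropping any remainder) can be sketched as follows, together with a per-chunk entropy value of the kind referred to in the motivation statement. This is a minimal sketch and not Sai's implementation: the function names are hypothetical, and the use of Shannon entropy in bits per byte is an assumption of the example.

```python
import math


def fixed_chunks(data: bytes, chunk_size: int = 256):
    """Split data into non-overlapping chunks of chunk_size bytes,
    dropping any remainder that does not fill a whole chunk."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    usable = len(data) - (len(data) % chunk_size)
    return [data[i:i + chunk_size] for i in range(0, usable, chunk_size)]


def byte_entropy(chunk: bytes) -> float:
    """Shannon entropy of the byte values in a chunk, in bits per byte
    (an assumed entropy measure for this sketch)."""
    if not chunk:
        return 0.0
    counts = {}
    for value in chunk:
        counts[value] = counts.get(value, 0) + 1
    total = len(chunk)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


# Hypothetical 1000-byte file: 3 chunks of 256 bytes, 232 trailing bytes dropped.
chunks_256 = fixed_chunks(bytes(1000), 256)
# A different chunk size applied to the same data gives a different division.
chunks_128 = fixed_chunks(bytes(1000), 128)
entropies = [byte_entropy(c) for c in chunks_256]
```

Changing `chunk_size` changes the size of every chunk in a given division; within one call, all chunks have the same size.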

Regarding claim 2, Hardy/Kolosnjaji/Saxe/Sai teaches The method of claim 1, where Hardy further teaches wherein extracting feature information from the source data includes generating one or more intermediate sequences (“The PE parser is used to extract the Windows API calls from each PE file. Through the API query database, the Windows API calls can be converted to a set of 32-bit global IDs representing the corresponding API functions (e.g., the API of “MAPI32.MAPIReadMail” is encoded as 0x00600F12).” [pg. 62, § 3 System architecture, feature extractor; Examiner is interpreting intermediate sequences would be equivalent to the API calls being converted to a set of 32-bit IDs.]).

Regarding claim 3, Hardy/Kolosnjaji/Saxe/Sai teaches The method of claim 2, where Hardy further teaches wherein the sequence of extracted information is based, at least in part, on at least one of the one or more intermediate sequences (“[image: media_image4.png]” [pg. 63, § 4.1 Problem definition; Examiner is interpreting this as a feature extraction step. The features of the files would be extracted to a feature vector, which would be equivalent to a sequence of extracted information.]).

Regarding claim 4, Hardy/Kolosnjaji/Saxe/Sai teaches The method of claim 1, where Hardy further teaches wherein the encoder neural network further includes a fully connected layer, the fully connected layer having an input and an output (“Figure 2 illustrates a one-layer AutoEncoder model with one input layer, one hidden layer, and one output layer.” [pg. 63, top right column; See Fig. 2 for fully connected layer.]).

Regarding claim 5, Hardy/Kolosnjaji/Saxe/Sai teaches The method of claim 1, where Hardy further teaches wherein the decoder neural network [See pg.63, ¶4] is configured by (i) receiving the embedding of the source data, (ii) adjusting, using machine learning, the first set of parameters and second set of parameters, and (iii) repeating (i) and (ii) until the output of the decoder neural network approximates to within a threshold of at least one of (a) the sequence of extracted information, (b) the category associated with the source data, (c) the source data, or (d) combinations thereof (“[image: media_image5.png]” [pg. 64, top left column; Algorithm I shows the training of the AutoEncoder, which receives xi; this corresponds to receiving an embedding of source data. Backpropagation would correspond to adjusting the first set and second set of parameters θ = {W, b} and θ’ = {W’, b’} (disclosed on pg. 63, ¶4). The algorithm repeats until the while-loop condition is met; that condition uses E(x,z), the training error between the input and output, and corresponds to the output having reached an acceptable threshold. The output would implicitly include a sequence of extracted information and the source data since it is trained by using the input xi.]).

Regarding claim 6, Hardy/Kolosnjaji/Saxe/Sai teaches The method of claim 1, where Hardy further teaches wherein the source data comprises an executable, an executable file, executable code, object code, bytecode, source code, command line code, command line data, a registry key, a registry key value, a file name, a domain name, a Uniform Resource Identifier, interpretable code, script code, a document, an image, an image file, a portable document format file, a word processing file, or a spreadsheet (“Resting on the analysis of Windows API calls, which can reflect the behavior of program code pieces [32] (e.g., the API “GetModuleFileNameA” in “Kernel32.DLL” can be used to retrieve the complete path of the file that contains the specified module of current process)” [pg. 63, § 4.1 Problem definition, ¶1]).

Regarding claim 7, Hardy/Kolosnjaji/Saxe/Sai teaches The method of claim 1, where Hardy further teaches wherein the classifier is a gradient-boosted tree, ensemble of gradient-boosted trees, random forest, support vector machine, fully connected multilayer perceptron, a partially connected multilayer perceptron, or general linear model (“In this section, using the same dataset described in Section 5.1, we conduct a comparison between our proposed deep learning framework (DL4MD) and other shallow learning based classification methods (i.e., Artificial Neural Network (ANN), Support Vector Machine (SVM), Naïve Bayes (NB), and Decision Tree (DT)) in malware detection. The results in Table IV, Figure 6 and Figure 7 show that our proposed deep learning framework (DL4MD) outperform ANN, SVM, NB, and DT in malware detection.” [pg. 65, § 5.3 Comparisons between deep learning and other shallow learning based classification methods, ¶1]).

Regarding claim 8, Hardy teaches A system for generating a classification of variable length source data [Abstract], the system comprising: 
one or more processors; and 
at least one non-transitory computer readable storage medium having instructions stored therein, which, when executed by the one or more processors, cause the one or more processors to perform actions (“All experiments are conducted in the environment: 64 Bit Windows 8.1 on an Intel (R) Core (TM) i7-4790 Processor (3.60GHz) with 16GB of RAM, using MySQL and C++.” [pg. 65, § 5.1 Experimental setup]) comprising:
receiving source data having a first variable length (“Resting on the analysis of Windows API calls, which can reflect the behavior of program code pieces [32] (e.g., the API “GetModuleFileNameA” in “Kernel32.DLL” can be used to retrieve the complete path of the file that contains the specified module of current process)” [pg. 63, § 4.1 Problem definition, ¶1]);
extracting information to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length (“[image: media_image1.png]” [pg. 63, § 4.1 Problem definition, ¶1; Hardy discloses extracting the files into a feature vector which corresponds to a sequence of extracted information having a second variable length.]);
processing the sequence of extracted information with an encoder neural network that includes a first set of parameters, the processing comprising (“[image: media_image2.png]” [pg. 63, right column, ¶3; encoding the source data would be equivalent to embedding. θ = {W, b} would correspond to a first set of parameters.]):
wherein the encoder neural network is configured by training the encoder neural network with a decoder neural network (Hardy discloses using encoder’s hidden vector yi and using the decoder to reconstruct vector zi. [See pg. 63, right column, ¶3-4]), the decoder neural network including an input for receiving the embedding of the source data and a second set of parameters, the decoder neural network generating an output that approximates at least one of (a) the sequence of extracted information, (b) a category associated with the source data, or (c) the source data (“[image: media_image3.png]” [pg. 63, right column, ¶4; The input would be xi, and θ’ = {W’, b’} is a second set of parameters. The output zi would correspond to an approximation of the source data after being generated with an encoder neural network trained with a decoder neural network.]); and
processing at least the embedding of the source data with a classifier to generate a classification of the source data (“To use the SAEs for malware detection, a classifier needs to be added on the top layer. In our application, the SAEs and the classifier comprise the entire deep architecture model for malware detection, which is illustrated in Figure 4.” [pg. 64, § 4.3 Deep learning architecture with SAEs, ¶2; Malware detection would be a form of classification. The source data would be embedded before it reaches the classifier at the top layer.]).
However Hardy fails to explicitly teach providing the sequence of extracted information as input to a convolutional filter; and providing an output of the convolutional filter as input to a recurrent neural network layer to generate an embedding of the source data, that represents a transformation of the source data;
Kolosnjaji teaches providing the sequence of extracted information as input to a convolutional filter (“In order to maximize the utilization of the possibilities given by neural network methodology, we combine convolutional and recurrent layers in one neural network. Figure 2 depicts our neural network architecture. The convolutional part consists of convolution and a pooling layers. On the one hand, the convolutional layer serves for feature extraction out of raw one-hot vectors. Convolution captures the correlation between neighboring input vectors and produces new features. We use two convolution filters of size 3 × 60, which corresponds to 3-grams of instructions. As the results of convolution we take feature vectors of size 10 and 20 for the first and second convolution layer, for every input feature. After each convolutional layer we use max-pooling to reduce the dimensionality of data by a factor of two.” [pg. 6, § 2.5 Deep Neural Network, ¶1]); and providing an output of the convolutional filter as input to a recurrent neural network layer to generate an embedding of the source data, that represents a transformation of the source data (“Outputs of the convolutional part of our neural network are connected to the recurrent part. We forward each output of the convolutional filters as one vector. The resulting sequence is modeled using the LSTM cells. We use LSTM cells, as they are flexible in terms of training, even though the maximal sequence length was limited to 100 vectors. Using the recurrent layer we are able to explicitly model the sequential dependencies in the kernel API traces. Mean-pooling is used to extract features of highest importance from the LSTM output and reduce the complexity of further data processing.” [pg. 6, § 2.5 Deep Neural Network, ¶1; Examiner interprets the source data being input into a convolutional filter and being output to a recurrent layer implies a “transformation of the source data”. See further Figure 2.]);
Hardy and Kolosnjaji are both in the same field of endeavor of malware detection using deep neural networks. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Kolosnjaji discloses deep learning for malware analysis. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Hardy’s teachings to input a sequence of extracted information into a convolutional filter and outputting it to a recurrent neural network layer as taught by Kolosnjaji. One would have been motivated to make this modification in order to improve the classification of malware samples. [pg. 3, ¶1-2, Kolosnjaji]
However Hardy/Kolosnjaji fails to explicitly teach the source data comprising a sequence of bytes;
dividing the sequence of bytes into a plurality of contiguous sections, and partially overlapping a respective adjacent contiguous section; extracting feature information associated with each contiguous section of the plurality of contiguous sections
Saxe teaches the source data comprising a sequence of bytes (“For each file window in the file sample 308, the informational entropy calculator 110 can calculate 310 a number of occurrences of each byte value and/or byte sequence observed in the file window” [col 7, lines 50-53; If a byte sequence is being observed, then it is implied that the file sample disclosed by Saxe comprises a sequence of bytes.]);
dividing the sequence of bytes into a plurality of contiguous sections, and partially overlapping a respective adjacent contiguous section; extracting feature information associated with each contiguous section of the plurality of contiguous sections (“As another example, a file window can contain 500 bytes, the informational entropy calculator 110 can read the next 500 bytes of the file sample, shift 250 bytes in the file sample, and read the next 500 bytes. In this manner, the informational entropy calculator 110 can generate overlapping file windows (e.g., where at least some file windows share bytes), and/or can generate file windows which contain mutually-exclusive bytes (e.g., where each file window contains bytes which are not in other file windows). For another example, each window can include 1000 bytes and the window can move 100 bytes to capture the next 1000 byte window. In some implementations, the informational entropy calculator 110 can divide the file sample into a predetermined and/or dynamically determined number of file windows of varying and/or equivalent sizes.” [col 7, lines 34-49]);
Hardy, Kolosnjaji and Saxe are all in the same field of endeavor of malware detection using deep neural networks. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Kolosnjaji discloses deep learning for malware analysis. Saxe discloses dividing a potentially-malicious file sample into file windows and determining if the file sample is malicious or not. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Hardy’s and Kolosnjaji’s teachings by dividing the log files of Hardy/Kolosnjaji into portions of bytes as taught by Saxe. One would have been motivated to make this modification in order to reduce the amount of time used to determine a malware threat. [col 1, § Background, Saxe]
	Although Saxe teaches dividing the bytes into partially overlapping contiguous sections, Saxe does not detail dividing the bytes into contiguous sections of different sizes.
	Sai teaches dividing the sequence of bytes into a plurality of contiguous sections, each contiguous section being different in size from other contiguous sections (“For example, the binary file 310 may be divided into chunks of 256 bytes each. In other examples, different chunk sizes may be used. When the binary file 312 has a length that is not divisible by 256 bytes without a remainder, the remainder is may be dropped.” [col 15, lines 46-50]);
Hardy, Kolosnjaji, Saxe, and Sai are all in the same field of endeavor of malware detection using deep neural networks. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Kolosnjaji discloses deep learning for malware analysis. Saxe discloses dividing a potentially-malicious file sample into file windows and determining if the file sample is malicious or not. Sai discloses training a file classifier from a plurality of binary file inputs to detect malware. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Hardy’s/Kolosnjaji’s/Saxe’s teachings by dividing the bytes into different size sections as taught by Sai. One would have been motivated to make this modification in order to use the calculated entropy values from the binary file to detect if the file contains malware. [col 17, lines 45-49, Sai]

Regarding claim 10, Hardy/Kolosnjaji/Saxe/Sai teaches The system of claim 9, where Hardy further teaches wherein the embedding of the source data is based, at least in part, on the output of the fully connected layer (“The goal of an AutoEncoder is to encode a representation of the input layer into the hidden layer, which is then decoded into the output layer, yielding the same (or as close as possible) value as the input layer [4].” [pg. 63, §4.2 AutoEncoder, ¶1; The output of the fully connected layer would implicitly embed (i.e. encode) the source data since it is an embedding layer of the encoder neural network.]).

Regarding claim 11, Hardy/Kolosnjaji/Saxe/Sai teaches The system of claim 9, where Hardy further teaches wherein the output of the fully connected layer is provided as input to the decoder neural network (“The resulting hidden representation yi is then mapped back to a reconstructed d0-dimensional vector zi” [pg. 63, right column, ¶3; yi is equivalent to the output of the fully connected layer of the encoder neural network. Fig. 2 shows fully connected layer]).

Regarding claim 12, Hardy/Kolosnjaji/Saxe/Sai teaches The system of claim 9, where Hardy further teaches the fully connected layer (“In order to transform an input vector xi into a hidden representation vector yi, the encoder, a deterministic mapping fθ, is utilized.” [pg. 63, right column, ¶3]), and the output of the fully connected layer is the embedding of the source data (“The goal of an AutoEncoder is to encode a representation of the input layer into the hidden layer, which is then decoded into the output layer, yielding the same (or as close as possible) value as the input layer [4].” [pg. 63, §4.2 AutoEncoder, ¶1; The output of the fully connected layer would implicitly embed (i.e. encode) the source data since it is an embedding layer of the encoder neural network.]).
Kolosnjaji further teaches wherein an output of the recurrent neural network layer is provided as input to (“Using the recurrent layer we are able to explicitly model the sequential dependencies in the kernel API traces. Mean-pooling is used to extract features of highest importance from the LSTM output and reduce the complexity of further data processing.” [pg. 6, § 2.5 Deep Neural Network, ¶1]).
Hardy, Kolosnjaji, Saxe, and Sai are all in the same field of endeavor of malware detection using deep neural networks. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Kolosnjaji discloses deep learning for malware analysis. Saxe discloses dividing a potentially-malicious file sample into file windows and determining if the file sample is malicious or not. Sai discloses training a file classifier from a plurality of binary file inputs to detect malware. Hardy discloses an encoder neural network taking in a feature vector as an input xi, and Kolosnjaji discloses training a RNN layer for feature extraction which has an output layer that outputs a feature vector. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Hardy’s/Saxe’s/Sai’s teachings to take a feature vector from the output of Kolosnjaji’s RNN and input it to the fully connected encoder neural network of Hardy. One would have been motivated to use a RNN for feature extraction before inputting the output into an encoder neural network as it is flexible in terms of training sequential data. [pg.6, § 2.5 Deep Neural Network, ¶1, Kolosnjaji]

Regarding claim 15, Hardy teaches A system for generating a classification of source data [Abstract], the source data having a first variable length, the system comprising: 
one or more processors; and 
a memory having instructions stored therein, which, when executed by the one or more processors, cause the one or more processors to perform actions (“All experiments are conducted in the environment: 64 Bit Windows 8.1 on an Intel (R) Core (TM) i7-4790 Processor (3.60GHz) with 16GB of RAM, using MySQL and C++.” [pg. 65, § 5.1 Experimental setup]) comprising:
extracting information to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length (“[image: media_image1.png]” [pg. 63, § 4.1 Problem definition, ¶1; Hardy discloses extracting the files into a feature vector which corresponds to a sequence of extracted information having a second variable length.]) wherein extracting information generates one or more intermediate sequences (“The PE parser is used to extract the Windows API calls from each PE file. Through the API query database, the Windows API calls can be converted to a set of 32-bit global IDs representing the corresponding API functions (e.g., the API of “MAPI32.MAPIReadMail” is encoded as 0x00600F12).” [pg. 62, § 3 System architecture, feature extractor; Examiner is interpreting intermediate sequences would be equivalent to the API calls being converted to a set of 32-bit IDs.]);
processing the sequence of extracted information with an encoder neural network that includes a first set of parameters, the processing comprising (“[media_image2.png]” [pg. 63, right column, ¶3; encoding the source data would be equivalent to embedding. θ = {W, b} would correspond to a first set of parameters.]):
wherein the encoder neural network is configured by training the encoder neural network with a decoder neural network (Hardy discloses using encoder’s hidden vector yi and using the decoder to reconstruct vector zi. [See pg. 63, right column, ¶3-4]), the decoder neural network including an input for receiving the embedding of the source data and a second set of parameters, the decoder neural network generating an output that approximates at least one of (a) the sequence of extracted information, (b) a category associated with the source data, or (c) the source data (“[media_image3.png]” [pg. 63, right column, ¶4; Input would be xi, and θ’ = {W’, b’} is a second set of parameters. The output zi would correspond to an approximation of the source data after being generated with an encoder neural network trained with a decoder neural network.]); and 
processing at least the embedding of the source data with a classifier to generate a classification of the source data (“To use the SAEs for malware detection, a classifier needs to be added on the top layer. In our application, the SAEs and the classifier comprise the entire deep architecture model for malware detection, which is illustrated in Figure 4.” [pg. 64, § 4.3 Deep learning architecture with SAEs, ¶2; Malware detection would be a form of classification. The source data would be embedded before it reaches the classifier at the top layer.]).
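For illustration only, the encoder/decoder relationship cited above (hidden vector yi produced with θ = {W, b}, reconstruction zi produced with θ’ = {W’, b’}) can be sketched as follows. This is a minimal sketch under stated assumptions: the sigmoid activation, the toy weights, and the names `encode`, `decode`, and `affine` are hypothetical and are not taken from Hardy.

```python
import math


def sigmoid(v):
    """Logistic activation (assumed here; the cited excerpt is an image)."""
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, b, x):
    """Compute W x + b for a weight matrix W (list of rows) and bias b."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def encode(x, W, b):
    """Map an input vector x to a hidden vector y using theta = {W, b}."""
    return [sigmoid(v) for v in affine(W, b, x)]


def decode(y, W2, b2):
    """Map a hidden vector y back to a reconstruction z using theta' = {W', b'}."""
    return [sigmoid(v) for v in affine(W2, b2, y)]


# Toy example: a 4-dimensional binary feature vector compressed to 2 hidden units.
x = [1.0, 0.0, 1.0, 0.0]
W = [[0.5, -0.5, 0.5, -0.5], [0.1, 0.2, 0.3, 0.4]]
b = [0.0, 0.0]
W2 = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]
b2 = [0.0, 0.0, 0.0, 0.0]

y = encode(x, W, b)    # embedding of the input
z = decode(y, W2, b2)  # approximation of the input
reconstruction_error = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
```

In such a sketch, training would adjust both parameter sets to reduce `reconstruction_error`; only the forward pass is shown here.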
However Hardy fails to explicitly teach providing the sequence of extracted information as input to a convolutional filter; and providing an output of the convolutional filter as input to a recurrent neural network layer to generate an embedding of the source data;
Kolosnjaji teaches providing the sequence of extracted information as input to a convolutional filter (“In order to maximize the utilization of the possibilities given by neural network methodology, we combine convolutional and recurrent layers in one neural network. Figure 2 depicts our neural network architecture. The convolutional part consists of convolution and a pooling layers. On the one hand, the convolutional layer serves for feature extraction out of raw one-hot vectors. Convolution captures the correlation between neighboring input vectors and produces new features. We use two convolution filters of size 3 × 60, which corresponds to 3- grams of instructions. As the results of convolution we take feature vectors of size 10 and 20 for the first and second convolution layer, for every input feature. After each convolutional layer we use max-pooling to reduce the dimensionality of data by a factor of two.” [pg. 6, § 2.5 Deep Neural Network, ¶1]); and providing an output of the convolutional filter as input to a recurrent neural network layer to generate an embedding of the source data (“Outputs of the convolutional part of our neural network are connected to the recurrent part. We forward each output of the convolutional filters as one vector. The resulting sequence is modeled using the LSTM cells. We use LSTM cells, as they are flexible in terms of training, even though the maximal sequence length was limited to 100 vectors. Using the recurrent layer we are able to explicitly model the sequential dependencies in the kernel API traces. Mean-pooling is used to extract features of highest importance from the LSTM output and reduce the complexity of further data processing.” [pg. 6, § 2.5 Deep Neural Network, ¶1]);
Hardy and Kolosnjaji are both in the same field of endeavor of malware detection using deep neural networks. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Kolosnjaji discloses deep learning for malware analysis. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Hardy’s teachings to input a sequence of extracted information into a convolutional filter and outputting it to a recurrent neural network layer as taught by Kolosnjaji. One would have been motivated to make this modification in order to improve the classification of malware samples. [pg. 3, ¶1-2, Kolosnjaji]
However Hardy/Kolosnjaji fails to explicitly teach the source data comprising a sequence of bytes;
dividing the sequence of bytes into a plurality of contiguous sections, and partially overlapping a respective adjacent contiguous section; extracting feature information associated with each contiguous section of the plurality of contiguous sections
Saxe teaches the source data comprising a sequence of bytes (“For each file window in the file sample 308, the informational entropy calculator 110 can calculate 310 a number of occurrences of each byte value and/or byte sequence observed in the file window” [col 7, lines 50-53; If a byte sequence is being observed, then it is implied that the file sample disclosed by Saxe comprises a sequence of bytes.]);
dividing the sequence of bytes into a plurality of contiguous sections, and partially overlapping a respective adjacent contiguous section; extracting feature information associated with each contiguous section of the plurality of contiguous sections (“As another example, a file window can contain 500 bytes, the informational entropy calculator 110 can read the next 500 bytes of the file sample, shift 250 bytes in the file sample, and read the next 500 bytes. In this manner, the informational entropy calculator 110 can generate overlapping file windows (e.g., where at least some file windows share bytes), and/or can generate file windows which contain mutually-exclusive bytes (e.g., where each file window contains bytes which are not in other file windows). For another example, each window can include 1000 bytes and the window can move 100 bytes to capture the next 1000 byte window. In some implementations, the informational entropy calculator 110 can divide the file sample into a predetermined and/or dynamically determined number of file windows of varying and/or equivalent sizes.” [col 7, lines 34-49]);
Hardy, Kolosnjaji and Saxe are all in the same field of endeavor of malware detection using deep neural networks. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Kolosnjaji discloses deep learning for malware analysis. Saxe discloses dividing a potentially-malicious file sample into file windows and determining if the file sample is malicious or not. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Hardy’s and Kolosnjaji’s teachings by dividing the log files of Hardy/Kolosnjaji into portions of bytes as taught by Saxe. One would have been motivated to make this modification in order to reduce the amount of time used to determine a malware threat. [col 1, § Background, Saxe]
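For illustration only, the overlapping file windows described in the cited passage of Saxe (e.g., 500-byte windows read after shifting 250 bytes) can be sketched as follows. This is a minimal sketch, not Saxe's implementation: the function name `file_windows` is hypothetical, and the choice to omit trailing bytes that do not fill a complete window is an assumption.

```python
def file_windows(data: bytes, size: int, shift: int):
    """Split data into windows of `size` bytes, advancing `shift` bytes each time.

    When shift < size, adjacent windows share (size - shift) bytes.
    Assumption: trailing bytes that do not fill a whole window are omitted.
    """
    if size <= 0 or shift <= 0:
        raise ValueError("size and shift must be positive")
    return [data[i:i + size] for i in range(0, len(data) - size + 1, shift)]


# A 1250-byte sample with 500-byte windows shifted by 250 bytes.
sample = bytes(i % 256 for i in range(1250))
windows = file_windows(sample, size=500, shift=250)
```

With these values the windows start at offsets 0, 250, 500, and 750, so each window shares its last 250 bytes with the first 250 bytes of the next.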
	Although Saxe teaches dividing the bytes into partially overlapping contiguous sections, the reference doesn’t go into details of dividing bytes into different size contiguous sections. 
	Sai teaches dividing the sequence of bytes into a plurality of contiguous sections, each contiguous section being different in size from other contiguous sections (“For example, the binary file 310 may be divided into chunks of 256 bytes each. In other examples, different chunk sizes may be used. When the binary file 312 has a length that is not divisible by 256 bytes without a remainder, the remainder is may be dropped.” [col 15, lines 46-50]);
Hardy, Kolosnjaji, Saxe, and Sai are all in the same field of endeavor of malware detection using deep neural networks. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Kolosnjaji discloses deep learning for malware analysis. Saxe discloses dividing a potentially-malicious file sample into file windows and determining if the file sample is malicious or not. Sai discloses training a file classifier from a plurality of binary file inputs to detect malware. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Hardy’s/Kolosnjaji’s/Saxe’s teachings by dividing the bytes into different size sections as taught by Sai. One would have been motivated to make this modification in order to use the calculated entropy values from the binary file to detect if the file contains malware. [col 17, lines 45-49, Sai]
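For illustration only, the chunking described in the cited passage of Sai (chunks of 256 bytes, with any remainder dropped, and other chunk sizes possible) can be sketched as follows. This is a minimal sketch; the function name `fixed_chunks` and the sample data are hypothetical and are not taken from Sai.

```python
def fixed_chunks(data: bytes, chunk_size: int = 256):
    """Divide data into consecutive chunks of chunk_size bytes.

    Bytes left over after the last full chunk are dropped, as in the
    cited example where the file length is not divisible by 256.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    usable = len(data) - (len(data) % chunk_size)
    return [data[i:i + chunk_size] for i in range(0, usable, chunk_size)]


# A 600-byte sample yields two 256-byte chunks; the last 88 bytes are dropped.
sample_file = bytes(600)
chunks_256 = fixed_chunks(sample_file)
# A different chunk size can be selected through the same parameter.
chunks_100 = fixed_chunks(sample_file, chunk_size=100)
```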
Regarding claim 16, Hardy/Kolosnjaji/Saxe/Sai teaches the system of claim 15, where Kolosnjaji further teaches wherein the embedding of the source data is combined with additional data before processing at least the embedding of the source data to generate the classification (“We construct a neural network based on convolutional and recurrent network layers in order to obtain the best features for classification. This way we get a hierarchical feature extraction architecture that combines convolution of n-grams with full sequential modeling. Our evaluation results demonstrate that our approach outperforms previously used methods in malware classification, being able to achieve an average of 85.6% on precision and 89.4% on recall using this combined neural network architecture” [Abstract; See further pg. 6, 2.5 Deep Neural Network, ¶1: “Mean-pooling is used to extract features of highest importance from the LSTM output and reduce the complexity of further data processing”]).
Hardy teaches with the classifier (“To use the SAEs for malware detection, a classifier needs to be added on the top layer. In our application, the SAEs and the classifier comprise the entire deep architecture model for malware detection, which is illustrated in Figure 4.” [pg. 64, § 4.3 Deep learning architecture with SAEs, ¶2]).
Hardy, Kolosnjaji, Saxe, and Sai are all in the same field of endeavor of malware detection using deep neural networks. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Kolosnjaji discloses deep learning for malware analysis. Saxe discloses dividing a potentially-malicious file sample into file windows and determining if the file sample is malicious or not. Sai discloses training a file classifier from a plurality of binary file inputs to detect malware. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Hardy’s/Saxe’s/Sai’s teachings by combining the embedding of the source data with additional data before processing the data for classification as taught by Kolosnjaji. One would have been motivated to make this modification in order to improve the classification of malware samples. [pg. 3, ¶1-2, Kolosnjaji]

Regarding claim 17, Hardy/Kolosnjaji/Saxe/Sai teaches the system of claim 15, where Hardy further teaches wherein the input of the decoder neural network includes at least one fully connected layer (“The resulting hidden representation yi is then mapped back to a reconstructed d0-dimensional vector zi in the input space, using the decoder gθ.” [pg. 63, right column, ¶4; See Fig. 2 for at least one fully connected layer. yi is the hidden layer representation input into the decoder.]).

Regarding claim 18, Hardy/Kolosnjaji/Saxe/Sai teaches the system of claim 15, where Hardy further teaches wherein extracting information comprises executing at least one of a convolution operation, a Shannon Entropy operation, a statistical operation, a wavelet transformation operation, a Fourier transformation operation, a compression operation, a disassembling operation, or a tokenization operation (“Typically, the number of hidden units is much less than number of visible (input/output) ones (d1 < d0). As a result, when passing data through such a network, it first compresses (encodes) input vector to “fit” in a smaller representation, and then tries to reconstruct (decode) it back. The task of training is to minimize an error or reconstruction (using Equation 5), i.e. find the most efficient compact representation (encoding) for input data (Equation 6).” [pg.63, right column, ¶5; corresponds to a compression operation]).
Saxe teaches associated with each section of the plurality of contiguous sections (“In some implementations, each input vector can be limited to 256-dimensions, and/or can be similarly limited to a predetermined dimension. Each input vector can be generated by a client device 202, and/or by a malware detection server 102 (e.g., via the informational entropy calculator 110, the threat model manager 118, and/or the threat analyzer 114). The deep neural network threat model can use any of the input vectors to determine whether or not the potentially-malicious sample file is malware (e.g., can combine each of the 256-dimension vectors into a 1024-dimension input vector, can use a portion of the 256-dimensional vectors as an input vector, and/or the like).” [col 22, lines 24-35; Saxe discloses determining potentially-malicious malware based off portions of the vectors.]).
Hardy, Kolosnjaji, Saxe, and Sai are all in the same field of endeavor of malware detection using deep neural networks. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Kolosnjaji discloses deep learning for malware analysis. Saxe discloses dividing a potentially-malicious file sample into file windows and determining if the file sample is malicious or not. Sai discloses training a file classifier from a plurality of binary file inputs to detect malware. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Hardy’s/Kolosnjaji’s/Sai’s teachings by dividing the log files of Hardy/Kolosnjaji into portions of bytes as taught by Saxe. One would have been motivated to make this modification in order to reduce the amount of time used to determine a malware threat. [col 1, § Background, Saxe]

Regarding claim 19, Hardy/Kolosnjaji/Saxe/Sai teaches the system of claim 15, where Hardy further teaches wherein the encoder neural network includes at least one of a plurality of recurrent neural network layers or a plurality of fully connected layers (“[media_image6.png]” [pg. 64 Fig. 3 shows AutoEncoders model having a plurality of fully connected layers]).

Regarding claim 20, Hardy/Kolosnjaji/Saxe/Sai teaches the system of claim 15, where Hardy further teaches wherein the decoder neural network includes at least one of one or more recurrent neural network layers or one or more fully connected layers (“[media_image7.png]” [pg. 63, top right column; Fig. 2 shows a decoder neural network with at least one fully connected layer. Additionally, Fig. 3 shows a stacked AutoEncoder model with a plurality of fully connected layers.]).

Claims 13 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over Hardy in view of Kolosnjaji and Saxe and Sai and further in view of Tobiyama et al. (Malware Detection with Deep Neural Network Using Process Behavior, hereinafter "Tobiyama").

Regarding claim 13, Hardy/Kolosnjaji/Saxe/Sai teaches the system of claim 8, where Hardy further teaches the decoder neural network [See pg. 63, right column, ¶4; Fig. 2]. 
However Hardy/Kolosnjaji/Saxe/Sai fails to explicitly teach includes a second recurrent neural network layer.
	Tobiyama teaches includes a second recurrent neural network layer (“Based on the Operations, we construct behavioral language model. We use RNN with LSTM units for the model. The RNN consists of an input layer x, a normal hidden layer h1, two LSTM layers h2 and h3, and an output layer y.” [pg. 579, § C. Training RNN, ¶1]).
Hardy, Kolosnjaji, Saxe, Sai and Tobiyama are all in the same field of endeavor of malware detection using deep neural networks. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Kolosnjaji discloses deep learning for malware analysis. Saxe discloses dividing a potentially-malicious file sample into file windows and determining if the file sample is malicious or not. Sai discloses training a file classifier from a plurality of binary file inputs to detect malware. Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Although Hardy discloses a decoder neural network, the reference fails to explicitly teach the decoder neural network including a recurrent network layer. Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date to modify Hardy’s/Kolosnjaji’s/Saxe’s/Sai’s teachings to substitute Tobiyama’s RNN with Hardy’s AutoEncoder to have a decoder neural network include a recurrent neural network layer. One would have been motivated to make this modification as deep learning models like AutoEncoders can achieve comparable or better performance than other learning architectures. [Hardy, pg. 63, § 4.1 Problem definition, ¶3]

Regarding claim 14, Hardy/Kolosnjaji/Saxe/Sai teaches the system of claim 8, where Saxe further teaches wherein dividing the sequence of bytes (“In some implementations, file samples can be divided into 256-byte (and/or a similar size) windows of data within the file sample. Dividing the file sample into file windows can involve reading a number of bytes equal to the size of a file window. For example, if the file windows are 256-byte file windows, the informational entropy calculator 110 can read the next 256 bytes of the file sample and process as a file window.” [col 7, lines 27-34; file samples divided into 256-byte windows of data corresponds to dividing the sequence of bytes into a plurality of portions of bytes.]).
However Hardy/Kolosnjaji/Saxe/Sai fails to explicitly teach further comprises performing a window operation on the source data, the window operation having a size and a stride.
Tobiyama teaches further comprises performing a window operation on the source data to divide the sequence of bytes into the plurality of portions of bytes, the window operation having a size and a stride (“Each pooling layer receives the output of the previous convolutional layer and reduced their size into 1/2 by Max-Pooling with stride of 2.” [pg. 580, § E. Training CNN and Perform Malware Process Detection, ¶2; This would implicitly be a window operation.]).
Hardy, Kolosnjaji, Saxe, Sai and Tobiyama are all in the same field of endeavor of malware detection using deep neural networks. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Kolosnjaji discloses deep learning for malware analysis. Saxe discloses dividing a potentially-malicious file sample into file windows and determining if the file sample is malicious or not. Sai discloses training a file classifier from a plurality of binary file inputs to detect malware. Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Hardy’s/Kolosnjaji’s/Sai’s/Tobiyama’s teachings by dividing the log files into portions of bytes as taught by Saxe. One would have been motivated to make this modification in order to reduce the amount of time used to determine a malware threat. [col 1, § Background, Saxe]
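For illustration only, a window operation having a size and a stride, of the kind the cited Tobiyama passage describes (max-pooling with a stride of 2 that halves the size of its input), can be sketched as follows. This is a minimal sketch: the one-dimensional input, the window size of 2, and the function name `max_pool_1d` are assumptions and are not taken from Tobiyama.

```python
def max_pool_1d(values, size=2, stride=2):
    """Slide a window of `size` elements over values, advancing `stride`
    elements each step, and keep the maximum of each window."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    return [max(values[i:i + size])
            for i in range(0, len(values) - size + 1, stride)]


# Eight inputs pooled with size 2 and stride 2 give four outputs.
features = [3, 1, 4, 1, 5, 9, 2, 6]
pooled = max_pool_1d(features)
```

Here the windows are (3, 1), (4, 1), (5, 9), and (2, 6), so the output length is half the input length.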

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA. A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA/25, or PTO/AIA/26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1-13 and 15-20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 3-5, and 8 of copending Application No. 15909442 in view of Hardy (DL4MD: A Deep Learning Framework for Intelligent Malware Detection) and Kolosnjaji et al. (Deep Learning for Classification of Malware System Call Sequences).
Claim 14 is provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of copending Application No. 15909442 in view of Tobiyama (Malware Detection with Deep Neural Network Using Process Behavior). 

Instant Application (Claim 1) vs. App#15909442 (Claim 1)

Instant Application: A method for generating a classification of variable length source data, the method comprising:
App#15909442: A method for embedding variable length source data by a processor, the method comprising:

Instant Application: receiving source data having a first variable length, the source data comprising a sequence of bytes;
App#15909442: receiving source data having a first variable length, the source data comprising a sequence of bytes;

Instant Application: dividing the sequence of bytes into a plurality of contiguous sections, each contiguous section being different in size from other contiguous sections and partially overlapping a respective adjacent contiguous section;
App#15909442: dividing the sequence of bytes into a plurality of partially overlapping contiguous sections having different sizes;

Instant Application: extracting feature information associated with each contiguous section of the plurality of contiguous section of the source data to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length;
App#15909442: extracting, information associated with each section of the plurality of partially overlapping contiguous sections to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length and shorter than the first variable length;

Instant Application: processing the sequence of extracted information with an encoder neural network that includes a first set of parameters, the processing comprising: providing the sequence of extracted information as input to a convolutional filter; and providing an output of the convolutional filter as input to a recurrent neural network layer to generate an embedding of the source data, that represents a transformation of the source data;
App#15909442: and processing the sequence of extracted information with a recurrent neural network to generate an embedding of the source data, the recurrent neural network including an input, an output, and a first set of parameters; wherein the recurrent neural network is configured by adjusting the first set of parameters of the recurrent neural network based, at least in part, on a machine learning algorithm.

Instant Application: wherein the encoder neural network is configured by training the encoder neural network with a decoder neural network, the decoder neural network including an input for receiving the embedding of the source data and a second set of parameters, the decoder neural network generating an output that approximates at least one of (a) the sequence of extracted information, (b) a category associated with the source data, or (c) the source data;
App#15909442: the fully connected neural network including an input, an output, and a second set of parameters, wherein the recurrent neural network is configured by adjusting the first set of parameters of the recurrent neural network based, at least in part, on a machine learning algorithm 

Instant Application: and processing at least the embedding of the source data with a classifier to generate a classification of the source data.
App#15909442: processing the embedding of the source data with a classifier, the classifier comprising a fully connected neural network to generate a classification of the source data

Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations:
Feature information [pg.63, § 4.1 Problem definition, ¶1]
Encoder neural network [pg. 63, right column, ¶3]
wherein the encoder neural network is configured by training the encoder neural network with a decoder neural network, the decoder neural network [pg. 63, right column ¶3-4]
the decoder neural network generating an output that approximates at least one of (a) the sequence of extracted information, (b) a category associated with the source data, or (c) the source data; [pg. 63, right column, ¶4]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the encoder neural network as taught by Hardy as a substitute for the recurrent neural network.

Kolosnjaji teaches these limitations:  
providing the sequence of extracted information as input to a convolutional filter [pg. 6, § 2.5 Deep Neural Network, ¶1]; and 
providing an output of the convolutional filter as input to a recurrent neural network layer, that represents a transformation of the source data [pg. 6, § 2.5 Deep Neural Network, ¶1];

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by inputting source data into a convolutional filter and outputting it to a recurrent neural network layer as taught by Kolosnjaji. 

Instant Application (Claim 2) vs. App#15909442 (Claim 1)

Instant Application: The method of claim 1, wherein extracting feature information from the source data includes generating one or more intermediate sequences.
App#15909442: extracting, information associated with each section of the plurality of partially overlapping contiguous sections to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length and shorter than the first variable length;

Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations:
Feature information [pg.63, § 4.1 Problem definition, ¶1]
generating one or more intermediate sequences. [pg. 62, § 3 System architecture, feature extractor]
It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the following, as taught by Hardy, to add another method of extracting source data: “The PE parser is used to extract the Windows API calls from each PE file. Through the API query database, the Windows API calls can be converted to a set of 32-bit global IDs representing the corresponding API functions (e.g., the API of “MAPI32.MAPIReadMail” is encoded as 0x00600F12).”

Instant Application (Claim 3) vs. App#15909442 (Claim 1)

Instant Application: The method of claim 2, wherein the sequence of extracted information is based, at least in part, on at least one of the one or more intermediate sequences.
App#15909442: extracting, information associated with each section of the plurality of partially overlapping contiguous sections to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length and shorter than the first variable length;

Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations:
based, at least in part, on at least one of the one or more intermediate sequences [pg.63, § 4.1 Problem definition]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating: The feature of each file can be represented by a binary feature vector as taught by Hardy to add another step in the method of extracting source data.

Instant Application (Claim 4) vs. App#15909442 (Claim 1)

Instant Application: The method of claim 1, wherein the encoder neural network further includes a fully connected layer, the fully connected layer having an input and an output.
App#15909442: the fully connected neural network including an input, an output, and a second set of parameters, wherein the recurrent neural network is configured by adjusting the first set of parameters of the recurrent neural network based, at least in part, on a machine learning algorithm

Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations:
Encoder neural network [pg. 63, right column, ¶3]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the encoder neural network as taught by Hardy as a substitute for the recurrent neural network.

Instant Application (Claim 5) vs. App#15909442 (Claim 1)

Instant Application: The method of claim 1, wherein the decoder neural network is configured by (i) receiving the embedding of the source data, (ii) adjusting, using machine learning, the first set of parameters and second set of parameters, and (iii) repeating (i) and (ii) until the output of the decoder neural network approximates to within a threshold of at least one of (a) the sequence of extracted information, (b) the category associated with the source data, (c) the source data, or (d) combinations thereof.
App#15909442: wherein the recurrent neural network is configured by adjusting the first set of parameters of the recurrent neural network based, at least in part, on a machine learning algorithm.
App#15909442: wherein the fully connected neural network is configured by adjusting the second set of parameters of the fully connected neural network based, at least in part, on a machine learning algorithm.

Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations:

decoder neural network is configured by (i) receiving the embedding of the source data [See pg. 63, ¶4] 

and (iii) repeating (i) and (ii) until the output of the decoder neural network approximates to within a threshold of at least one of (a) the sequence of extracted information, (b) a category associated with the source data, (c) the source data, or (d) combinations thereof.  [pg. 64, top left column; See Algorithm 1]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the decoder neural network as taught by Hardy as a substitute for the recurrent neural network.
Claim 6
Claim 1
The method of claim 1, wherein the source data comprises an executable, an executable file, executable code, object code, bytecode, source code, command line code, command line data, a registry key, a registry key value, a file name, a domain name, a Uniform Resource Identifier, interpretable code, script code, a document, an image, an image file, a portable document format file, a word processing file, or a spreadsheet.
A method for embedding variable length source data by a processor, the method comprising: receiving source data having a first variable length, the source data comprising a sequence of bytes;
Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations in [pg. 63, § 4.1 problem definition, ¶1]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the types of files disclosed by Hardy as source data to detect malware.
Claim 7

Claim 8
The method of claim 1, wherein the classifier is a gradient-boosted tree, ensemble of gradient-boosted trees, random forest, support vector machine, fully connected multilayer perceptron, a partially connected multilayer perceptron, or general linear model.
The method of claim 1, wherein the classifier is a gradient-boosted tree, ensemble of gradient-boosted trees, random forest, support vector machine, fully connected multilayer perceptron, a partially connected multilayer perceptron, or general linear model.
All limitations of claim 7 in the instant application are anticipated by claim 8 of the ‘442 application.
Claim 8
Claim 1
A system for embedding variable length source data, the system comprising: 

one or more processors; 

and at least one non-transitory computer readable storage medium having instructions therein, which, when executed by the one or more processors, cause the one or more processors to perform actions comprising:
A method for embedding variable length source data by a processor, the method comprising:
receiving source data having a first variable length, the source data comprising a sequence of bytes;
receiving source data having a first variable length, the source data comprising a sequence of bytes;
dividing the sequence of bytes into a plurality of contiguous sections, each contiguous section being different in size from other contiguous sections and partially overlapping a respective adjacent contiguous section;
dividing the sequence of bytes into a plurality of partially overlapping contiguous sections having different sizes;
extracting information associated with each contiguous section of the plurality of contiguous section of the source data to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length;
extracting, information associated with each section of the plurality of partially overlapping contiguous sections to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length and shorter than the first variable length;
processing the sequence of extracted information with an encoder neural network that includes a first set of parameters, the processing comprising: providing the sequence of extracted information as input to a convolutional filter; and providing an output of the convolutional filter as input to a recurrent neural network layer to generate an embedding of the source data;
and processing the sequence of extracted information with a recurrent neural network to generate an embedding of the source data, the recurrent neural network including an input, an output, and a first set of parameters; wherein the recurrent neural network is configured by adjusting the first set of parameters of the recurrent neural network based, at least in part, on a machine learning algorithm.
wherein the encoder neural network is configured by training the encoder neural network with a decoder neural network, the decoder neural network including an input for receiving the embedding of the source data and a second set of parameters, the decoder neural network generating an output that approximates at least one of (a) the sequence of extracted information, (b) a category associated with the source data, or (c) the source data; and
the fully connected neural network including an input, an output, and a second set of parameters; wherein the fully connected neural network is configured by adjusting the second set of parameters of the fully connected neural network based, at least in part, on a machine learning algorithm.
processing at least the embedding of the source data with a classifier to generate a classification of the source data
processing the embedding of the source data with a classifier, the classifier comprising a fully connected neural network to generate a classification of the source data
Copending application 15909442 fails to recite the limitations in bold above. Claim 9 recites the same limitations as method claim 1. However, Hardy teaches these limitations:
Encoder neural network [pg. 63, right column, ¶3]
wherein the encoder neural network is configured by training the encoder neural network with a decoder neural network, the decoder neural network [pg. 63, right column ¶3-4]
the decoder neural network generating an output that approximates at least one of (a) the sequence of extracted information, (b) a category associated with the source data, or (c) the source data; [pg. 63, right column, ¶4]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the encoder neural network as taught by Hardy as a substitute for the recurrent neural network.
Kolosnjaji teaches these limitations:  
providing the sequence of extracted information as input to a convolutional filter [pg. 6, § 2.5 Deep neural Network, ¶1]; and 
providing an output of the convolutional filter as input to a recurrent neural network layer, that represents a transformation of the source data [pg. 6, § 2.5 Deep neural Network, ¶1];

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by inputting source data into a convolutional filter and outputting it to a recurrent neural network layer as taught by Kolosnjaji.
Claim 9
Claim 1
The system of claim 8, wherein the encoder neural network further includes a fully connected layer, the fully connected layer having an input and an output.
the fully connected neural network including an input, an output, and a second set of parameters; wherein the fully connected neural network is configured by adjusting the second set of parameters of the fully connected neural network based, at least in part, on a machine learning algorithm.
Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations:
Encoder neural network [pg. 63, right column, ¶3]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the encoder neural network as taught by Hardy as a substitute for the recurrent neural network.
Claim 10
Claim 1
The system of claim 9, wherein the embedding of the source data is based, at least in part, on the output of the fully connected layer.
processing the embedding of the source data with a classifier, the classifier comprising a fully connected neural network to generate a classification of the source data, the fully connected neural network including an input, an output, and a second set of parameters, and wherein the fully connected neural network is configured by adjusting the second set of parameters of the fully connected neural network based, at least in part, on the machine learning algorithm.
Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations:
embedding of the source data is based, at least in part, on the output of the fully connected layer. (“The goal of an AutoEncoder is to encode a representation of the input layer into the hidden layer, which is then decoded into the output layer, yielding the same (or as close as possible) value as the input layer [4].”) [pg. 63, § 4.2 AutoEncoder, ¶1]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the AutoEncoder model as taught by Hardy as a substitute for the recurrent neural network to achieve better performance than other learning architectures. 
Claim 11
Claim 1
The system of claim 9, wherein the output of the fully connected layer is provided as input to the decoder neural network.
processing the embedding of the source data with a classifier, the classifier comprising a fully connected neural network to generate a classification of the source data, the fully connected neural network including an input, an output, and a second set of parameters, and wherein the fully connected neural network is configured by adjusting the second set of parameters of the fully connected neural network based, at least in part, on the machine learning algorithm.
Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations:
the output of the fully connected layer is provided as input to the decoder neural network. [pg. 63, right column, ¶3]


It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the AutoEncoder model as taught by Hardy as a substitute for the recurrent neural network to achieve better performance than other learning architectures.
Claim 12
Claim 1
The system of claim 9, wherein an output of the recurrent neural network layer is provided as input to the fully connected layer, and the output of the fully connected layer is the embedding of the source data.
processing the embedding of the source data with a classifier, the classifier comprising a fully connected neural network to generate a classification of the source data, the fully connected neural network including an input, an output, and a second set of parameters, and wherein the fully connected neural network is configured by adjusting the second set of parameters of the fully connected neural network based, at least in part, on the machine learning algorithm.
Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations:
wherein the output of the recurrent neural network layer is provided as input to the fully connected layer, and the output of the fully connected layer is the embedding of the source data. [pg. 63, right column, ¶3]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the AutoEncoder model as taught by Hardy as a substitute for the recurrent neural network to achieve better performance than other learning architectures.
Claim 13
Claim 4
The system of claim 8, wherein the decoder neural network includes a second recurrent neural network layer.
The method of claim 1, wherein the recurrent neural network includes one or more recurrent neural network layers.
Copending application 15909442 fails to recite the limitations in bold above. Claims 4 and 12 recite the same limitations. However, Hardy teaches these limitations:
Decoder neural network [pg. 63, right column, ¶4]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the decoder neural network as taught by Hardy as a substitute for the recurrent neural network.
Claim 14
Claim 1
The system of claim 8, wherein dividing the sequence of bytes further comprises performing a window operation on the source data to divide the sequence of bytes into the contiguous sections, the window operation having a size and a stride.
dividing the sequence of bytes into a plurality of partially overlapping contiguous sections having different sizes;

extracting, information associated with each section of the plurality of partially overlapping contiguous sections to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length and shorter than the first variable length;
Copending application 15909442 fails to recite the limitations in bold above. However, Tobiyama teaches these limitations:
wherein extracting information further comprises performing a window operation on the source data, the window operation having a size and a stride. [pg. 580, § E. Training CNN and Perform Malware Process Detection, ¶2]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating an extraction method as taught by Tobiyama to extract information from image files. 
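For illustration only, the claimed window operation "having a size and a stride" can be sketched as follows. This is a minimal sketch under stated assumptions and is not taken from the instant application, the copending application, or Tobiyama: the function name `window_sections` and its parameters are hypothetical, the window size is fixed (so the "different sizes" aspect of the independent claims is not modeled), and a stride smaller than the size is what produces partially overlapping contiguous sections.

```python
def window_sections(data: bytes, size: int, stride: int) -> list[bytes]:
    """Split a byte sequence into contiguous windows (hypothetical helper).

    Each window holds `size` bytes and starts `stride` bytes after the
    previous one, so adjacent windows overlap whenever stride < size.
    A trailing remainder shorter than `size` is dropped in this sketch.
    """
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    last_start = len(data) - size
    return [data[i:i + size] for i in range(0, last_start + 1, stride)]


sections = window_sections(b"abcdefgh", size=4, stride=2)
print(sections)
```

The number of sections depends on the length of the input and is smaller than the number of input bytes, which mirrors the "second variable length based on the first variable length" language of the claims.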
Claim 15
Claim 1
A system for generating a classification of source data, the source data comprising a sequence of bytes and having a first variable length, the system comprising:
one or more processors; and a memory having instructions stored therein, which, when executed by the one or more processors, cause the one or more processors to perform actions comprising:
A method for embedding variable length source data by a processor, the method comprising:

receiving source data having a first variable length, the source data comprising a sequence of bytes;
dividing the sequence of bytes into a plurality of contiguous sections, each contiguous section being different in size from other contiguous sections and partially overlapping a respective adjacent contiguous section;
dividing the sequence of bytes into a plurality of partially overlapping contiguous sections having different sizes;
extracting information associated with each contiguous section of the plurality of contiguous section of the source data to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length;
extracting, information associated with each section of the plurality of partially overlapping contiguous sections to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length and shorter than the first variable length;
processing the sequence of extracted information with an encoder neural network that includes a first set of parameters, the processing comprising: providing the sequence of extracted information as input to a convolutional filter; and providing an output of the convolutional filter as input to a recurrent neural network layer to generate an embedding of the source data;
and processing the sequence of extracted information with a recurrent neural network to generate an embedding of the source data, the recurrent neural network including an input, an output, and a first set of parameters;
wherein the encoder neural network is configured by training the encoder neural network with a decoder neural network, the decoder neural network including an input for receiving the embedding of the source data and a second set of parameters, the decoder neural network generating an output that approximates at least one of (a) the sequence of extracted information, (b) at least one of the one or more intermediate sequences, (c) a category associated with the source data, or (d) the source data;
the fully connected neural network including an input, an output, and a second set of parameters, and wherein the fully connected neural network is configured by adjusting the second set of parameters of the fully connected neural network based, at least in part, on the machine learning algorithm.
and processing at least the embedding of the source data with a classifier to generate a classification of the source data
processing the embedding of the source data with a classifier, the classifier comprising a fully connected neural network to generate a classification of the source data
Copending application 15909442 fails to recite the limitations in bold above. Claims 9 and 15 are system claims which recite the same limitations as method claim 1 in the copending application. However, Hardy teaches these limitations:
Encoder neural network [pg. 63, right column, ¶3]
wherein the encoder neural network is configured by training the encoder neural network with a decoder neural network, the decoder neural network [pg. 63, right column ¶3-4]
the decoder neural network generating an output that approximates at least one of (a) the sequence of extracted information, (b) at least one of the one or more intermediate sequences, (c) a category associated with the source data, or (d) the source data; [pg. 63, right column, ¶4]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the encoder neural network as taught by Hardy as a substitute for the recurrent neural network.

Kolosnjaji teaches these limitations:  
providing the sequence of extracted information as input to a convolutional filter [pg. 6, § 2.5 Deep neural Network, ¶1]; and 
providing an output of the convolutional filter as input to a recurrent neural network layer, that represents a transformation of the source data [pg. 6, § 2.5 Deep neural Network, ¶1];

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by inputting source data into a convolutional filter and outputting it to a recurrent neural network layer as taught by Kolosnjaji.
Claim 16
Claim 1
The system of claim 15, wherein the embedding of the source data is combined with additional data before processing at least the embedding of the source data with the classifier to generate the classification.
processing the embedding of the source data with a classifier, the classifier comprising a fully connected neural network to generate a classification of the source data
Copending application 15909442 fails to recite the limitations in bold above. Claim 15 is the system-claim counterpart of claim 1 of the copending application and recites the same limitations. However, Kolosnjaji teaches these limitations:
embedding of the source data is combined with additional data processing before processing [Abstract, pg. 6, § 2.5 Deep Neural Networks]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating an extraction method as taught by Kolosnjaji to extract information from files which implicitly performs the step of additional data processing. 
Claim 17
Claim 5
The system of claim 15, wherein the input of the decoder neural network includes at least one fully connected layer.
The method of claim 1, wherein the fully connected neural network includes one or more fully connected layers.
Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations:
a decoder neural network with at least one fully connected layer at its input. [pg. 63, right column, ¶4]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the decoder neural network as taught by Hardy as a substitute for the recurrent neural network.
Claim 18
Claim 3
The system of claim 15, wherein extracting information associated with each section of the plurality of contiguous sections comprises executing at least one of a convolution operation, a Shannon Entropy operation, a statistical operation, a wavelet transformation operation, a Fourier transformation operation, a compression operation, a disassembling operation, or a tokenization operation.
The method of claim 1, wherein extracting information comprises executing at least one of a Shannon entropy, convolution operation, a statistical operation, a wavelet transformation operation, a Fourier transformation operation, a compression operation, a disassembling operation, or a tokenization operation.
All limitations of claim 18 in the instant application are anticipated by claims 1 and 3 of the ‘442 application.
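For illustration only, the "Shannon Entropy operation" named in the claims compared above can be sketched per section of bytes. This is a minimal sketch under stated assumptions and is not drawn from either application: the function name `shannon_entropy` is hypothetical, and the entropy is computed in bits from the byte-value frequencies within one section.

```python
import math
from collections import Counter


def shannon_entropy(section: bytes) -> float:
    """Shannon entropy, in bits per byte, of one section (hypothetical helper)."""
    if not section:
        return 0.0
    total = len(section)
    counts = Counter(section)  # occurrences of each byte value
    # H = sum over observed byte values of p * log2(1 / p)
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# One entropy value per section yields a sequence of extracted information.
sections = [b"aaaa", b"aabb", b"abcd"]
print([shannon_entropy(s) for s in sections])
```

Applying the function to each contiguous section produces one value per section, so the extracted sequence is as long as the number of sections rather than the number of bytes.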
Claim 19
Claims 4, 5
The system of claim 15, wherein the encoder neural network includes at least one of a plurality of recurrent neural network layers or a plurality of fully connected layers.
Claim 4:
The method of claim 1, wherein the recurrent neural network includes one or more recurrent neural network layers.

Claim 5: The method of claim 1, wherein the fully connected neural network includes one or more fully connected layers.
Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations:
encoder neural network [pg. 63, right column, ¶3]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the encoder neural network as taught by Hardy as a substitute for the recurrent neural network.
Claim 20
Claims 4, 5
The system of claim 15, wherein the decoder neural network includes at least one of one or more recurrent neural network layers or one or more fully connected layers.
Claim 4:
The method of claim 1, wherein the recurrent neural network includes one or more recurrent neural network layers.

Claim 5: The method of claim 1, wherein the fully connected neural network includes one or more fully connected layers.
Copending application 15909442 fails to recite the limitations in bold above. However, Hardy teaches these limitations:
a decoder neural network [pg. 63, right column, ¶4]

It would have been obvious before the effective filing date of the applicant’s invention to modify the teachings of co-pending application (15909442) by incorporating the decoder neural network as taught by Hardy as a substitute for the recurrent neural network.


This is a provisional nonstatutory double patenting rejection.
Response to Arguments
Applicant's arguments filed 02/22/2022 have been fully considered but they are not persuasive. 

The double patenting rejection will be held in abeyance and reconsidered at the time allowable claims are identified. Thus, claims 1-13 and 15-20 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 3-5, and 8 of copending Application No. 15909442 in view of Hardy (DL4MD: A Deep Learning Framework for Intelligent Malware Detection) and Kolosnjaji et al. (Deep Learning for Classification of Malware System Call Sequences). Claim 14 is provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claim 1 of copending Application No. 15909442 in view of Tobiyama (Malware Detection with Deep Neural Network Using Process Behavior).

Applicant’s arguments on pgs. 12-13 that the cited prior art of Tobiyama/Hardy/Saxe fails to teach the amended limitation “dividing the sequence of bytes into a plurality of contiguous sections, each contiguous section being different in size from other contiguous sections and partially overlapping a respective adjacent contiguous section;” have been considered but are not persuasive. Saxe explicitly teaches that the sequence of bytes can be divided into a plurality of overlapping contiguous sections [col. 7, lines 34-49]. The newly presented art of Sai is relied upon to teach that the sequence of bytes can be divided into a plurality of contiguous sections with different sizes. Please see the updated 103 rejection above.

Applicant’s arguments with respect to the rejections of the dependent claims have been fully considered but they are not persuasive as they rely upon the allowability of the independent claims.
Conclusion
Applicant's amendment necessitated the new grounds of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491. The examiner can normally be reached Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/M.H.H./Examiner, Art Unit 2122
/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122