DETAILED ACTION
This action is in response to the claims filed 05/16/2022 for application 15/909,442. Claims 1, 4, 6, 9, 12, 14, 15, 18, and 20 have been amended. Claims 1, 3-9, 11-15, and 17-23 are currently pending.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 05/16/2022 has been entered.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 3-9, 11-15, and 17-23 are rejected under 35 U.S.C. 103 as being unpatentable over Tobiyama et al. (Malware Detection with Deep Neural Network Using Process Behavior, hereinafter "Tobiyama") in view of Hardy et al. (DL4MD: A Deep Learning Framework for Intelligent Malware Detection, hereinafter "Hardy") and further in view of Sai ("US 9864956 B1", hereinafter "Sai") and further in view of Wang et al. ("Modeling Temporal Dynamics and Spatial Configurations of Actions Using Two-Stream Recurrent Neural Networks", hereinafter "Wang").

Regarding claim 1, Tobiyama teaches A method for embedding variable length source data by a processor, the method comprising: 
receiving source data having a first variable length (“In the RNN training phase, we used 44 malware process logs and 39 benign logs for training. We selected those files so that the total Operation length in files of malware and benign classes become almost same. Types of the Operation appeared in all files were 81.” [pg. 581, left column, ¶2; Log files selected for training with an operation length would correspond to a first variable length.]); 
processing the sequence of extracted information with a first recurrent neural network to generate an embedding of the source data (“To generate a feature image, we first convert the Operations in the log file to 1-hot vectors same as Section III-C and input them to RNN sequentially. Let L be the length of Operations recorded in the log file. We extract the value of 3rd hidden layer h3 for every input and obtain series of feature vector $\{h_1^3, h_2^3, \ldots, h_L^3\}$. We designed feature classifier (Section III-E) to accept fixed size images so that we need to convert these series of vector to fixed length one because the length of Operations differs between log files.” [pg. 579-580, § D. Feature Extraction and Imaging, ¶2; note: embedding a source data would correspond to a fixed length representation of the extracted log files.]), the first recurrent neural network including an input, an output, and a first set of parameters (“Based on the Operations, we construct behavioral language model. We use RNN with LSTM units for the model. The RNN consists of an input layer x, a normal hidden layer h1, two LSTM layers h2 and h3, and an output layer y.” note: See Table II for a first set of parameters. [pg. 579, § C. Training RNN, ¶1]); and 
wherein the first recurrent neural network is configured by adjusting the first set of parameters of the first recurrent neural network based, at least in part, on a machine learning algorithm (“The RNN is trained by repeatedly using log files. First, we choose one log file and convert Operations = {OP1,OP2,...,OPL } to 1-hot vectors = {x1,x2,...,xL }. Each 1-hot vector xt is sequentially inputted to the RNN and it outputs prediction yt . Then we calculate loss function by comparing yt with correct answer xt+1. After input T Operations, weights are updated by backpropagation.” [pg. 579, § C. Training RNN; a machine learning algorithm is implicit for training an RNN.]).
However, Tobiyama fails to explicitly teach the source data comprising a sequence of bytes;
dividing the sequence of bytes into a plurality of sections; 
extracting, information associated with each section of the plurality of sections to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length and shorter than the first variable length; 
processing the embedding with a second recurrent neural network to generate an enhanced embedding of the source data;
processing the embedding of the source data with a classifier, the classifier comprising a fully connected neural network to generate a classification of the source data, the fully connected neural network including an input, an output, and a second set of parameters,
	wherein the fully connected neural network is configured by adjusting the second set of parameters of the fully connected neural network based, at least in part, on the machine learning algorithm.
Hardy teaches processing the enhanced embedding of the source data with a classifier, the classifier comprising a fully connected neural network (See Fig. 4, pg. 64, top right column) to generate a classification of the source data (“Our deep learning framework for malware detection (short for DL4MD) is performed on the analysis of Windows API calls generated from the collected PE files. The system consists of two major components: feature extractor, and deep learning based classifier, as illustrated in Figure 1.” [pg. 62, §3 System architecture, ¶1; The classifier disclosed by Hardy would be performing the classification (i.e. Malware detection) of the source data.]), the fully connected neural network including an input, an output, and a second set of parameters ([Image: media_image1.png, excerpt from Hardy, not reproduced] [pg. 64, top left column; Second set of parameters would correspond to θ = {W, b}]);
wherein the fully connected neural network is configured by adjusting the second set of parameters of the fully connected neural network (“Backpropagate the error through the net and update parameter set θ = {W, b};” [pg. 64, top left column; Backpropagate and update would correspond to adjusting parameters]) based, at least in part, on a machine learning algorithm (See Algorithm 1, pg. 64, top left column).
Tobiyama and Hardy are both in the same field of endeavor of malware detection using deep neural networks. Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tobiyama’s teachings with Hardy to include a fully connected neural network with a classifier to perform malware detection. One would be motivated to make this modification since deep learning architectures overcome the learning difficulty through layerwise pretraining. [Hardy, § Introduction, ¶3] 
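For illustration only, the training setup quoted from Tobiyama above (Operations converted to 1-hot vectors, with the prediction yt compared against the next vector xt+1) can be sketched as follows. This is a minimal sketch, not Tobiyama's implementation: the function names and the sample Operation names are hypothetical, and only the construction of input/target pairs is shown, not the RNN or the backpropagation step.

```python
from typing import List, Tuple


def one_hot(index: int, size: int) -> List[int]:
    """Return a 1-hot vector of length `size` with a 1 at position `index`."""
    vec = [0] * size
    vec[index] = 1
    return vec


def next_operation_pairs(operations: List[str]) -> Tuple[List[List[int]], List[List[int]]]:
    """Build (x_t, x_{t+1}) pairs for next-operation prediction.

    The vocabulary is derived from the log itself (an assumption made
    here for brevity; the reference reports 81 Operation types overall).
    """
    vocab = sorted(set(operations))
    index_of = {op: i for i, op in enumerate(vocab)}
    encoded = [one_hot(index_of[op], len(vocab)) for op in operations]
    inputs = encoded[:-1]   # x_1 ... x_{L-1}
    targets = encoded[1:]   # x_2 ... x_L, the "correct answer" for each y_t
    return inputs, targets


# Hypothetical Operation names, used only to exercise the sketch.
log = ["RegOpenKey", "RegQueryValue", "CreateFile", "RegOpenKey"]
inputs, targets = next_operation_pairs(log)
```

A log of L Operations yields L - 1 input/target pairs, so the number of training steps varies with the length of the log.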
Tobiyama/Hardy fails to explicitly teach the source data comprising a sequence of bytes;
dividing the sequence of bytes into a plurality of sections; 
extracting, information associated with each section of the plurality of sections to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length and shorter than the first variable length; 
Sai teaches the source data comprising a sequence of bytes (“In FIG. 5, the feature extraction instructions 113 receive the binary file 310. The binary file 310 may include an executable file, such as one of the files 104 of FIGS. 1-3. The binary file 310 is divided into chunks via chunking instructions 401. For example, the binary file 310 may be divided into chunks of 256 bytes each.” [col 15, lines 42 - 47]);
dividing the sequence of bytes into a plurality of sections (“For example, the binary file 310 may be divided into chunks of 256 bytes each. In other examples, different chunk sizes may be used. When the binary file 312 has a length that is not divisible by 256 bytes without a remainder, the remainder is may be dropped.” [col 15, lines 46-50]);
extracting, information associated with each section of the plurality of sections to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length and shorter than the first variable length (“In a particular implementation, the feature extraction instructions 113 include entropy calculation instructions 403. The entropy calculation instructions 403 may be configured to calculate an entropy (e.g., a Shannon entropy) for each of the chunks 402. For example, in FIG. 5, the binary file 310 is used to generate five chunks 402 and the entropy calculation instructions 403 generate data including five of entropy values 404. Entropy values may be calculated using Equation 1: [Image: media_image2.png, Equation 1 of Sai, not reproduced] … In a particular implementation, each byte of each of the chunks 402 is represented by a pair of hexadecimal characters. There are 256 possible values for a pair of hexadecimal characters. Thus, in this implementation, the entropy values (H) range between zero and eight where the maximum entropy (eight) is reached when Pi takes a constant value of 1/256 (i.e., every byte is completely random). In other implementations, other ranges of entropy values may be used depending on the chunking, how data within each chunk is grouped (e.g., into two hexadecimal values in the example above), and the base of the logarithm that is used to calculate the entropy.” [col 15, line 55 – col 16, line 11; Examiner is interpreting the chunk of bytes that were divided to be equivalent to a sequence of extracted information having a second variable length. Since the bytes were divided from a binary file (i.e. first variable length), it is implicit that it would be shorter than the first variable length.]);
Tobiyama, Hardy, and Sai are all in the same field of endeavor of malware detection using deep neural networks. Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Sai discloses training a file classifier from a plurality of binary file inputs to detect malware. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tobiyama’s and Hardy’s teachings by dividing the log files of Tobiyama/Hardy into portions of bytes and performing a Shannon Entropy operation on the portion of bytes as taught by Sai. One would have been motivated to make this modification in order to use the calculated entropy values from the binary file to detect if the file contains malware. [col 17, lines 45-49, Sai]
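For illustration only, the chunking and per-chunk entropy calculation quoted from Sai above can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `chunk_entropies` and the sample input are hypothetical, and the formula used is the standard base-2 Shannon entropy, which is consistent with the quoted zero-to-eight range but is not a reproduction of Sai's Equation 1.

```python
import math
from collections import Counter
from typing import List


def chunk_entropies(data: bytes, chunk_size: int = 256) -> List[float]:
    """Divide `data` into fixed-size chunks and return one entropy per chunk.

    Any trailing remainder shorter than `chunk_size` is dropped, as in the
    quoted passage. The output length is len(data) // chunk_size, so it
    depends on, and is shorter than, the input length.
    """
    n_chunks = len(data) // chunk_size
    entropies = []
    for i in range(n_chunks):
        chunk = data[i * chunk_size:(i + 1) * chunk_size]
        counts = Counter(chunk)  # occurrences of each byte value in the chunk
        # Standard Shannon entropy in bits: sum of p * log2(1 / p).
        h = sum((c / chunk_size) * math.log2(chunk_size / c) for c in counts.values())
        entropies.append(h)
    return entropies


# Hypothetical input: one chunk with every byte value once, one all-zero
# chunk, and a 4-byte remainder that is dropped.
sample = bytes(range(256)) + bytes(256) + b"tail"
values = chunk_entropies(sample)
```

In this sketch a 516-byte input produces two entropy values, which illustrates how the length of the extracted sequence follows from the length of the source data.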
Although Tobiyama discloses using an RNN, the reference does not detail processing the embedding with a second RNN.
Wang teaches processing the embedding with a second recurrent neural network to generate an enhanced embedding of the source data (“In the first layer, we use a corresponding RNN to model the temporal movement of each body part based on its concatenated coordinates of joints at each time step. In the second layer, we concatenate the outputs of the RNN of different parts and adopt another RNN to model the movement of the whole body” [pg. 502, left col, ¶2; See also Fig. 3; processing the embedding with a second RNN]);
Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Sai discloses training a file classifier from a plurality of binary file inputs to detect malware. Wang teaches a method to model temporal dynamics using two-stream RNNs. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tobiyama’s/Hardy’s/Sai’s teachings to process the embedding with a second RNN as taught by Wang. One would have been motivated to make this modification in order to improve the accuracy of the processed source data. [pg. 504, § 5.4, Wang]
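For illustration only, feeding the outputs of a first RNN into a second RNN can be sketched as follows. This is a toy sketch, not Wang's architecture: it uses a single scalar tanh recurrence per layer, the weights are arbitrary untrained values, the input sequence is hypothetical, and Wang's concatenation of several part-level RNN outputs is simplified here to one first-layer RNN.

```python
import math
from typing import List


def simple_rnn(sequence: List[float], w_in: float, w_rec: float) -> List[float]:
    """Scalar Elman-style recurrence: h_t = tanh(w_in * x_t + w_rec * h_{t-1})."""
    hidden = 0.0
    outputs = []
    for x in sequence:
        hidden = math.tanh(w_in * x + w_rec * hidden)
        outputs.append(hidden)
    return outputs


# Hypothetical sequence of extracted values (for example, per-chunk entropies).
extracted_sequence = [8.0, 0.0, 4.5]

# First RNN: one hidden state per input step.
first_pass = simple_rnn(extracted_sequence, w_in=0.1, w_rec=0.5)

# Second RNN: consumes the first RNN's outputs step by step.
second_pass = simple_rnn(first_pass, w_in=0.8, w_rec=0.3)

# The final hidden state serves as a fixed-size summary in this sketch.
summary = second_pass[-1]
```

The second recurrence sees only the first layer's hidden states, not the raw inputs, which is the layering relationship the rejection relies on.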

Regarding claim 3, Tobiyama/Hardy/Sai/Wang teaches the method of claim 1, where Tobiyama further teaches wherein extracting information further comprises executing at least one of a Shannon entropy, a convolution operation, a statistical operation, a wavelet transformation operation, a Fourier transformation operation, a compression operation, a disassembling operation, or a tokenization operation (“In the convolution layer, features are extracted by convoluting filter to inputs.” [pg. 578, § B. Deep Neural Network, ¶3; note: this corresponds to a convolution operation]).

Regarding claim 4, Tobiyama/Hardy/Sai/Wang teaches the method of claim 1, where Tobiyama further teaches wherein the first recurrent neural network includes one or more recurrent neural network layers (“We use RNN with LSTM units for the model. The RNN consists of an input layer x, a normal hidden layer h1, two LSTM layers h2 and h3, and an output layer y.” [pg. 579, § C. Training RNN, ¶1]).

Regarding claim 5, Tobiyama/Hardy/Sai/Wang teaches the method of claim 1, where Hardy further teaches wherein the fully connected neural network includes one or more fully connected layers (“More rigorously, with an SAE deep network with h layers, the first layer takes the input from the training dataset and is trained simply as an AutoEncoder. Then, after the kth hidden layer is obtained, its output is used as the input of the (k + 1)th hidden layer, which is trained similarly. Finally, the hth layer’s output is used as the output of the entire SAE model. In this manner, AutoEncoders can form a hierarchical stack. Figure 3 illustrates a SAEs model with h hidden layers.” [pg. 64, § 4.3 Deep learning architecture with SAEs, ¶1]).
Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Sai discloses training a file classifier from a plurality of binary file inputs to detect malware. Wang teaches a method to model temporal dynamics using two-stream RNNs.  It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tobiyama’s/Sai’s/Wang’s teachings with Hardy to include a fully connected neural network with a classifier to perform malware detection. One would be motivated to make this modification since deep learning architectures overcome the learning difficulty through layerwise pretraining. [Hardy, § Introduction, ¶3] 

Regarding claim 6, Tobiyama/Hardy/Sai/Wang teaches the method of claim 1, where Tobiyama further teaches wherein the first set of parameters of the first recurrent neural network are adjusted in response to training data (“Then we calculate loss function by comparing yt with correct answer xt+1. After input T Operations, weights are updated by backpropagation.” [pg. 579, § C. Training RNN; Updating weights would correspond to adjusting a first set of parameters.]).
However, Tobiyama fails to explicitly teach the second set of parameters of the fully connected neural network are adjusted in response to training data.
Hardy teaches the second set of parameters of the fully connected neural network are adjusted in response to training data (“Backpropagate the error through the net and update parameter set θ = {W, b};” [pg. 64, top left column; Backpropagate and updating the parameter teaches a second set of parameters are adjusted]).
Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Sai discloses training a file classifier from a plurality of binary file inputs to detect malware. Wang teaches a method to model temporal dynamics using two-stream RNNs.  It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tobiyama’s/Sai’s/Wang’s teachings with Hardy to include a fully connected neural network with a classifier to perform malware detection. One would be motivated to make this modification since deep learning architectures overcome the learning difficulty through layerwise pretraining. [Hardy, § Introduction, ¶3] 

Regarding claim 7, Tobiyama/Hardy/Sai/Wang teaches the method of claim 1, where Hardy further teaches wherein the classification of the source data is at least one of whether the source data is malicious, adware, or good (“The dataset obtained from Comodo Cloud Security Center contains 50,000 file samples, where 22,500 are malware, 22,500 are benign files, and 5,000 are unknown (with the analysis by the anti-malware experts of Comodo Security Lab, 2,500 of them are labeled as malware and 2,500 of them are benign). In our experiments, those 45,000 file samples are used for training, while the 5,000 unknown files are used for testing. 9,649 Windows API calls are extracted from these 50,000 file samples, so all the file samples can be represented as binary feature vectors with 9,649- dimensions (described in Section 4.1). To quantitatively validate the experimental results, we use the performance measures shown in Table II.” [pg. 65, 5.1 Experimental setup, ¶1]).
Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Sai discloses training a file classifier from a plurality of binary file inputs to detect malware. Wang teaches a method to model temporal dynamics using two-stream RNNs.  It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tobiyama’s/Sai’s/Wang’s teachings with Hardy to include a fully connected neural network with a classifier to perform malware detection. One would be motivated to make this modification since deep learning architectures overcome the learning difficulty through layerwise pretraining. [Hardy, § Introduction, ¶3] 

Regarding claim 8, Tobiyama/Hardy/Sai/Wang teaches the method of claim 1, where Hardy further teaches wherein the classifier is a gradient-boosted tree, ensemble of gradient-boosted trees, random forest, support vector machine, fully connected multilayer perceptron, a partially connected multilayer perceptron, or general linear model (“In this section, using the same dataset described in Section 5.1, we conduct a comparison between our proposed deep learning framework (DL4MD) and other shallow learning based classification methods (i.e., Artificial Neural Network (ANN), Support Vector Machine (SVM), Naïve Bayes (NB), and Decision Tree (DT)) in malware detection. The results in Table IV, Figure 6 and Figure 7 show that our proposed deep learning framework (DL4MD) outperform ANN, SVM, NB, and DT in malware detection.” [pg. 65, § 5.3 Comparisons between deep learning and other shallow learning based classification methods, ¶1]).
Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Sai discloses training a file classifier from a plurality of binary file inputs to detect malware. Wang teaches a method to model temporal dynamics using two-stream RNNs.  It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tobiyama’s/Sai’s/Wang’s teachings with Hardy to include a fully connected neural network with a classifier to perform malware detection. One would be motivated to make this modification since deep learning architectures overcome the learning difficulty through layerwise pretraining. [Hardy, § Introduction, ¶3] 

Regarding claim 9, Tobiyama teaches A system for embedding variable length source data by a processor, the system comprising: 
one or more processors; and
at least one non-transitory computer readable storage medium having instructions therein, which, when executed by the one or more processors, cause the one or more processors to perform actions comprising (“In this paper, we propose a new malware process detection method using process behavior to detect whether a terminal is infected or not. Our proposal uses two types of Deep Neural Network (DNN) to adapt different characteristic of individual operation flows.” [pg. 577, §1 Introduction ¶3; The terminal implicitly teaches a processor and memory]):
receiving source data having a first variable length (“In the RNN training phase, we used 44 malware process logs and 39 benign logs for training. We selected those files so that the total Operation length in files of malware and benign classes become almost same. Types of the Operation appeared in all files were 81.” [pg. 581, left column, ¶2; Log files selected for training with an operation length would correspond to a first variable length.]); 
processing the sequence of extracted information with a first recurrent neural network to generate an embedding of the source data (“To generate a feature image, we first convert the Operations in the log file to 1-hot vectors same as Section III-C and input them to RNN sequentially. Let L be the length of Operations recorded in the log file. We extract the value of 3rd hidden layer h3 for every input and obtain series of feature vector $\{h_1^3, h_2^3, \ldots, h_L^3\}$. We designed feature classifier (Section III-E) to accept fixed size images so that we need to convert these series of vector to fixed length one because the length of Operations differs between log files.” [pg. 579-580, § D. Feature Extraction and Imaging, ¶2; note: embedding a source data would correspond to a fixed length representation of the extracted log files.]), the first recurrent neural network including an input, an output, and a first set of parameters (“Based on the Operations, we construct behavioral language model. We use RNN with LSTM units for the model. The RNN consists of an input layer x, a normal hidden layer h1, two LSTM layers h2 and h3, and an output layer y.” note: See Table II for a first set of parameters. [pg. 579, § C. Training RNN, ¶1]); and 
wherein the first recurrent neural network is configured by adjusting the first set of parameters of the first recurrent neural network based, at least in part, on a machine learning algorithm (“The RNN is trained by repeatedly using log files. First, we choose one log file and convert Operations = {OP1,OP2,...,OPL } to 1-hot vectors = {x1,x2,...,xL }. Each 1-hot vector xt is sequentially inputted to the RNN and it outputs prediction yt . Then we calculate loss function by comparing yt with correct answer xt+1. After input T Operations, weights are updated by backpropagation.” [pg. 579, § C. Training RNN; a machine learning algorithm is implicit for training an RNN.]).
However, Tobiyama fails to explicitly teach the source data comprising a sequence of bytes;
dividing the sequence of bytes into a plurality of sections; 
extracting, information associated with each section of the plurality of sections to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length and shorter than the first variable length; 
processing the embedding with a second recurrent neural network to generate an enhanced embedding of the source data;
processing the embedding of the source data with a classifier, the classifier comprising a fully connected neural network to generate a classification of the source data, the fully connected neural network including an input, an output, and a second set of parameters,
	wherein the fully connected neural network is configured by adjusting the second set of parameters of the fully connected neural network based, at least in part, on the machine learning algorithm.
Hardy teaches processing the enhanced embedding of the source data with a classifier, the classifier comprising a fully connected neural network (See Fig. 4, pg. 64, top right column) to generate a classification of the source data (“Our deep learning framework for malware detection (short for DL4MD) is performed on the analysis of Windows API calls generated from the collected PE files. The system consists of two major components: feature extractor, and deep learning based classifier, as illustrated in Figure 1.” [pg. 62, §3 System architecture, ¶1; The classifier disclosed by Hardy would be performing the classification (i.e. Malware detection) of the source data.]), the fully connected neural network including an input, an output, and a second set of parameters ([Image: media_image1.png, excerpt from Hardy, not reproduced] [pg. 64, top left column; Second set of parameters would correspond to θ = {W, b}]);
wherein the fully connected neural network is configured by adjusting the second set of parameters of the fully connected neural network (“Backpropagate the error through the net and update parameter set θ = {W, b};” [pg. 64, top left column; Backpropagate and update would correspond to adjusting parameters]) based, at least in part, on a machine learning algorithm (See Algorithm 1, pg. 64, top left column).
Tobiyama and Hardy are both in the same field of endeavor of malware detection using deep neural networks. Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tobiyama’s teachings with Hardy to include a fully connected neural network with a classifier to perform malware detection. One would be motivated to make this modification since deep learning architectures overcome the learning difficulty through layerwise pretraining. [Hardy, § Introduction, ¶3] 
Tobiyama/Hardy fails to explicitly teach the source data comprising a sequence of bytes;
dividing the sequence of bytes into a plurality of sections; 
extracting, information associated with each section of the plurality of sections to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length and shorter than the first variable length; 
Sai teaches the source data comprising a sequence of bytes (“In FIG. 5, the feature extraction instructions 113 receive the binary file 310. The binary file 310 may include an executable file, such as one of the files 104 of FIGS. 1-3. The binary file 310 is divided into chunks via chunking instructions 401. For example, the binary file 310 may be divided into chunks of 256 bytes each.” [col 15, lines 42 - 47]);
dividing the sequence of bytes into a plurality of sections (“For example, the binary file 310 may be divided into chunks of 256 bytes each. In other examples, different chunk sizes may be used. When the binary file 312 has a length that is not divisible by 256 bytes without a remainder, the remainder is may be dropped.” [col 15, lines 46-50]);
extracting, information associated with each section of the plurality of sections to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length and shorter than the first variable length (“In a particular implementation, the feature extraction instructions 113 include entropy calculation instructions 403. The entropy calculation instructions 403 may be configured to calculate an entropy (e.g., a Shannon entropy) for each of the chunks 402. For example, in FIG. 5, the binary file 310 is used to generate five chunks 402 and the entropy calculation instructions 403 generate data including five of entropy values 404. Entropy values may be calculated using Equation 1: 
[Equation 1: image media_image2.png]
 …In a particular implementation, each byte of each of the chunks 402 is represented by a pair of hexadecimal characters. There are 256 possible values for a pair of hexadecimal characters. Thus, in this implementation, the entropy values (H) range between zero and eight where the maximum entropy (eight) is reached when Pi takes a constant value of 1/256 (i.e., every byte is completely random). In other implementations, other ranges of entropy values may be used depending on the chunking, how data within each chunk is grouped (e.g., into two hexadecimal values in the example above), and the base of the logarithm that is used to calculate the entropy.” [col 15, line 55 – col 16, line 11; Examiner is interpreting the chunk of bytes that were divided to be equivalent to a sequence of extracted information having a second variable length. Since the bytes were divided from a binary file (i.e. first variable length), it is implicit that it would be shorter than the first variable length.]);
Tobiyama, Hardy, and Sai are all in the same field of endeavor of malware detection using deep neural networks. Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Sai discloses training a file classifier from a plurality of binary file inputs to detect malware. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tobiyama’s and Hardy’s teachings by dividing the log files of Tobiyama/Hardy into portions of bytes and performing a Shannon Entropy operation on the portion of bytes as taught by Sai. One would have been motivated to make this modification in order to use the calculated entropy values from the binary file to detect if the file contains malware. [col 17, lines 45-49, Sai]
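For illustration only, the chunking and per-chunk entropy calculation described in the cited passage of Sai can be sketched as follows. This is a minimal sketch and not Sai's implementation: Equation 1 is reproduced in this action only as an image, so the standard base-2 Shannon entropy over byte frequencies is assumed, and the function name `chunk_entropies` and its parameters are hypothetical. The 256-byte default chunk size and the dropping of any remainder follow the quoted passage.

```python
import math
from collections import Counter


def chunk_entropies(data: bytes, chunk_size: int = 256) -> list:
    """Split `data` into fixed-size chunks and return one entropy value per chunk.

    Assumption: base-2 Shannon entropy over byte frequencies within each chunk.
    Any trailing bytes that do not fill a whole chunk are dropped.
    """
    n_chunks = len(data) // chunk_size
    entropies = []
    for i in range(n_chunks):
        chunk = data[i * chunk_size:(i + 1) * chunk_size]
        counts = Counter(chunk)  # occurrences of each byte value in this chunk
        h = sum(-(c / chunk_size) * math.log2(c / chunk_size)
                for c in counts.values())
        entropies.append(h)
    return entropies


# A file of 5 full chunks plus a 100-byte remainder yields 5 entropy values,
# so the output sequence is shorter than the input byte sequence.
values = chunk_entropies(bytes(range(256)) * 5 + b"\x00" * 100)
```

Under these assumptions the output length equals the number of whole chunks, so it depends on the input length and is shorter than it.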
Although Tobiyama discloses using an RNN, the reference does not detail processing the embedding with a second RNN.
Wang teaches processing the embedding with a second recurrent neural network to generate an enhanced embedding of the source data (“In the first layer, we use a corresponding RNN to model the temporal movement of each body part based on its concatenated coordinates of joints at each time step. In the second layer, we concatenate the outputs of the RNN of different parts and adopt another RNN to model the movement of the whole body” [pg. 502, left col, ¶2; See also Fig. 3; processing the embedding with a 2nd RNN]);
Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Sai discloses training a file classifier from a plurality of binary file inputs to detect malware. Wang teaches a method to model temporal dynamics using two-stream RNNs. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tobiyama’s/Hardy’s/Sai’s teachings to process the embedding with a second RNN as taught by Wang. One would have been motivated to make this modification in order to improve the accuracy of the processed source data. [pg. 504, § 5.4, Wang]
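For illustration only, the two-layer arrangement quoted from Wang (one RNN per part in the first layer, with the concatenated outputs fed to another RNN in the second layer) can be sketched as follows. This is a minimal sketch under stated assumptions, not Wang's model: a plain tanh recurrent cell with random, untrained weights is assumed, and the names `rnn_forward`, `random_matrix`, and `two_layer_rnn` are hypothetical.

```python
import math
import random


def rnn_forward(inputs, w_in, w_rec):
    """Run a simple tanh recurrent cell and return the hidden state at each step."""
    hidden = [0.0] * len(w_rec)
    states = []
    for x in inputs:
        # The new state depends on the current input and the previous state.
        hidden = [
            math.tanh(
                sum(w * xi for w, xi in zip(w_in[j], x))
                + sum(w * hi for w, hi in zip(w_rec[j], hidden))
            )
            for j in range(len(w_rec))
        ]
        states.append(hidden)
    return states


def random_matrix(rows, cols, rng):
    """Hypothetical helper: untrained weights drawn uniformly from [-1, 1]."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


def two_layer_rnn(part_sequences, hidden1, hidden2, seed=0):
    """First layer: one RNN per part. Second layer: one RNN over concatenated outputs."""
    rng = random.Random(seed)
    first_outputs = []
    for seq in part_sequences:
        in_dim = len(seq[0])
        first_outputs.append(
            rnn_forward(seq,
                        random_matrix(hidden1, in_dim, rng),
                        random_matrix(hidden1, hidden1, rng))
        )
    steps = len(part_sequences[0])
    # Concatenate the first-layer outputs of all parts at each time step.
    merged = [sum((out[t] for out in first_outputs), []) for t in range(steps)]
    return rnn_forward(merged,
                       random_matrix(hidden2, hidden1 * len(part_sequences), rng),
                       random_matrix(hidden2, hidden2, rng))


out = two_layer_rnn([[[0.1, 0.2]] * 4, [[0.3, 0.4]] * 4], hidden1=3, hidden2=5)
```

In this sketch the second RNN consumes the output of the first layer rather than the raw input, which is the relationship the rejection relies on.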

Regarding claim 11, Tobiyama/Hardy/Sai/Wang teaches the system of claim 9, where Tobiyama further teaches wherein extracting information further comprises executing at least one of a Shannon entropy, a convolution operation, a statistical operation, a wavelet transformation operation, a Fourier transformation operation, a compression operation, a disassembling operation, or a tokenization operation (“In the convolution layer, features are extracted by convoluting filter to inputs.” [pg. 578, § B. Deep Neural Network, ¶3; note: this corresponds to a convolution operation]).

Regarding claim 12, Tobiyama/Hardy/Sai/Wang teaches the system of claim 9, where Tobiyama further teaches wherein the first recurrent neural network includes one or more recurrent neural network layers (“We use RNN with LSTM units for the model. The RNN consists of an input layer x, a normal hidden layer h1, two LSTM layers h2 and h3, and an output layer y.” [pg. 579, § C. Training RNN, ¶1]).

Regarding claim 13, Tobiyama/Hardy/Sai/Wang teaches the system of claim 9, where Hardy further teaches wherein the fully connected neural network includes one or more fully connected layers (“More rigorously, with an SAE deep network with h layers, the first layer takes the input from the training dataset and is trained simply as an AutoEncoder. Then, after the kth hidden layer is obtained, its output is used as the input of the (k + 1)th hidden layer, which is trained similarly. Finally, the hth layer’s output is used as the output of the entire SAE model. In this manner, AutoEncoders can form a hierarchical stack. Figure 3 illustrates a SAEs model with h hidden layers.” [pg. 64, § 4.3 Deep learning architecture with SAEs, ¶1]).
Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Sai discloses training a file classifier from a plurality of binary file inputs to detect malware. Wang teaches a method to model temporal dynamics using two-stream RNNs.  It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tobiyama’s/Sai’s/Wang’s teachings with Hardy to include a fully connected neural network with a classifier to perform malware detection. One would be motivated to make this modification since deep learning architectures overcome the learning difficulty through layerwise pretraining. [Hardy, § Introduction, ¶3] 
Regarding claim 14, Tobiyama/Hardy/Sai/Wang teaches the system of claim 9, where Tobiyama further teaches wherein the first set of parameters of the first recurrent neural network are adjusted in response to training data (“Then we calculate loss function by comparing yt with correct answer xt+1. After input T Operations, weights are updated by backpropagation.” [pg. 579, § C. Training RNN; Updating weights would correspond to adjusting a first set of parameters.]).
However, Tobiyama fails to explicitly teach the second set of parameters of the fully connected neural network are adjusted in response to training data.
Hardy teaches the second set of parameters of the fully connected neural network are adjusted in response to training data (“Backpropagate the error through the net and update parameter set θ = {W, b};” [pg. 64, top left column; Backpropagate and updating the parameter teaches a second set of parameters are adjusted]).
Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Sai discloses training a file classifier from a plurality of binary file inputs to detect malware. Wang teaches a method to model temporal dynamics using two-stream RNNs.  It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tobiyama’s/Sai’s/Wang’s teachings with Hardy to include a fully connected neural network with a classifier to perform malware detection. One would be motivated to make this modification since deep learning architectures overcome the learning difficulty through layerwise pretraining. [Hardy, § Introduction, ¶3] 

Regarding claim 15, Tobiyama teaches A system for embedding source data by a processor, the source data having a first variable length (“In the RNN training phase, we used 44 malware process logs and 39 benign logs for training. We selected those files so that the total Operation length in files of malware and benign classes become almost same. Types of the Operation appeared in all files were 81.” [pg. 581, left column, ¶2; Log files selected for training with an operation length would correspond to a first variable length.]), the system comprising:
one or more processors; and
a memory having instructions stored therein, which, when executed by the one or more processors, cause the one or more processor to perform actions comprising (“In this paper, we propose a new malware process detection method using process behavior to detect whether a terminal is infected or not. Our proposal uses two types of Deep Neural Network (DNN) to adapt different characteristic of individual operation flows.” [pg. 577, §1 Introduction ¶3; The terminal implicitly teaches a processor and memory]);
receiving source data having a first variable length (“In the RNN training phase, we used 44 malware process logs and 39 benign logs for training. We selected those files so that the total Operation length in files of malware and benign classes become almost same. Types of the Operation appeared in all files were 81.” [pg. 581, left column, ¶2; Log files selected for training with an operation length would correspond to a first variable length.]); 
processing the sequence of extracted information with a first recurrent neural network to generate an embedding of the source data (“To generate a feature image, we first convert the Operations in the log file to 1-hot vectors same as Section III-C and input them to RNN sequentially. Let L be the length of Operations recorded in the log file. We extract the value of 3rd hidden layer h3 for every input and obtain series of feature vector {h_1^3, h_2^3, ..., h_L^3}. We designed feature classifier (Section III-E) to accept fixed size images so that we need to convert these series of vector to fixed length one because the length of Operations differs between log files.” [pg. 579-580, § D. Feature Extraction and Imaging, ¶2; note: embedding a source data would correspond to a fixed length representation of the extracted log files.]), the first recurrent neural network including an input, an output, and a first set of parameters (“Based on the Operations, we construct behavioral language model. We use RNN with LSTM units for the model. The RNN consists of an input layer x, a normal hidden layer h1, two LSTM layers h2 and h3, and an output layer y.” note: See Table II for a first set of parameters. [pg. 579, § C. Training RNN, ¶1]); and 
wherein the first recurrent neural network is configured by adjusting the first set of parameters of the first recurrent neural network based, at least in part, on a machine learning algorithm (“The RNN is trained by repeatedly using log files. First, we choose one log file and convert Operations = {OP1,OP2,...,OPL } to 1-hot vectors = {x1,x2,...,xL }. Each 1-hot vector xt is sequentially inputted to the RNN and it outputs prediction yt . Then we calculate loss function by comparing yt with correct answer xt+1. After input T Operations, weights are updated by backpropagation.” [pg. 579, § C. Training RNN; a machine learning algorithm is implicit for training a RNN.]).
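For illustration only, the data preparation described in the quoted Tobiyama passage (Operations converted to 1-hot vectors, with the prediction for step t compared against the input at step t+1) can be sketched as follows. This is a minimal sketch, not Tobiyama's code: the function names `one_hot` and `next_step_pairs` and the example Operation names are hypothetical, and the RNN itself, the loss function, and backpropagation are omitted.

```python
def one_hot(index, size):
    """Return a 1-hot vector of length `size` with a 1.0 at position `index`."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def next_step_pairs(operations, vocabulary):
    """Build (input x_t, target x_{t+1}) pairs from a sequence of Operations."""
    index = {op: i for i, op in enumerate(vocabulary)}
    vectors = [one_hot(index[op], len(vocabulary)) for op in operations]
    # The target at step t is the 1-hot vector of the Operation at step t+1.
    return list(zip(vectors[:-1], vectors[1:]))


# Hypothetical Operation names; the reference reports 81 Operation types.
pairs = next_step_pairs(["open", "read", "close"],
                        ["open", "read", "close", "write"])
```

A log of L Operations yields L - 1 training pairs under this arrangement, so the number of pairs varies with the length of the log file.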
However, Tobiyama fails to explicitly teach the source data comprising a sequence of bytes;
dividing the sequence of bytes into a plurality of sections; 
extracting, information associated with each section of the plurality of sections to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length and shorter than the first variable length; 
processing the embedding with a second recurrent neural network to generate an enhanced embedding of the source data;
processing the embedding of the source data with a classifier, the classifier comprising a fully connected neural network to generate a classification of the source data, the fully connected neural network including an input, an output, and a second set of parameters,
	wherein the fully connected neural network is configured by adjusting the second set of parameters of the fully connected neural network based, at least in part, on the machine learning algorithm.
Hardy teaches processing the enhanced embedding of the source data with a classifier, the classifier comprising a fully connected neural network (See Fig. 4, pg. 64, top right column) to generate a classification of the source data (“Our deep learning framework for malware detection (short for DL4MD) is performed on the analysis of Windows API calls generated from the collected PE files. The system consists of two major components: feature extractor, and deep learning based classifier, as illustrated in Figure 1.” [pg. 62, §3 System architecture, ¶1; The classifier disclosed by Hardy would be performing the classification (i.e. Malware detection) of the source data.]), the fully connected neural network including an input, an output, and a second set of parameters (“
[image: media_image1.png]
” [pg. 64, top left column; Second set of parameters would correspond to θ = {W, b}]);
wherein the fully connected neural network is configured by adjusting the second set of parameters of the fully connected neural network (“Backpropagate the error through the net and update parameter set θ = {W, b};” [pg. 64, top left column; Backpropagate and update would correspond to adjusting parameters]) based, at least in part, on a machine learning algorithm (See Algorithm 1, pg. 64, top left column).
Tobiyama and Hardy are both in the same field of endeavor of malware detection using deep neural networks. Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tobiyama’s teachings with Hardy to include a fully connected neural network with a classifier to perform malware detection. One would be motivated to make this modification since deep learning architectures overcome the learning difficulty through layerwise pretraining. [Hardy, § Introduction, ¶3] 
Tobiyama/Hardy fails to explicitly teach the source data comprising a sequence of bytes;
dividing the sequence of bytes into a plurality of sections; 
extracting, information associated with each section of the plurality of sections to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length and shorter than the first variable length; 
Sai teaches the source data comprising a sequence of bytes (“In FIG. 5, the feature extraction instructions 113 receive the binary file 310. The binary file 310 may include an executable file, such as one of the files 104 of FIGS. 1-3. The binary file 310 is divided into chunks via chunking instructions 401. For example, the binary file 310 may be divided into chunks of 256 bytes each.” [col 15, lines 42 - 47]);
dividing the sequence of bytes into a plurality of sections (“For example, the binary file 310 may be divided into chunks of 256 bytes each. In other examples, different chunk sizes may be used. When the binary file 312 has a length that is not divisible by 256 bytes without a remainder, the remainder is may be dropped.” [col 15, lines 46-50]);
extracting, information associated with each section of the plurality of sections to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length and shorter than the first variable length (“In a particular implementation, the feature extraction instructions 113 include entropy calculation instructions 403. The entropy calculation instructions 403 may be configured to calculate an entropy (e.g., a Shannon entropy) for each of the chunks 402. For example, in FIG. 5, the binary file 310 is used to generate five chunks 402 and the entropy calculation instructions 403 generate data including five of entropy values 404. Entropy values may be calculated using Equation 1: 
[Equation 1: image media_image2.png]
 …In a particular implementation, each byte of each of the chunks 402 is represented by a pair of hexadecimal characters. There are 256 possible values for a pair of hexadecimal characters. Thus, in this implementation, the entropy values (H) range between zero and eight where the maximum entropy (eight) is reached when Pi takes a constant value of 1/256 (i.e., every byte is completely random). In other implementations, other ranges of entropy values may be used depending on the chunking, how data within each chunk is grouped (e.g., into two hexadecimal values in the example above), and the base of the logarithm that is used to calculate the entropy.” [col 15, line 55 – col 16, line 11; Examiner is interpreting the chunk of bytes that were divided to be equivalent to a sequence of extracted information having a second variable length. Since the bytes were divided from a binary file (i.e. first variable length), it is implicit that it would be shorter than the first variable length.]);
Tobiyama, Hardy, and Sai are all in the same field of endeavor of malware detection using deep neural networks. Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Sai discloses training a file classifier from a plurality of binary file inputs to detect malware. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tobiyama’s and Hardy’s teachings by dividing the log files of Tobiyama/Hardy into portions of bytes and performing a Shannon Entropy operation on the portion of bytes as taught by Sai. One would have been motivated to make this modification in order to use the calculated entropy values from the binary file to detect if the file contains malware. [col 17, lines 45-49, Sai]
Although Tobiyama discloses using an RNN, the reference does not detail processing the embedding with a second RNN.
Wang teaches processing the embedding with a second recurrent neural network to generate an enhanced embedding of the source data (“In the first layer, we use a corresponding RNN to model the temporal movement of each body part based on its concatenated coordinates of joints at each time step. In the second layer, we concatenate the outputs of the RNN of different parts and adopt another RNN to model the movement of the whole body” [pg. 502, left col, ¶2; See also Fig. 3; processing the embedding with a 2nd RNN]);
Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Sai discloses training a file classifier from a plurality of binary file inputs to detect malware. Wang teaches a method to model temporal dynamics using two-stream RNNs. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tobiyama’s/Hardy’s/Sai’s teachings to process the embedding with a second RNN as taught by Wang. One would have been motivated to make this modification in order to improve the accuracy of the processed source data. [pg. 504, § 5.4, Wang]

Regarding claim 17, Tobiyama/Hardy/Sai/Wang teaches the system of claim 15, where Tobiyama further teaches wherein extracting information further comprises executing at least one of a Shannon entropy, a convolution operation, a statistical operation, a wavelet transformation operation, a Fourier transformation operation, a compression operation, a disassembling operation, or a tokenization operation (“In the convolution layer, features are extracted by convoluting filter to inputs.” [pg. 578, § B. Deep Neural Network, ¶3; note: this corresponds to a convolution operation]).

Regarding claim 18, Tobiyama/Hardy/Sai/Wang teaches the system of claim 15, where Tobiyama further teaches wherein the first recurrent neural network includes one or more recurrent neural network layers (“We use RNN with LSTM units for the model. The RNN consists of an input layer x, a normal hidden layer h1, two LSTM layers h2 and h3, and an output layer y.” [pg. 579, § C. Training RNN, ¶1]).

Regarding claim 19, Tobiyama/Hardy/Sai/Wang teaches the system of claim 15, where Hardy further teaches wherein the fully connected neural network includes one or more fully connected layers (“More rigorously, with an SAE deep network with h layers, the first layer takes the input from the training dataset and is trained simply as an AutoEncoder. Then, after the kth hidden layer is obtained, its output is used as the input of the (k + 1)th hidden layer, which is trained similarly. Finally, the hth layer’s output is used as the output of the entire SAE model. In this manner, AutoEncoders can form a hierarchical stack. Figure 3 illustrates a SAEs model with h hidden layers.” [pg. 64, § 4.3 Deep learning architecture with SAEs, ¶1]).
Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Sai discloses training a file classifier from a plurality of binary file inputs to detect malware. Wang teaches a method to model temporal dynamics using two-stream RNNs.  It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tobiyama’s/Sai’s/Wang’s teachings with Hardy to include a fully connected neural network with a classifier to perform malware detection. One would be motivated to make this modification since deep learning architectures overcome the learning difficulty through layerwise pretraining. [Hardy, § Introduction, ¶3] 
Regarding claim 20, Tobiyama/Hardy/Sai/Wang teaches the system of claim 15, where Tobiyama further teaches wherein the first set of parameters of the first recurrent neural network are adjusted in response to training data (“Then we calculate loss function by comparing yt with correct answer xt+1. After input T Operations, weights are updated by backpropagation.” [pg. 579, § C. Training RNN; Updating weights would correspond to adjusting a first set of parameters.]).
However, Tobiyama fails to explicitly teach the second set of parameters of the fully connected neural network are adjusted in response to training data.
Hardy teaches the second set of parameters of the fully connected neural network are adjusted in response to training data (“Backpropagate the error through the net and update parameter set θ = {W, b};” [pg. 64, top left column; Backpropagate and updating the parameter teaches a second set of parameters are adjusted]).
Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Sai discloses training a file classifier from a plurality of binary file inputs to detect malware. Wang teaches a method to model temporal dynamics using two-stream RNNs.  It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tobiyama’s/Sai’s/Wang’s teachings with Hardy to include a fully connected neural network with a classifier to perform malware detection. One would be motivated to make this modification since deep learning architectures overcome the learning difficulty through layerwise pretraining. [Hardy, § Introduction, ¶3] 

Regarding claim 21, Tobiyama/Hardy/Sai/Wang teaches the system of claim 9, where Hardy further teaches wherein the fully connected neural network (See Fig. 4, pg. 64, top right column) is further configured to generate a classification of the source data (“Our deep learning framework for malware detection (short for DL4MD) is performed on the analysis of Windows API calls generated from the collected PE files. The system consists of two major components: feature extractor, and deep learning based classifier, as illustrated in Figure 1.” [pg. 62, §3 System architecture, ¶1; The classifier disclosed by Hardy would be performing the classification (i.e. Malware detection) of the source data.]).
Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Sai discloses training a file classifier from a plurality of binary file inputs to detect malware. Wang teaches a method to model temporal dynamics using two-stream RNNs.  It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tobiyama’s/Sai’s/Wang’s teachings with Hardy to include a fully connected neural network with a classifier to perform malware detection. One would be motivated to make this modification since deep learning architectures overcome the learning difficulty through layerwise pretraining. [Hardy, § Introduction, ¶3] 

Regarding claim 22, Tobiyama/Hardy/Sai/Wang teaches the system of claim 21, where Hardy further teaches wherein the classification of the source data is at least one of whether the source data is malicious, adware, or good (“The dataset obtained from Comodo Cloud Security Center contains 50,000 file samples, where 22,500 are malware, 22,500 are benign files, and 5,000 are unknown (with the analysis by the anti-malware experts of Comodo Security Lab, 2,500 of them are labeled as malware and 2,500 of them are benign). In our experiments, those 45,000 file samples are used for training, while the 5,000 unknown files are used for testing. 9,649 Windows API calls are extracted from these 50,000 file samples, so all the file samples can be represented as binary feature vectors with 9,649- dimensions (described in Section 4.1). To quantitatively validate the experimental results, we use the performance measures shown in Table II.” [pg. 65, 5.1 Experimental setup, ¶1]).
Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Sai discloses training a file classifier from a plurality of binary file inputs to detect malware. Wang teaches a method to model temporal dynamics using two-stream RNNs.  It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tobiyama’s/Sai’s/Wang’s teachings with Hardy to include a fully connected neural network with a classifier to perform malware detection. One would be motivated to make this modification since deep learning architectures overcome the learning difficulty through layerwise pretraining. [Hardy, § Introduction, ¶3] 
Regarding claim 23, Tobiyama/Hardy/Sai/Wang teaches the system of claim 15, where Hardy further teaches wherein the fully connected neural network (See Fig. 4, pg. 64, top right column) is further configured to generate a classification of the source data (“Our deep learning framework for malware detection (short for DL4MD) is performed on the analysis of Windows API calls generated from the collected PE files. The system consists of two major components: feature extractor, and deep learning based classifier, as illustrated in Figure 1.” [pg. 62, §3 System architecture, ¶1; The classifier disclosed by Hardy would be performing the classification (i.e. Malware detection) of the source data.]).
Tobiyama discloses a malware detection method using trained recurrent neural networks and convolutional neural networks to classify malware processes. Hardy discloses a deep learning architecture using Auto Encoders model for malware detection. Sai discloses training a file classifier from a plurality of binary file inputs to detect malware. Wang teaches a method to model temporal dynamics using two-stream RNNs.  It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Tobiyama’s/Sai’s/Wang’s teachings with Hardy to include a fully connected neural network with a classifier to perform malware detection. One would be motivated to make this modification since deep learning architectures overcome the learning difficulty through layerwise pretraining. [Hardy, § Introduction, ¶3] 

Double Patenting
The nonstatutory double patenting rejection is based on a judicially created doctrine grounded in public policy (a policy reflected in the statute) so as to prevent the unjustified or improper timewise extension of the “right to exclude” granted by a patent and to prevent possible harassment by multiple assignees. A nonstatutory double patenting rejection is appropriate where the conflicting claims are not identical, but at least one examined application claim is not patentably distinct from the reference claim(s) because the examined application claim is either anticipated by, or would have been obvious over, the reference claim(s). See, e.g., In re Berg, 140 F.3d 1428, 46 USPQ2d 1226 (Fed. Cir. 1998); In re Goodman, 11 F.3d 1046, 29 USPQ2d 2010 (Fed. Cir. 1993); In re Longi, 759 F.2d 887, 225 USPQ 645 (Fed. Cir. 1985); In re Van Ornum, 686 F.2d 937, 214 USPQ 761 (CCPA 1982); In re Vogel, 422 F.2d 438, 164 USPQ 619 (CCPA 1970); In re Thorington, 418 F.2d 528, 163 USPQ 644 (CCPA 1969).
A timely filed terminal disclaimer in compliance with 37 CFR 1.321(c) or 1.321(d) may be used to overcome an actual or provisional rejection based on nonstatutory double patenting provided the reference application or patent either is shown to be commonly owned with the examined application, or claims an invention made as a result of activities undertaken within the scope of a joint research agreement. See MPEP § 717.02 for applications subject to examination under the first inventor to file provisions of the AIA  as explained in MPEP § 2159. See MPEP § 2146 et seq. for applications not subject to examination under the first inventor to file provisions of the AIA . A terminal disclaimer must be signed in compliance with 37 CFR 1.321(b). 
The USPTO Internet website contains terminal disclaimer forms which may be used. Please visit www.uspto.gov/patent/patents-forms. The filing date of the application in which the form is filed determines what form (e.g., PTO/SB/25, PTO/SB/26, PTO/AIA /25, or PTO/AIA /26) should be used. A web-based eTerminal Disclaimer may be filled out completely online using web-screens. An eTerminal Disclaimer that meets all requirements is auto-processed and approved immediately upon submission. For more information about eTerminal Disclaimers, refer to www.uspto.gov/patents/process/file/efs/guidance/eTD-info-I.jsp.

Claims 1, 3-6, 8, 9, 11-15, 17-21, and 23 are provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 5, 7, 15, 18, and 20 of copending Application No. 15909372 (reference application).

Instant Application (Claim 1): A method for embedding variable length source data by a processor, the method comprising:
App#15909372 (Claims 1, 5): A method for generating a classification of variable length source data, the method comprising:
Instant Application (Claim 1): receiving source data having a first variable length, the source data comprising a sequence of bytes;
App#15909372 (Claims 1, 5): receiving source data having a first variable length, the source data comprising a sequence of bytes;
Instant Application (Claim 1): dividing the sequence of bytes into a plurality of sections;
App#15909372 (Claims 1, 5): dividing the sequence of bytes into a plurality of sections;
Instant Application (Claim 1): extracting, information associated with each section of the plurality of sections to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length and shorter than the first variable length;
App#15909372 (Claims 1, 5): extracting feature information associated with each section of the plurality of sections to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length
Instant Application (Claim 1): processing the sequence of extracted information with a first recurrent neural network to generate an embedding of the source data, the first recurrent neural network including an input, an output, and a first set of parameters;
App#15909372 (Claims 1, 5): processing the sequence of extracted information with an encoder neural network that includes a first set of parameters, the processing comprising: providing the sequence of extracted information as input to a convolutional filter of the encoder neural network; and providing an output of convolutional filter as input to a recurrent neural network layer of a first recurrent neural network to generate an embedding of the source data, that represents a transformation of the source data;
Instant Application (Claim 1): processing the embedding with a second recurrent neural network to generate an enhanced embedding of the source data
App#15909372 (Claims 1, 5): providing the embedding of the source data as input to a second recurrent neural network to generate an enhanced embedding of the source data
Instant Application (Claim 1): and processing the enhanced embedding of the source data with a classifier, the classifier comprising a fully connected neural network to generate a classification of the source data, the fully connected neural network including an input, an output, and a second set of parameters
App#15909372 (Claim 1): wherein the encoder neural network is configured by training the encoder neural network with a decoder neural network, the decoder neural network including an input for receiving the enhanced embedding of the source data and a second set of parameters, the decoder neural network generating an output that approximates at least one of (a) the sequence of extracted information, (b) a category associated with the source data, or (c) the source data; and processing at least the enhanced embedding of the source data with a classifier to generate a classification of the source data.
Instant Application (Claim 1): wherein the first recurrent neural network is configured by adjusting the first set of parameters of the first recurrent neural network based, at least in part, on a machine learning algorithm
App#15909372 (Claim 5): The method of claim 1, wherein the decoder neural network is configured by (i) receiving the enhanced embedding of the source data, (ii) adjusting, using machine learning, the first set of parameters and second set of parameters, and (iii) repeating (i) and (ii) until the output of the decoder neural network approximates to within a threshold of at least one of (a) the sequence of extracted information, (b) a category associated with the source data, (c) the source data, or (d) combinations thereof.
Instant Application (Claim 1): and wherein the fully connected neural network is configured by adjusting the second set of parameters of the fully connected neural network based, at least in part, on the machine learning algorithm.
App#15909372 (Claim 5): The method of claim 1, wherein the decoder neural network is configured by (i) receiving the enhanced embedding of the source data, (ii) adjusting, using machine learning, the first set of parameters and second set of parameters, and (iii) repeating (i) and (ii) until the output of the decoder neural network approximates to within a threshold of at least one of (a) the sequence of extracted information, (b) a category associated with the source data, (c) the source data, or (d) combinations thereof.

All limitations of claim 1 in the instant application are anticipated by claims 1 and 5 of the ‘372 application.

Instant Application (Claim 3): The method of claim 1, wherein extracting information further comprises executing at least one of a Shannon entropy, a convolution operation, a statistical operation, a wavelet transformation operation, a Fourier transformation operation, a compression operation, a disassembling operation, or a tokenization operation.
App#15909372 (Claim 18): The system of claim 15, wherein extracting information associated with each portion of the plurality of portions of bytes of the source data comprises executing at least one of a convolution operation, a Shannon Entropy operation, a statistical operation, a wavelet transformation operation, a Fourier transformation operation, a compression operation, a disassembling operation, or a tokenization operation.
All limitations of claim 3 in the instant application are anticipated by claim 18 of the ‘372 application.
Instant Application (Claim 4): The method of claim 1, wherein the first recurrent neural network includes one or more recurrent neural network layers.
App#15909372 (Claim 20): The system of claim 15, wherein the decoder neural network includes at least one of one or more recurrent neural network layers or one or more fully connected layers.
All limitations of claim 4 in the instant application are anticipated by claim 20 of the ‘372 application.
Instant Application (Claim 5): The method of claim 1, wherein the fully connected neural network includes one or more fully connected layers.
App#15909372 (Claim 20): The system of claim 15, wherein the decoder neural network includes at least one of one or more recurrent neural network layers or one or more fully connected layers.
All limitations of claim 5 in the instant application are anticipated by claim 20 of the ‘372 application.
Instant Application (Claim 6): The method of claim 1, wherein the first set of parameters of the first recurrent neural network and the second set of parameters of the fully connected neural network are adjusted in response to training data.
App#15909372 (Claim 5): The method of claim 1, wherein the decoder neural network is configured by (i) receiving the enhanced embedding of the source data, (ii) adjusting, using machine learning, the first set of parameters and second set of parameters, and (iii) repeating (i) and (ii) until the output of the decoder neural network approximates to within a threshold of at least one of (a) the sequence of extracted information, (b) a category associated with the source data, (c) the source data, or (d) combinations thereof.
All limitations of claim 6 in the instant application are anticipated by claim 5 of the ‘372 application.
Instant Application (Claim 8): The method of claim 1, wherein the classifier is a gradient-boosted tree, ensemble of gradient-boosted trees, random forest, support vector machine, fully connected multilayer perceptron, a partially connected multilayer perceptron, or general linear model.
App#15909372 (Claim 7): The method of claim 1, wherein the classifier is a gradient-boosted tree, ensemble of gradient-boosted trees, random forest, support vector machine, fully connected multilayer perceptron, a partially connected multilayer perceptron, or general linear model.
All limitations of claim 8 in the instant application are anticipated by claim 7 of the ‘372 application.
Instant Application (Claim 9): A system for embedding variable length source data by a processor, the system comprising: one or more processors; and at least one non-transitory computer readable storage medium having instructions therein, which, when executed by the one or more processors, cause the one or more processors to perform actions comprising:
App#15909372 (Claim 1): A method for generating a classification of variable length source data, the method comprising:
Instant Application (Claim 9): receiving source data having a first variable length, the source data comprising a sequence of bytes
App#15909372 (Claims 1, 5): receiving source data having a first variable length, the source data comprising a sequence of bytes
Instant Application (Claim 9): dividing the sequence of bytes into a plurality of sections;
App#15909372 (Claims 1, 5): dividing the sequence of bytes into a plurality of portions of bytes;
Instant Application (Claim 9): extracting, information associated with each section of the sections to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length and shorter than the first variable length;
App#15909372 (Claims 1, 5): extracting information associated with each portion of the plurality of portions of bytes of the source data to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length;
Instant Application (Claim 9): and processing the sequence of extracted information with a first recurrent neural network to generate an embedding of the source data, the first recurrent neural network including an input, an output, and a first set of parameters;
App#15909372 (Claims 1, 5): processing the sequence of extracted information with an encoder neural network that includes a first set of parameters, the processing comprising: providing the sequence of extracted information as input to a convolutional filter of the encoder neural network; and providing an output of convolutional filter as input to a recurrent neural network layer of a first recurrent neural network to generate an embedding of the source data, that represents a transformation of the source data;
Instant Application (Claim 9): processing the embedding with a second recurrent neural network to generate an enhanced embedding of the source data
App#15909372 (Claims 1, 5): providing the embedding of the source data as input to a second recurrent neural network to generate an enhanced embedding of the source data
Instant Application (Claim 9): and processing the enhanced embedding of the source data with a fully connected neural network, the fully connected neural network including an input, an output, and a second set of parameters
App#15909372 (Claim 1): wherein the encoder neural network is configured by training the encoder neural network with a decoder neural network, the decoder neural network including an input for receiving the enhanced embedding of the source data and a second set of parameters, the decoder neural network generating an output that approximates at least one of (a) the sequence of extracted information, (b) a category associated with the source data, or (c) the source data; and processing at least the embedding of the source data with a classifier to generate a classification of the source data.
Instant Application (Claim 9): wherein the first recurrent neural network is configured by adjusting the first set of parameters of the first recurrent neural network based, at least in part, on a machine learning algorithm
App#15909372 (Claim 5): The method of claim 1, wherein the decoder neural network is configured by (i) receiving the enhanced embedding of the source data, (ii) adjusting, using machine learning, the first set of parameters and second set of parameters, and (iii) repeating (i) and (ii) until the output of the decoder neural network approximates to within a threshold of at least one of (a) the sequence of extracted information, (b) a category associated with the source data, (c) the source data, or (d) combinations thereof.
Instant Application (Claim 9): and wherein the fully connected neural network is configured by adjusting the second set of parameters of the fully connected neural network based, at least in part, on the machine learning algorithm.
App#15909372 (Claim 5): The method of claim 1, wherein the decoder neural network is configured by (i) receiving the enhanced embedding of the source data, (ii) adjusting, using machine learning, the first set of parameters and second set of parameters, and (iii) repeating (i) and (ii) until the output of the decoder neural network approximates to within a threshold of at least one of (a) the sequence of extracted information, (b) a category associated with the source data, (c) the source data, or (d) combinations thereof.

All limitations of claim 9 in the instant application are anticipated by claims 1 and 5 of the ‘372 application.
Instant Application (Claim 11): The system of claim 9, wherein extracting information further comprises executing at least one of a Shannon entropy, a convolution operation, a statistical operation, a wavelet transformation operation, a Fourier transformation operation, a compression operation, a disassembling operation, or a tokenization operation.
App#15909372 (Claim 18): The system of claim 15, wherein extracting information associated with each portion of the plurality of portions of bytes of the source data comprises executing at least one of a convolution operation, a Shannon Entropy operation, a statistical operation, a wavelet transformation operation, a Fourier transformation operation, a compression operation, a disassembling operation, or a tokenization operation.
All limitations of claim 11 in the instant application are anticipated by claim 18 of the ‘372 application.
Instant Application (Claim 12): The system of claim 9, wherein the first recurrent neural network includes one or more recurrent neural network layers.
App#15909372 (Claim 20): The system of claim 15, wherein the decoder neural network includes at least one of one or more recurrent neural network layers or one or more fully connected layers.
All limitations of claim 12 in the instant application are anticipated by claim 20 of the ‘372 application.
Instant Application (Claim 13): The system of claim 9, wherein the fully connected neural network includes one or more fully connected layers.
App#15909372 (Claim 20): The system of claim 15, wherein the decoder neural network includes at least one of one or more recurrent neural network layers or one or more fully connected layers.
All limitations of claim 13 in the instant application are anticipated by claim 20 of the ‘372 application.
Instant Application (Claim 14): The system of claim 9, wherein the first set of parameters of the first recurrent neural network and the second set of parameters of the fully connected neural network are adjusted in response to training data.
App#15909372 (Claim 5): The method of claim 1, wherein the decoder neural network is configured by (i) receiving the enhanced embedding of the source data, (ii) adjusting, using machine learning, the first set of parameters and second set of parameters, and (iii) repeating (i) and (ii) until the output of the decoder neural network approximates to within a threshold of at least one of (a) the sequence of extracted information, (b) a category associated with the source data, (c) the source data, or (d) combinations thereof.
All limitations of claim 14 in the instant application are anticipated by claim 5 of the ‘372 application.
Instant Application (Claim 15): A system for embedding source data by a processor, the source data comprising a sequence of bytes and having a first variable length, the system comprising: one or more processors; and a memory having instructions stored therein, which, when executed by the one or more processors, cause the one or more processor to perform actions comprising;
App#15909372 (Claims 1, 5): A method for generating a classification of variable length source data, the method comprising: receiving source data having a first variable length, the source data comprising a sequence of bytes;
Instant Application (Claim 15): dividing the sequence of bytes into a plurality of sections;
App#15909372 (Claims 1, 5): dividing the sequence of bytes of into a plurality of portions of bytes;
Instant Application (Claim 15): extracting, information associated with each section of the sections to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length and shorter than the first variable length;
App#15909372 (Claims 1, 5): extracting information associated with each portion of the plurality of portions of bytes of the source data to generate a sequence of extracted information having a second variable length, the second variable length based on the first variable length, wherein extracting information generates one or more intermediate sequences;
Instant Application (Claim 15): and processing the sequence of extracted information with a first recurrent neural network to generate an embedding of the source data, the first recurrent neural network including an input, an output, and a first set of parameters;
App#15909372 (Claims 1, 5): processing the sequence of extracted information with an encoder neural network that includes a first set of parameters, the processing comprising: providing the sequence of extracted information as input to a convolutional filter; and providing an output of convolutional filter as input to a recurrent neural network layer to generate an embedding of the source data, that represents a transformation of the source data;
Instant Application (Claim 15): processing the embedding with a second recurrent neural network to generate an enhanced embedding of the source data
App#15909372 (Claims 1, 5): providing the embedding of the source data as input to a second recurrent neural network to generate an enhanced embedding of the source data
Instant Application (Claim 15): and processing the enhanced embedding of the source data with a fully connected neural network, the fully connected neural network including an input, an output, and a second set of parameters
App#15909372 (Claim 1): wherein the encoder neural network is configured by training the encoder neural network with a decoder neural network, the decoder neural network including an input for receiving the enhanced embedding of the source data and a second set of parameters, the decoder neural network generating an output that approximates at least one of (a) the sequence of extracted information, (b) a category associated with the source data, or (c) the source data; and processing at least the embedding of the source data with a classifier to generate a classification of the source data.
Instant Application (Claim 15): wherein the first recurrent neural network is configured by adjusting the first set of parameters of the first recurrent neural network based, at least in part, on a machine learning algorithm.
App#15909372 (Claim 5): The method of claim 1, wherein the decoder neural network is configured by (i) receiving the enhanced embedding of the source data, (ii) adjusting, using machine learning, the first set of parameters and second set of parameters, and (iii) repeating (i) and (ii) until the output of the decoder neural network approximates to within a threshold of at least one of (a) the sequence of extracted information, (b) a category associated with the source data, (c) the source data, or (d) combinations thereof.
Instant Application (Claim 15): and wherein the fully connected neural network is configured by adjusting the second set of parameters of the fully connected neural network based, at least in part, on a machine learning algorithm.
App#15909372 (Claim 5): The method of claim 1, wherein the decoder neural network is configured by (i) receiving the enhanced embedding of the source data, (ii) adjusting, using machine learning, the first set of parameters and second set of parameters, and (iii) repeating (i) and (ii) until the output of the decoder neural network approximates to within a threshold of at least one of (a) the sequence of extracted information, (b) a category associated with the source data, (c) the source data, or (d) combinations thereof.

All limitations of claim 15 in the instant application are anticipated by claims 1 and 5 of the ‘372 application.
Instant Application (Claim 17): The system of claim 15, wherein extracting information further comprises executing at least one of a Shannon entropy, a convolution operation, a statistical operation, a wavelet transformation operation, a Fourier transformation operation, a compression operation, a disassembling operation, or a tokenization operation.
App#15909372 (Claim 18): The system of claim 15, wherein extracting information associated with each portion of the plurality of portions of bytes of the source data comprises executing at least one of a convolution operation, a Shannon Entropy operation, a statistical operation, a wavelet transformation operation, a Fourier transformation operation, a compression operation, a disassembling operation, or a tokenization operation.
All limitations of claim 17 in the instant application are anticipated by claim 18 of the ‘372 application.
Instant Application (Claim 18): The system of claim 15, wherein the first recurrent neural network includes one or more recurrent neural network layers.
App#15909372 (Claim 20): wherein the decoder neural network includes at least one of one or more recurrent neural network layers or one or more fully connected layers.
All limitations of claim 18 in the instant application are anticipated by claim 20 of the ‘372 application.
Instant Application (Claim 19): The system of claim 16, wherein the fully connected neural network includes one or more fully connected layers.
App#15909372 (Claim 20): wherein the decoder neural network includes at least one of one or more recurrent neural network layers or one or more fully connected layers.
All limitations of claim 19 in the instant application are anticipated by claim 20 of the ‘372 application.
Instant Application (Claim 20): The system of claim 16, wherein the first set of parameters of the first recurrent neural network and the second set of parameters of the fully connected neural network are adjusted in response to training data.
App#15909372 (Claim 5): The method of claim 1, wherein the decoder neural network is configured by (i) receiving the embedding of the source data, (ii) adjusting, using machine learning, the first set of parameters and second set of parameters, and (iii) repeating (i) and (ii) until the output of the decoder neural network approximates to within a threshold of at least one of (a) the sequence of extracted information, (b) a category associated with the source data, (c) the source data, or (d) combinations thereof.
All limitations of claim 20 in the instant application are anticipated by claim 5 of the ‘372 application.
Instant Application (Claim 21): The system of claim 9, wherein the fully connected neural network is further configured to generate a classification of the source data.
App#15909372 (Claim 1): processing at least the enhanced embedding of the source data with a classifier to generate a classification of the source data.
All limitations of claim 21 in the instant application are anticipated by claim 1 of the ‘372 application.
Instant Application (Claim 23): The system of claim 15, wherein the fully connected neural network is further configured to generate a classification of the source data.
App#15909372 (Claim 1): processing at least the enhanced embedding of the source data with a classifier to generate a classification of the source data.
All limitations of claim 23 in the instant application are anticipated by claim 1 of the ‘372 application.


This is a provisional nonstatutory double patenting rejection.

Response to Arguments
Applicant's arguments filed 05/16/2022 have been fully considered but they are not persuasive. 

Regarding the double patenting rejection, the rejection will be held in abeyance and reconsidered at the time allowable claims are identified. Thus, claims 1, 3-6, 8, 9, 11-15, 17-21, and 23 remain provisionally rejected on the ground of nonstatutory double patenting as being unpatentable over claims 1, 5, 7, 15, 18, and 20 of copending Application No. 15909372.


Regarding the 35 U.S.C. 103 Rejection:

Applicant’s arguments on pgs. 11-13 that the cited prior art fails to teach the newly amended limitation of “processing the embedding with a second recurrent neural network to generate an enhanced embedding of the source data” have been considered but are moot because the newly amended limitation is now taught by the newly presented art of Wang. Please see the updated 103 rejection above.

Applicant’s arguments with respect to the rejections of the dependent claims have been fully considered but they are not persuasive as they rely upon the allowability of the independent claims.
Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491. The examiner can normally be reached Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki, can be reached at (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/M.H.H./Examiner, Art Unit 2122

/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122