DETAILED ACTION
Claims 1-20 are pending in the current application.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

	EXAMINER’S AMENDMENT
1. An examiner’s amendment to the record appears below. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee. 

2. Authorization for this examiner’s amendment was given in an interview with Greg Melnick on 12/8/21.



In the Claims

1.	(Currently amended) A computer-implemented method for securing software installation through deep graph learning, comprising:
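extracting a new software installation graph (SIG) corresponding to a new software installation based on installation data associated with the new software installation, the new SIG having nodes representing system subjects and objects and edges recording interactions between the system subjects and objects, wherein extracting the new SIG further includes running a backtracking process by: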
iteratively backtracking through all past system events to find relevant system events;
constructing at least one preprocessed SIG describing at least a partial software installation instance by converting and mapping each system event into a component of the at least one SIG; and
performing postprocessing on the at least one preprocessed SIG to generate the at least one SIG, including storing the at least one preprocessed SIG in a graph database and merging all relevant graphs that share common nodes stored in the graph database to generate at least one merged graph corresponding to the new SIG;
using at least two node embedding models to generate a first vector representation by embedding the nodes of the new SIG and inferring any embeddings for out-of-vocabulary (OOV) words corresponding to unseen pathnames; 
utilizing a deep graph autoencoder including a graph long short-term memory (LSTM) as an encoder and a multilayer perceptron (MLP) as a decoder to reconstruct nodes of the new SIG from latent vector representations encoded by the graph LSTM, wherein reconstruction losses resulting from a difference of a second vector representation generated by the deep graph autoencoder and the first vector representation represent anomaly scores for each node; and
performing anomaly detection by comparing an overall anomaly score of the anomaly scores to a threshold of normal software installation.


2.	(Cancelled) 

3.	(Currently amended) The method as recited in claim [[2]]1, further comprising:
identifying a starting point of the backtracking by finding events of a process writing a binary file or renaming a file into a binary file; and
terminating the backtracking based on a set of stop criteria, wherein the postprocessing is performed in response to the termination.

4.  	(Original) The method as recited in claim 1, further comprising utilizing a model training and validation process, including:
extracting at least one training SIG corresponding to at least one software installation based on training data to obtain a complete set of graphs;
dividing the complete set of graphs into a training set and a validation set;
learning the node embedding models from the training set using random walks and pathname components, including randomly sampling individual paths to a configurable length from the at least one training SIG to generate training data including the individual paths;
training the deep graph autoencoder to reconstruct normal process nodes and minimize reconstruction losses between the encoder and the decoder; and
using validation data of the validation set to verify model performance and determine the threshold of normal software installation using the reconstruction losses.

5.	(Original) The method as recited in claim 4, wherein training the deep graph autoencoder further includes vectorizing the at least one training SIG based on the node embedding models to generate at least one vectorized SIG.

6.	(Original) The method as recited in claim 4, wherein training the deep graph autoencoder further includes: 
feeding the at least one training SIG into the encoder by topological order of edges to generate an output including a latent vector representation on each process node of the at least one training SIG; and
transferring the output to the decoder to reconstruct an original vector representation on each process node.

7.	(Original) The method as recited in claim 1, wherein performing the anomaly detection further includes:
determining that the overall anomaly score exceeds the threshold of normal software installation;
classifying the new software installation as abnormal in response to determining that the overall anomaly score exceeds the threshold of normal software installation; and
generating results of the software installation detection including a list of most suspicious processes of the new software installation sorted by respective anomaly scores.



9.  	(Currently amended) A computer program product comprising a non-transitory computer readable storage medium having program instructions embodied therewith, the program instructions executable by a computer to cause the computer to perform a method for securing software installation through deep graph learning, the method performed by the computer comprising:
extracting a new software installation graph (SIG) corresponding to a new software installation based on installation data associated with the new software installation, the new SIG having nodes representing system subjects and objects and edges recording interactions between the system subjects and objects, wherein extracting the new SIG further includes running a backtracking process by:
iteratively backtracking through all past system events to find relevant system events;
constructing at least one preprocessed SIG describing at least a partial software installation instance by converting and mapping each system event into a component of the at least one SIG; and
performing postprocessing on the at least one preprocessed SIG to generate the at least one SIG, including storing the at least one preprocessed SIG in a graph database and merging all relevant graphs that share common nodes stored in the graph database to generate at least one merged graph corresponding to the new SIG;
using at least two node embedding models to generate a first vector representation by embedding the nodes of the new SIG and inferring any embeddings for out-of-vocabulary (OOV) words corresponding to unseen pathnames; 
utilizing a deep graph autoencoder including a graph long short-term memory (LSTM) as an encoder and a multilayer perceptron (MLP) as a decoder to reconstruct nodes of the new SIG from latent vector representations encoded by the graph LSTM, wherein reconstruction losses resulting from a difference of a second vector representation generated by the deep graph autoencoder and the first vector representation represent anomaly scores for each node; and
performing anomaly detection by comparing an overall anomaly score of the anomaly scores to a threshold of normal software installation.

10.	(Cancelled) 

11.  	(Original) The computer program product as recited in claim 9, wherein the method further includes utilizing a model training and validation process, including:
extracting at least one training SIG corresponding to at least one software installation based on training data to obtain a complete set of graphs;
dividing the complete set of graphs into a training set and a validation set;
learning the node embedding models from the training set using random walks and pathname components, including randomly sampling individual paths to a configurable length from the at least one training SIG to generate training data including the individual paths;
training the deep graph autoencoder to reconstruct normal process nodes and minimize reconstruction losses between the encoder and the decoder; and

using validation data of the validation set to verify model performance and determine the threshold of normal software installation using the reconstruction losses.

12.	(Original) The computer program product as recited in claim 11, wherein training the deep graph autoencoder further includes vectorizing the at least one training SIG based on the node embedding models to generate at least one vectorized SIG.

13.	(Original) The computer program product as recited in claim 11, wherein training the deep graph autoencoder further includes: 
feeding the at least one training SIG into the encoder by topological order of edges to generate an output including a latent vector representation on each process node of the at least one training SIG; and
transferring the output to the decoder to reconstruct an original vector representation on each process node.

14.	(Currently amended) The computer program product as recited in claim [[8]]9, wherein performing the anomaly detection further includes:
determining that the overall anomaly score exceeds the threshold of normal software installation;
classifying the new software installation as abnormal in response to determining that the overall anomaly score exceeds the threshold of normal software installation; and

selecting the node embedding model and the deep graph model from the installation behavior models database based on installed files associated with the new software installation.

15.	(Currently amended) A system for securing software installation through deep graph learning, comprising:
	a memory device storing program code; and
at least one processor device operatively coupled to the memory device and configured to execute program code stored on the memory device to:
extract a new software installation graph (SIG) corresponding to a new software installation based on installation data associated with the new software installation, the new SIG having nodes representing system subjects and objects and edges recording interactions between the system subjects and objects, wherein the at least one processor device is configured to extract the new SIG further by running a backtracking process by:
iteratively backtracking through all past system events to find relevant system events;
constructing at least one preprocessed SIG describing at least a partial software installation instance by converting and mapping each system event into a component of the at least one SIG; and
performing postprocessing on the at least one preprocessed SIG to generate the at least one SIG, including storing the at least one preprocessed SIG in a graph database and merging all relevant graphs that share common nodes stored in the graph database to generate at least one merged graph corresponding to the new SIG;
use at least two node embedding models to generate a first vector representation by embedding the nodes of the new SIG and inferring any embeddings for out-of-vocabulary (OOV) words corresponding to unseen pathnames; 
utilize a deep graph autoencoder including a graph long short-term memory (LSTM) as an encoder and a multilayer perceptron (MLP) as a decoder to reconstruct nodes of the new SIG from latent vector representations encoded by the graph LSTM, wherein reconstruction losses resulting from a difference of a second vector representation generated by the deep graph autoencoder and the first vector representation represent anomaly scores for each node; and
perform anomaly detection by comparing an overall anomaly score of the anomaly scores to a threshold of normal software installation.

16.	(Cancelled) 

17.  	(Original) The system as recited in claim 15, wherein the at least one processor device is further configured to execute program code stored on the memory device to utilize a model training and validation process by:
extracting at least one training SIG corresponding to at least one software installation based on training data to obtain a complete set of graphs;
dividing the complete set of graphs into a training set and a validation set;
learning the node embedding models from the training set using random walks and pathname components, including randomly sampling individual paths to a configurable length from the at least one training SIG to generate training data including the individual paths;

training the deep graph autoencoder to reconstruct normal process nodes and minimize reconstruction losses between the encoder and the decoder; and
using validation data of the validation set to verify model performance and determine the threshold of normal software installation using the reconstruction losses.

18.	(Original) The system as recited in claim 17, wherein the at least one processor device is further configured to train the deep graph autoencoder by vectorizing the at least one training SIG based on the node embedding models to generate at least one vectorized SIG.

19.	(Original) The system as recited in claim 17, wherein the at least one processor device is further configured to train the deep graph autoencoder by: 
feeding the at least one training SIG into the encoder by topological order of edges to generate an output including a latent vector representation on each process node of the at least one training SIG; and
transferring the output to the decoder to reconstruct an original vector representation on each process node.

20.	(Original) The system as recited in claim 15, wherein the at least one processor device is further configured to perform the anomaly detection by:
determining that the overall anomaly score exceeds the threshold of normal software installation;

classifying the new software installation as abnormal in response to determining that the overall anomaly score exceeds the threshold of normal software installation; and
generating results of the software installation detection including a list of most suspicious processes of the new software installation sorted by respective anomaly scores; and
selecting the node embedding model and the deep graph model from the installation behavior models database based on installed files associated with the new software installation.

Reasons for Allowance

Claims 1, 3-9, 11-15 and 17-20 are allowed.
An examiner’s amendment to the record appears as above. Should the changes and/or additions be unacceptable to applicant, an amendment may be filed as provided by 37 CFR 1.312. To ensure consideration of such an amendment, it MUST be submitted no later than the payment of the issue fee.
The following is an examiner’s statement of reasons for allowance: The currently amended subject matter seen in the amended claims adds detail about the backtracking process. The process iteratively moves through each process to construct a preprocessed software installation graph (SIG), in which each system event is converted and mapped into a graph component, and then, through postprocessing, generates the SIG by merging all relevant graphs that share common nodes stored in the graph database. This subject matter is viewed as unique and different from any prior art found.





Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to JAE UK JEON whose telephone number is (571)270-3649. The examiner can normally be reached from 9am to 6pm. Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice. If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Chat Do, can be reached on 571-272-3721. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300. Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system. Status information for published applications may be obtained from either Private PAIR or Public PAIR. Status information for unpublished applications is available through Private PAIR only. For more information about the PAIR system, see http://pair-




/BRADFORD F WHEATON/Examiner, Art Unit 2193


/JAE U JEON/Primary Examiner, Art Unit 2193