DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
This communication is in response to the RCE filed on 10/06/2022. Claims 1-20 are currently pending.

Response to Amendment
3.	The amendment filed on 08/24/2022 has been entered, and claims 1-20 remain pending in the application.

Response to Arguments
4.	Applicant’s arguments filed on 07/28/2022 have been fully considered but are moot in view of the new rejection made below in response to applicant’s arguments.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-4, 7-11, 14-18, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. PGPub. No. 20180293381 to Tseng et al. (hereinafter Tseng) in view of U.S. PGPub. No. 20170180418 to SHEN et al. (hereinafter SHEN) and further in view of U.S. Pat. No. 9864956 to Sai; Na (hereinafter Sai).

Regarding claim 1, referring to figures 1 and 5, Tseng discloses a non-transitory computer-readable storage medium having computer-readable code stored thereon for programming one or more processors to perform steps of: 
obtaining file identifiers (¶0058, “a data collection time, a file type, and a hash associated with the file (e.g., MD5 and SHA256)”, and ¶0003, “Hash for the entire file”) associated with files in production data (¶0030, data that is transmitted or received via the network 140) that includes both benign and malicious data (¶0033, “The data for modeling is selected from a set of benign files and a set of malicious files…”);
obtaining lab data (¶0058, data from malicious sources) from one or more public repositories of malware samples (¶0058, virustotal.com, malwr.com, and malware blogs; FIG. 5) based on the file identifiers (¶0003, “Hash for the entire file”) obtained from the production data (¶0030, FIG. 1, data that is transmitted or received via the network 140);
and utilizing the obtained lab data as training data for training a machine learning process (¶0060, wherein the selection module provides data to a model trainer 120, and wherein model trainer 120 comprises model training module 160 (¶0031), which uses supervised machine learning to train a machine learning model (¶0034)) for classifying malware in the production data (¶0034, wherein the model training module determines malicious packets of a file).
However, Tseng does not explicitly disclose the following portions of the limitation, which are taught by SHEN: wherein the production data is live, real traffic that includes both benign and malicious data; and file identifiers obtained from the live, real production data.
 	
SHEN discloses a data collection module for collecting routing data in real-time to be analyzed for malicious BGP events (¶0029, “The Hijack Detection System 120 may include a data collection module 122 for collecting routing data in real-time to be analyzed for malicious BGP events. The Hijack Detection System 120 may also include a local data store 124, which can be one or more centralized data repositories that store current routing data, at least one set of historically confirmed BGP hijacking data (positive samples), negative samples, and the like”, wherein positive samples represent malicious samples and negative samples represent benign ones according to SHEN ¶0002). The data is live because it is collected in real time; live data relates to real-world data, or a program working with such data, as opposed to test data.
SHEN also discloses file identifiers obtained from the live, real production data (¶0039, “…The BGP Hijack Detection Module 130 may further detect blocks of IP addresses associated with at least one malicious event”, wherein the IP addresses associated with at least one malicious event are file identifiers from the live, real production data. An IP address is one of the file identifiers in accordance with applicant’s disclosure in ¶0030).
Thus, one of ordinary skill in the art would have found it obvious before the effective filing date of the applicant’s claimed invention to modify the non-transitory computer-readable medium of Tseng to include the teaching of obtaining or collecting production data in real-time as disclosed by SHEN, and would have been motivated to do so because it enables the system to continuously and accurately detect malicious hijack events (SHEN, abstract).
Although Tseng discloses in the background of the invention (¶0003) computing a hash of the entire file received by the malware detection engine and comparing the computed hash to the millions of stored hashes maintained in the database, Tseng in view of SHEN does not explicitly disclose that the hash of the data/file is used to train a machine learning model for the purpose of classifying data as malicious or benign.
However, Sai discloses, at col. 7, line 63 through col. 8, line 24, a file identifier that includes a hash value of the feature vector of a file; storing the identifier and data classification in a system memory as a whitelist or blacklist; generating a hash value of a received new file; and then comparing the hash value of the new file to the file identifiers stored in the memory for the purpose of training a file classifier. Whitelisted items are equivalent to the claimed benign data, while blacklisted items are equivalent to the claimed malicious data.
Thus, one of ordinary skill in the art would have found it obvious before the effective filing date of applicant’s claimed invention to modify the computer-readable medium of Tseng and SHEN to include using a hash (file identifier) of training data or lab data to train a file classifier instead of using the entire file/data as disclosed by Sai, and would have been motivated to do so in order to preserve the privacy of the entire data and user.
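For illustration only, the identifier-based retrieval recited in claim 1 (hashing files observed in production, keeping only the hashes, and using them to pull labeled samples from a repository) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the implementation of Tseng, SHEN, Sai, or the application: the function names, the in-memory repository_index standing in for a public malware repository, and the toy byte strings are all hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of a file's bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


def collect_identifiers(production_files):
    """Keep only the hashes of observed files; file contents are not retained."""
    return {sha256_hex(content) for content in production_files}


def fetch_lab_data(identifiers, repository_index):
    """Return (sample, label) pairs whose hash was observed in production.

    repository_index is a hypothetical stand-in for a public repository:
    a dict mapping hash -> (sample_bytes, label).
    """
    return [repository_index[h] for h in sorted(identifiers) if h in repository_index]


# Hypothetical toy data, not taken from any cited reference.
repo_samples = {
    b"known-malware-sample": "malicious",
    b"known-benign-sample": "benign",
    b"unrelated-sample": "malicious",
}
repository_index = {sha256_hex(s): (s, label) for s, label in repo_samples.items()}

production_files = [b"known-malware-sample", b"known-benign-sample", b"never-seen-before"]
identifiers = collect_identifiers(production_files)
training_data = fetch_lab_data(identifiers, repository_index)
```

In this sketch only the two repository samples whose hashes were seen in production end up in training_data; the unseen production file has no repository match and contributes nothing.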




Regarding claim 2, Tseng in view of SHEN and further in view of Sai discloses the non-transitory computer-readable storage medium of claim 1. Tseng further discloses wherein the obtaining file identifiers is based on monitoring of users associated with the files (¶0026, wherein recommendation is provided to customers (users) thus monitoring of the user occurs) and wherein only the file identifiers are maintained based on the monitoring (¶0024).

Regarding claim 3, Tseng in view of SHEN and further in view of Sai discloses the non-transitory computer-readable storage medium of claim 1. Tseng further discloses wherein the obtaining lab data includes samples from the one or more public repositories (¶0060) matching the corresponding file identifiers for the production data (¶0003 “If a match is found, between a hash of the complete file and one of the millions of stored hashes of known malware”).

Regarding claim 4, Tseng in view of SHEN and further in view of Sai discloses the non-transitory computer-readable storage medium of claim 1. Tseng further discloses wherein the obtaining lab data includes selecting samples from the one or more public repositories (¶0058, FIG. 5, “a benign source 510 and a malicious source 520”) that have features closely related to features of the production data (¶0057/¶0058 similar file types). 

Regarding claim 7, Tseng in view of SHEN and further in view of Sai discloses the non-transitory computer-readable storage medium of claim 1. Tseng further discloses wherein the files are executable files (¶0004 wherein the file can be executed in a malware sandbox).  

Regarding claim 8, Tseng discloses a server comprising: one or more processors and memory storing computer-executable instructions that, when executed, cause the one or more processors to:
obtain file identifiers (¶0058, “a data collection time, a file type, and a hash associated with the file (e.g., MD5 and SHA256)”, and ¶0003, “Hash for the entire file”) associated with files in production data (¶0030, FIG. 1, data that is transmitted or received via the network 140) that includes both benign and malicious data (¶0033, “The data for modeling is selected from a set of benign files and a set of malicious files…”);
obtain lab data (¶0058, data from malicious sources) from one or more public repositories of malware samples (¶0058, virustotal.com, malwr.com, and malware blogs; FIG. 5) based on the file identifiers (¶0003, “Hash for the entire file”) obtained from the production data (¶0030, data that is transmitted or received via the network 140);
and utilize the obtained lab data as training data for training a machine learning process (¶0060, wherein the selection module provides data to a model trainer 120, and wherein model trainer 120 comprises model training module 160 (¶0031), which uses supervised machine learning to train a machine learning model (¶0034)) for classifying malware in the production data (¶0034, wherein the model training module determines malicious packets of a file).
However, Tseng does not explicitly disclose the following portions of the limitation, which are taught by SHEN: wherein the production data is live, real traffic that includes both benign and malicious data; and file identifiers obtained from the live, real production data.
 	

SHEN discloses a data collection module for collecting routing data in real-time to be analyzed for malicious BGP events (¶0029, “The Hijack Detection System 120 may include a data collection module 122 for collecting routing data in real-time to be analyzed for malicious BGP events. The Hijack Detection System 120 may also include a local data store 124, which can be one or more centralized data repositories that store current routing data, at least one set of historically confirmed BGP hijacking data (positive samples), negative samples, and the like”, wherein positive samples represent malicious samples and negative samples represent benign ones according to SHEN ¶0002). The data is live because it is collected in real time; live data relates to real-world data, or a program working with such data, as opposed to test data.
SHEN also discloses file identifiers obtained from the live, real production data (¶0039, “…The BGP Hijack Detection Module 130 may further detect blocks of IP addresses associated with at least one malicious event”, wherein the IP addresses associated with at least one malicious event are file identifiers from the live, real production data. An IP address is one of the file identifiers in accordance with applicant’s disclosure in ¶0030).

Thus, one of ordinary skill in the art would have found it obvious before the effective filing date of the applicant’s claimed invention to modify the server of Tseng to include the teaching of obtaining or collecting production data in real-time as disclosed by SHEN, and would have been motivated to do so because it enables the system to continuously and accurately detect malicious hijack events (SHEN, abstract).
Although Tseng discloses in the background of the invention (¶0003) computing a hash of the entire file received by the malware detection engine and comparing the computed hash to the millions of stored hashes maintained in the database, Tseng in view of SHEN does not explicitly disclose that the hash of the data/file is used to train a machine learning model for the purpose of classifying data as malicious or benign.
However, Sai discloses, at col. 7, line 63 through col. 8, line 24, a file identifier that includes a hash value of the feature vector of a file; storing the identifier and data classification in a system memory as a whitelist or blacklist; generating a hash value of a received new file; and then comparing the hash value of the new file to the file identifiers stored in the memory for the purpose of training a file classifier. Whitelisted items are equivalent to the claimed benign data, while blacklisted items are equivalent to the claimed malicious data.

Thus, one of ordinary skill in the art would have found it obvious before the effective filing date of applicant’s claimed invention to modify the server of Tseng and SHEN to include using a hash (file identifier) of training data or lab data to train a file classifier instead of using the entire file/data as disclosed by Sai, and would have been motivated to do so in order to preserve the privacy of the entire data and user.

Regarding claim 9, Tseng in view of SHEN and further in view of Sai discloses the server of claim 8. Tseng further discloses wherein the file identifiers are obtained based on monitoring of users associated with the files (¶0026, wherein recommendation is provided to customers (users) thus monitoring of the user occurs) and wherein only the file identifiers are maintained based on the monitoring (¶0024).

Regarding claim 10, Tseng in view of SHEN and further in view of Sai discloses the server of claim 8. Tseng further discloses wherein the lab data includes samples from the one or more public repositories (¶0060) matching the corresponding file identifiers for the production data (¶0003 “If a match is found, between a hash of the complete file and one of the millions of stored hashes of known malware”). 

Regarding claim 11, Tseng in view of SHEN and further in view of Sai discloses the server of claim 8. Tseng further discloses wherein the lab data includes samples from the one or more public repositories (¶0058, FIG. 5 “a benign source 510 and a malicious source 520”) that have features closely related to features of the production data (¶0057/¶0058 similar file types).   

Regarding claim 14, Tseng in view of SHEN and further in view of Sai discloses the server of claim 8. Tseng further discloses wherein the files are executable files (¶0004 wherein the file can be executed in a malware sandbox).  

Regarding claim 15, Tseng discloses a method comprising: 
obtaining file identifiers (¶0058, “a data collection time, a file type, and a hash associated with the file (e.g., MD5 and SHA256)”, and ¶0003, “Hash for the entire file”) associated with files in production data (¶0030, data that is transmitted or received via the network 140) that includes both benign and malicious data (¶0033, “The data for modeling is selected from a set of benign files and a set of malicious files…”);
obtaining lab data (¶0058, data from malicious sources) from one or more public repositories of malware samples (¶0058, virustotal.com, malwr.com, and malware blogs; FIG. 5) based on the file identifiers (¶0003, “Hash for the entire file”) obtained from the production data (¶0030, FIG. 1, data that is transmitted or received via the network 140);
and utilizing the obtained lab data as training data for training a machine learning process (¶0060, wherein the selection module provides data to a model trainer 120, and wherein model trainer 120 comprises model training module 160 (¶0031), which uses supervised machine learning to train a machine learning model (¶0034)) for classifying malware in the production data (¶0034, wherein the model training module determines malicious packets of a file).

However, Tseng does not explicitly disclose the following portions of the limitation, which are taught by SHEN: wherein the production data is live, real traffic that includes both benign and malicious data; and file identifiers obtained from the live, real production data.
 	
SHEN discloses a data collection module for collecting routing data in real-time to be analyzed for malicious BGP events (¶0029, “The Hijack Detection System 120 may include a data collection module 122 for collecting routing data in real-time to be analyzed for malicious BGP events. The Hijack Detection System 120 may also include a local data store 124, which can be one or more centralized data repositories that store current routing data, at least one set of historically confirmed BGP hijacking data (positive samples), negative samples, and the like”, wherein positive samples represent malicious samples and negative samples represent benign ones according to SHEN ¶0002). The data is live because it is collected in real time; live data relates to real-world data, or a program working with such data, as opposed to test data.
SHEN also discloses file identifiers obtained from the live, real production data (¶0039, “…The BGP Hijack Detection Module 130 may further detect blocks of IP addresses associated with at least one malicious event”, wherein the IP addresses associated with at least one malicious event are file identifiers from the live, real production data. An IP address is one of the file identifiers in accordance with applicant’s disclosure in ¶0030).
Thus, one of ordinary skill in the art would have found it obvious before the effective filing date of the applicant’s claimed invention to modify the method of Tseng to include the teaching of obtaining or collecting production data in real-time as disclosed by SHEN, and would have been motivated to do so because it enables the system to continuously and accurately detect malicious hijack events (SHEN, abstract).
	
Although Tseng discloses in the background of the invention (¶0003) computing a hash of the entire file received by the malware detection engine and comparing the computed hash to the millions of stored hashes maintained in the database, Tseng in view of SHEN does not explicitly disclose that the hash of the data/file is used to train a machine learning model for the purpose of classifying data as malicious or benign.
However, Sai discloses, at col. 7, line 63 through col. 8, line 24, a file identifier that includes a hash value of the feature vector of a file; storing the identifier and data classification in a system memory as a whitelist or blacklist; generating a hash value of a received new file; and then comparing the hash value of the new file to the file identifiers stored in the memory for the purpose of training a file classifier. Whitelisted items are equivalent to the claimed benign data, while blacklisted items are equivalent to the claimed malicious data.

Thus, one of ordinary skill in the art would have found it obvious before the effective filing date of the claimed invention to modify the method of Tseng and SHEN to include using a hash (file identifier) of training data or lab data to train a file classifier instead of using the entire file/data as disclosed by Sai, and would have been motivated to do so in order to preserve the privacy of the entire data and user.

Regarding claim 16, Tseng in view of SHEN and further in view of Sai discloses the method of claim 15. Tseng further discloses wherein the obtaining file identifiers is based on monitoring of users associated with the files (¶0026, wherein recommendation is provided to customers (users) thus monitoring of the user occurs) and wherein only the file identifiers are maintained based on the monitoring (¶0024).

Regarding claim 17, Tseng in view of SHEN and further in view of Sai discloses the method of claim 15. Tseng further discloses wherein the obtaining lab data includes samples from the one or more public repositories (¶0060) matching the corresponding file identifiers for the production data (¶0003 “If a match is found, between a hash of the complete file and one of the millions of stored hashes of known malware”).

Regarding claim 18, Tseng in view of SHEN and further in view of Sai discloses the method of claim 15. Tseng further discloses wherein the obtaining lab data includes selecting samples from the one or more public repositories (¶0058, FIG. 5, “a benign source 510 and a malicious source 520”) that have features closely related to features of the production data (¶0057/¶0058 similar file types).

Regarding claim 20, Tseng in view of SHEN and further in view of Sai discloses the method of claim 15. Tseng further discloses wherein the files are executable files (¶0004 wherein the file can be executed in a malware sandbox). 

Claims 5, 12, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. PGPub. No. 20180293381 to Tseng et al. (hereinafter Tseng) in view of U.S. PGPub. No. 20170180418 to SHEN et al. (hereinafter SHEN) and further in view of U.S. Pat. No. 9864956 to Sai; Na (hereinafter Sai) and further in view of U.S. PGPub. No. 20180308237 to SEONG et al. (hereinafter SEONG).

Regarding claim 5, Tseng in view of SHEN and further in view of Sai discloses the non-transitory computer-readable storage medium of claim 4. Tseng further discloses the concept of a feature extraction technique which could be used for dimension reduction in ¶0042, but does not explicitly disclose the limitation “wherein the features of the production data are determined based on dimension reduction, and wherein the corresponding samples are selected based on a distance to the production data” as taught by SEONG in ¶0122.
See SEONG’s disclosure in ¶0122: “The dimension reduction module 24-3 according to an exemplary embodiment may reduce a dimension by selecting a meaningful feature among features extracted by the feature extraction module 23. For example, in the exemplary embodiments, since the point or area designated by the user has moved to the center of the image, the meaningful feature may be features extracted from the center part of the image”.
Thus, one of ordinary skill in the art would have found it obvious before the effective filing date of the claimed invention to modify the non-transitory computer-readable medium of Tseng, SHEN, and Sai in claim 4 by incorporating the teaching of dimension reduction using a feature extraction module as disclosed by SEONG, and would have been motivated to do so because it enhances the classification of the features selected by the dimension reduction module into classes (SEONG ¶0124).
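For illustration only, the limitation of claim 5 (features determined by dimension reduction, with samples selected by distance to the production data) can be sketched as follows. This is a minimal sketch under stated assumptions and is not the method of SEONG, the other cited references, or the application: dimension reduction is modeled as keeping a chosen subset of feature indices (in the spirit of the feature selection quoted from SEONG ¶0122), distance is Euclidean distance to the centroid of the reduced production vectors, and all names and numeric values are hypothetical.

```python
import math


def reduce_dimension(vector, keep_indices):
    """Reduce a feature vector by keeping only the selected feature indices."""
    return [vector[i] for i in keep_indices]


def select_nearest(production_vectors, lab_vectors, keep_indices, k):
    """Return indices of the k lab samples closest to the production centroid.

    Distances are Euclidean and computed in the reduced feature space.
    """
    reduced_prod = [reduce_dimension(v, keep_indices) for v in production_vectors]
    centroid = [sum(column) / len(reduced_prod) for column in zip(*reduced_prod)]
    scored = []
    for index, vector in enumerate(lab_vectors):
        distance = math.dist(centroid, reduce_dimension(vector, keep_indices))
        scored.append((distance, index))
    scored.sort()
    return [index for _, index in scored[:k]]


# Hypothetical feature vectors; the third feature is treated as uninformative.
production = [[1.0, 0.0, 9.0], [3.0, 0.0, -9.0]]
lab = [[2.0, 0.1, 100.0], [10.0, 10.0, 0.0], [2.5, 0.0, -50.0]]
nearest = select_nearest(production, lab, keep_indices=[0, 1], k=2)
```

With the third feature dropped, lab samples 0 and 2 lie closest to the production centroid and are the ones selected, even though their discarded third feature differs widely from the production values.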

Regarding claim 12, Tseng in view of SHEN and further in view of Sai discloses the server of claim 11. Tseng further discloses the concept of a feature extraction technique which could be used for dimension reduction in ¶0042, but does not explicitly disclose the following limitation taught by SEONG: wherein the features of the production data are determined based on dimension reduction, and wherein the corresponding samples are selected based on a distance to the production data.
See SEONG’s disclosure in ¶0122: “The dimension reduction module 24-3 according to an exemplary embodiment may reduce a dimension by selecting a meaningful feature among features extracted by the feature extraction module 23. For example, in the exemplary embodiments, since the point or area designated by the user has moved to the center of the image, the meaningful feature may be features extracted from the center part of the image”.
Thus, one of ordinary skill in the art would have found it obvious before the effective filing date of the claimed invention to modify the server of Tseng, SHEN, and Sai in claim 11 by incorporating the teaching of dimension reduction using a feature extraction module as disclosed by SEONG, and would have been motivated to do so because it enhances the classification of the features selected by the dimension reduction module into classes (SEONG ¶0124).

Regarding claim 19, Tseng in view of SHEN and further in view of Sai discloses the method of claim 18. Tseng further discloses the concept of a feature extraction technique which could be used for dimension reduction in ¶0042, but does not explicitly disclose the limitation “wherein the features of the production data are determined based on dimension reduction, and wherein the corresponding samples are selected based on a distance to the production data” as taught by SEONG.
See SEONG’s disclosure in ¶0122: “The dimension reduction module 24-3 according to an exemplary embodiment may reduce a dimension by selecting a meaningful feature among features extracted by the feature extraction module 23. For example, in the exemplary embodiments, since the point or area designated by the user has moved to the center of the image, the meaningful feature may be features extracted from the center part of the image”.
Thus, one of ordinary skill in the art would have found it obvious before the effective filing date of the claimed invention to modify the method of Tseng, SHEN, and Sai in claim 18 by incorporating the teaching of dimension reduction using a feature extraction module as disclosed by SEONG, and would have been motivated to do so because it enhances the classification of the features selected by the dimension reduction module into classes (SEONG ¶0124).


Claims 6 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over U.S. PGPub. No. 20180293381 to Tseng et al. (hereinafter Tseng) in view of U.S. PGPub. No. 20170180418 to SHEN et al. (hereinafter SHEN) and further in view of U.S. Pat. No. 9864956 to Sai; Na (hereinafter Sai) and further in view of U.S. PGPub. No. 20180308237 to SEONG et al. (hereinafter SEONG) and further in view of U.S. PGPub. No. 20200327600 to Yilmazcoban et al. (hereinafter Yilmazcoban).

Regarding claim 6, Tseng in view of SHEN and further in view of Sai and further in view of SEONG discloses the non-transitory computer-readable storage medium of claim 5. 
However, they do not explicitly disclose the following limitation taught by Yilmazcoban: wherein the dimension reduction includes parametric t-distributed Stochastic Neighbor Embedding (tSNE) and an autoencoder to learn representations of the production data.  
Yilmazcoban discloses the concept of using TSNE, autoencoder and PCA for dimension reduction after feature extraction in ¶0023 “different methods such as, but not limited to, Principal Component Analysis (PCA), autoencoders, and T-distributed Stochastic Neighbor Embedding (TSNE) are used for feature dimension reduction”.
Thus, one of ordinary skill in the art would have found it obvious before the effective filing date of the claimed invention to modify the non-transitory computer-readable medium of Tseng, SHEN, Sai, and SEONG in claim 5 by incorporating the teaching of using TSNE and an autoencoder for dimension reduction as disclosed by Yilmazcoban, and would have been motivated to do so because it provides a utilization method that enhances the ability of the neural network model to learn information related to the user based on information collected from online social networking platforms and the sweep learning structure in order to provide product recommendation to the user (Yilmazcoban, abstract).

Regarding claim 13, Tseng in view of SHEN and further in view of Sai and further in view of SEONG discloses the server of claim 12. However, they do not explicitly disclose the following limitation taught by Yilmazcoban: wherein the dimension reduction includes parametric t-distributed Stochastic Neighbor Embedding (tSNE) and an autoencoder to learn representations of the production data.   
Yilmazcoban discloses the concept of using TSNE, autoencoder and PCA for dimension reduction after feature extraction in ¶0023 “different methods such as, but not limited to, Principal Component Analysis (PCA), autoencoders, and T-distributed Stochastic Neighbor Embedding (TSNE) are used for feature dimension reduction”.
Thus, one of ordinary skill in the art would have found it obvious before the effective filing date of the claimed invention to modify the server of Tseng, SHEN, Sai, and SEONG in claim 12 by incorporating the teaching of using TSNE and an autoencoder for dimension reduction as disclosed by Yilmazcoban, and would have been motivated to do so because it provides a utilization method that enhances the ability of the neural network model to learn information related to the user based on information collected from online social networking platforms and the sweep learning structure in order to provide product recommendation to the user (Yilmazcoban, abstract).
 
 
Conclusion

The prior art made of record and not relied upon is considered pertinent to applicant's disclosure. The Microsoft Computer Dictionary, Fifth Edition, provides a definition of “live.”

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MUDASIRU K OLAEGBE whose telephone number is (571)272-2082. The examiner can normally be reached Monday-Friday, 7:30 AM-5:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Farid Homayounmehr, can be reached at (571)272-3739. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.





/MUDASIRU K OLAEGBE/Examiner, Art Unit 2495

/FARID HOMAYOUNMEHR/Supervisory Patent Examiner, Art Unit 2495