DETAILED ACTION
Notice of Pre-AIA  or AIA  Status
The present application, filed on or after January 31, 2018, is being examined under the first inventor to file provisions of the AIA .

Information Disclosure Statement
The information disclosure statements (IDSs) submitted on 02/07/2018, 12/04/2018, 02/15/2019 and 06/16/2020 are in compliance with the provisions of 37 CFR 1.97. Accordingly, the information disclosure statements are being considered by the examiner.

Oath/Declaration
For the record, the examiner acknowledges that the Oath/Declaration submitted on 07/31/2018 has been received.

Drawings
The drawings filed on 01/31/2018 have been accepted.

Specification
The disclosure is objected to because of the following informalities:
In paragraphs [1010] and [1050]: "A HTML" should read "An HTML".
Appropriate correction is required.

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claim 5 is rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA  35 U.S.C. 112, the applicant), regards as the invention.
Claim 5 recites, "the processor configured to extract a set of features from the second modified HTML file, the processor configured to provide an indication of the set of features from the second HTML file as a second input to the machine learning model to produce a second output". However, the set of features was extracted from the second modified HTML file, not from "the second HTML file", so the two recitations are inconsistent. For the purpose of examination, the examiner will read this limitation as "the processor configured to extract a set of features from the second modified HTML file, the processor configured to provide an indication of the set of features from the second modified HTML file as a second input to the machine learning model to produce a second output". The examiner suggests that applicant amend the claim accordingly.
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.
Claims 1-2 and 4 are rejected under 35 U.S.C. 103 as being unpatentable over US 2014/0090061 A1 by Avasarala et al. (hereinafter, “Ref. Avasarala”), in view of “Mal-ID: Automatic Malware Detection Using Common Segment Analysis and Meta-Features” by Tahan et al. (hereinafter, “Ref. Tahan”), further in view of US 2018/0129786 A1 by Khine et al. (hereinafter, “Ref. Khine”).
As per claim 1, Ref. Avasarala teaches an apparatus, comprising: a memory; and a processor operatively coupled to the memory, the processor configured to (Ref Avasarala fig. 9(932, 936) and para. [0014] teach memory and processor) 
receive a Hypertext Markup Language (HTML) file for which a machine learning model has made a malicious content classification,
(Ref. Avasarala fig. 7a teaches that machine learning model can take different kinds of files as input such as HTML file. See also, ref Avasarala abstract: “Improved systems and methods for automated machine learning, zero-day malware detection. Embodiments include a method for improved zero-day malware detection that receives a set of training files which are each known to be either malign or benign, partitions the set of training files into a plurality of categories, and trains category-specific classifiers that distinguish between malign and benign files in a category of files. The training may include selecting one of the plurality of categories of training files, identifying features present in the training files in the selected category of training files, evaluating the identified features to determine the identified features most effective at distinguishing between malign and benign files, and building a category specific classifier based on the evaluated features.” teaches that machine learning method (machine learning model) builds category-specific classifiers (a malicious content classification) for files. )
Ref. Avasarala fails to explicitly teach:
the processor configured to remove a subtree of the HTML file to define a modified HTML file, the modified HTML file having a valid HTML format, 
the processor configured to extract a set of features from the modified HTML file, 
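the processor configured to provide an indication of the set of features as an input to the machine learning model to produce an output, 
the processor configured to identify an impact of the subtree of the HTML file on the malicious content classification of the HTML file based on the output,
the processor configured to store, in a database, an indication of the impact as associated with the subtree of the HTML file.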
However, Ref. Tahan teaches:
the processor configured to remove a subtree of the … file to define a modified … file, the modified … file having a valid … format, (Ref. Tahan page 955 teaches that the input for the Mal-ID method is an unclassified executable file (i.e. file) of any size. See also fig. 1 and page 955 [para 1], which teach breaking the file into segments, detecting the file segments that originated from the development platform or from a benign third-party library, and then disregarding those segments. Finally, the remaining segments (i.e. the file with the portion removed) are compared to determine their degree of resemblance to a collection of known malware. Here, the “remaining segments” can be defined as the modified file. As the “remaining segments” are a subset of the entire executable file, they would have the same format associated with a type of the executable file.)

the processor configured to extract a set of features from the modified … file, (Ref. Tahan page 957 (section 2.2): “Three features can be derived for each segment: Spread, MFG, and Entropy.” teaches that three features will be derived (i.e. extract a set of features) from each segment; this reasonably teaches that features will be derived from the remaining segments (i.e. modified file) as well.)
the processor configured to provide an indication of the set of features as an input to the machine learning model to produce an output, (Ref. Tahan page 957 (section 2.2): “Three features can be derived for each segment: Spread, MFG, and Entropy” and Ref. Tahan page 957 (last para.): “Note that the features, as described above, are in fact meta-features as they are used to represent features of features (features of the basic 3-grams)” teach that the meta-features are the indication of the set of features. See also page 958 (section 2.3): “(a) Line 5. Calculate the entropy for the bytes within the segment. (b) Line 6. The algorithm gets two parameters EntropyLow and EntropyHigh… (d) Line 11. Using the CFL, calculate the CFL-MFG index… (f) Line 14. Using the TFL, calculate the TFL-MFG index” teaches that the meta-features are provided as an input to the Mal-ID algorithm (i.e. machine learning model). See also page 959: “Second level index aggregation—Count all segments that are found in malware and not in the CFL” teaches that the count of segments found in the malware and not in the CFL (i.e. output) is the output of the Mal-ID algorithm.)
the processor configured to identify an impact of the subtree of the … file on the malicious content classification of the … file based on the output, (Ref. Tahan page 959: “3. Lines 28-30. Second level index aggregation—Count all segments that are found in malware … Classify—If there are at least X segments found in the malware train set (TFL) and not in the CFL then the file is malware; otherwise consider the file as benign. We have implemented Mal-Id with X set to 1” teaches that the count of segments found in the malware and not in the CFL (i.e. output) is the output of the Mal-ID algorithm. If the remaining segments (i.e. subtree) contain at least a preset threshold number of segments from the TFL, then the file is classified as malware. Here, identifying an impact is identifying the rule that, if the portion is classified as malicious, it contributes to the entire file being classified as malicious. This reasonably teaches that the Mal-ID algorithm (i.e. malicious content classification) classifies the executable file (i.e. file) as malicious based on the output from the features of the segments of the file.)
an indication of the impact as associated with the subtree of the … file
(Ref. Tahan page 959: “3. Lines 28-30. Second level index aggregation—Count all segments that are found in malware … Classify—If there are at least X segments found in the malware train set (TFL) and not in the CFL then the file is malware; otherwise consider the file as benign. We have implemented Mal-Id with X set to 1” teaches that the count of segments found in the malware and not in the CFL is the output of the Mal-ID algorithm. If the remaining segments (i.e. subtree) contain at least a preset threshold number of segments from the TFL, then the file is classified as malware. Here, an indication of the impact is the identification that, if the portion is classified as malicious, it contributes to the entire file being classified as malicious.)
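For illustration only, the classification rule quoted from Ref. Tahan page 959 (at least X segments found in the TFL and not in the CFL) can be sketched as follows. The sketch assumes exact set membership for segment matching, which is a simplification of the 3-gram based indices described in the reference; the function name and the toy libraries are hypothetical and are not taken from the reference.

```python
from typing import Iterable, Set


def classify_file(segments: Iterable[bytes], tfl: Set[bytes],
                  cfl: Set[bytes], x: int = 1) -> str:
    """Label a file 'malware' if at least x of its segments appear in the
    threat library (TFL) and do not appear in the common library (CFL)."""
    # Assumption: a segment "is found" in a library when it is an exact member.
    suspicious = sum(1 for seg in segments if seg in tfl and seg not in cfl)
    return "malware" if suspicious >= x else "benign"


# Hypothetical toy libraries, for demonstration only.
tfl = {b"evil", b"shared"}
cfl = {b"stdlib", b"shared"}

print(classify_file([b"stdlib", b"evil"], tfl, cfl))    # one TFL-only segment
print(classify_file([b"stdlib", b"shared"], tfl, cfl))  # no TFL-only segment
```

With X set to 1, as in the quoted passage, a single segment that matches the TFL and not the CFL is enough to label the whole file as malware.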
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Avasarala’s Method For Automated Machine-Learning, Zero-Day Malware Detection into Ref. Tahan’s Automatic Malware Detection Using Common Segment, with a motivation to save time and processing power by focusing specifically on those segments of code most likely to be malicious. (Ref. Tahan page 949 (abstract)).
	The combination of Ref. Avasarala and Ref. Tahan fails to explicitly teach:
the processor configured to store, in a database, an indication
However, Ref. Khine teaches:
the processor configured to store, in a database, an indication 
(Ref. Khine page 18 (claim 18): “the server being in communication with the database to store the indication of the adverse respiratory event” teaches that the indication can be stored in the database.)

	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Khine’s Predictive Respiratory Monitor And System into Ref. Avasarala’s Method For Automated Machine-Learning, Zero-Day Malware Detection as modified by Ref. Tahan’s Automatic Malware Detection Using Common Segment, with a motivation to access the data efficiently. (Ref. Khine para. [0081], [0124]).
	As per claim 2, the combination of Ref. Avasarala, Ref. Tahan and Ref. Khine as shown above teaches the apparatus of claim 1.
Ref. Avasarala further teaches wherein the machine learning model is at least one of a neural network, a decision tree model, a random forest model or a deep neural network. (Ref. Avasarala para. [0037], [0097] teach machine learning model can be a boosted decision tree (i.e. decision tree))
	As per claim 4, the combination of Ref. Avasarala, Ref. Tahan and Ref. Khine as shown above teaches the apparatus of claim 1.
(Ref. Tahan page 959: “3. Lines 28-30. Second level index aggregation—Count all segments that are found in malware … Classify—If there are at least X segments found in the malware train set (TFL) and not in the CFL then the file is malware; otherwise consider the file as benign. We have implemented Mal-Id with X set to 1” teaches that, when a segment of the file matches a segment from the TFL, that portion is classified as malicious. It further teaches that the count of segments found in the malware and not in the CFL is the output of the Mal-ID algorithm. If the remaining segments (i.e. portion of the structured file) contain at least a preset threshold number of segments from the TFL, then the file is classified as malware. Here, identifying an impact is identifying the rule that, if the portion is classified as malicious, it contributes to the entire file being classified as malicious. This reasonably teaches that the Mal-ID algorithm classifies the remaining segments as malicious based on the rule (i.e. impact).)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Avasarala’s Method For Automated Machine-Learning, Zero-Day Malware Detection into Ref. Tahan’s Automatic Malware Detection Using Common Segment, with a motivation to save time and processing power by focusing specifically on those segments of code most likely to be malicious. (Ref. Tahan page 949 (abstract)).
Claim 3 is rejected under 35 U.S.C. 103 as being unpatentable over Ref. Avasarala, in view of Ref. Tahan and Ref. Khine as shown above, further in view of “Malware Classification Method via Binary Content Comparison” by Kang et al. (hereinafter, “Ref. Kang”).

As per claim 3, the combination of Ref. Avasarala, Ref. Tahan and Ref. Khine as shown above teaches the apparatus of claim 1.
The combination of Ref. Avasarala, Ref. Tahan and Ref. Khine fails to explicitly teach: 
wherein the processor is configured to identify an impact of the subtree of the HTML file on the input to the machine learning model by comparing a difference between the indication of the set of features extracted from the modified HTML file with an indication of a set of features extracted from the HTML file.
However, Ref. Kang teaches:
wherein the processor is configured to identify an impact of the subtree of the … file on the input to the machine learning model (Ref. Kang fig. 1, fig. 2, page 316 (section 1): “We propose a new malware classification method to reduce the overheads of malware analysis. By classifying malware correctly, the analysis time can be reduced because types of malware can be known in advance …To reduce analysis overheads, we propose MBC (Major Block Comparison) system, which identifies the core parts of binaries that can represent a family of malware. Test results with several malware showed that our approach can classify malware effectively” and Ref. Kang page 320 (section 5): “we proposed a malware classification method based on major block analysis. Binary files in the same malware family have common functionalities and these common functionalities can be used to classify same malware family. In our proposed method, a binary file is divided into blocks and …” teach that the Major Block Comparison system (i.e. machine learning model) takes binary executable files as input, breaks each file into major blocks and identifies whether the file is malware. Here, identifying an impact is classifying the file as malware by analyzing the major blocks (i.e. subtree) of the file.)
by comparing a difference between the indication of the set of features extracted from [another] … file with an indication of a set of features extracted from the … file.
(Ref. Kang page 317 (section 3): “Our proposed method selects parts of malware, extracts binary features, and computes similarities of these feature values. Values of binary features are extracted from disassembled instructions of binary executable files” teaches that the malware classification method extracts binary feature values (i.e. indication of set of features). See also Ref. Kang fig. 2, pages 318-319 (section 3.2.4) and page 320 (section 5), which teach that the system compares the similarities between two files by comparing the major blocks’ binary feature values (i.e. indication of set of features) of one file with the major blocks’ binary feature values (i.e. an indication of a set of features) of another file (i.e. the structured file). This reasonably teaches that, if the system can compare the similarities of the features of two files, the system identifies the differences as well.)
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Kang’s Malware Classification Method via Binary Content Comparison into Ref. Avasarala’s Method For Automated Machine-Learning, Zero-Day Malware Detection as modified by Ref. Tahan’s Automatic Malware Detection Using Common Segment and Ref. Khine’s Predictive Respiratory Monitor And System.
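For illustration only, the claim 3 limitation quoted above (comparing the feature indication of the modified HTML file with that of the original HTML file) can be sketched as follows. The sketch assumes the HTML is well formed enough to be parsed as XML and uses word-token counts as a stand-in feature set; the function names and the sample page are hypothetical and are drawn neither from the application nor from the cited references.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def extract_features(html: str) -> Counter:
    """Stand-in feature extractor: counts of lowercase word tokens."""
    return Counter(re.findall(r"[A-Za-z_]+", html.lower()))


def remove_subtrees(html: str, tag: str) -> str:
    """Remove every subtree rooted at <tag>; the result is still well formed."""
    root = ET.fromstring(html)
    for parent in list(root.iter()):
        for child in list(parent):
            if child.tag == tag:
                parent.remove(child)
    return ET.tostring(root, encoding="unicode")


def feature_difference(original: Counter, modified: Counter) -> dict:
    """Features whose counts changed when the subtree was removed."""
    diff = Counter(original)
    diff.subtract(modified)
    return {name: count for name, count in diff.items() if count != 0}


page = "<html><body><p>hello</p><script>eval(payload)</script></body></html>"
modified = remove_subtrees(page, "script")
impact = feature_difference(extract_features(page), extract_features(modified))
print(modified)
print(impact)
```

The non-zero entries of the difference are the features contributed by the removed subtree, which is one simple way to express its impact on the model input.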

Claim 6 is rejected under 35 U.S.C. 103 as being unpatentable over Ref. Avasarala, in view of Ref. Tahan and Ref. Khine as shown above, further in view of US 2019/0095805 A1 by Tristan et al. (hereinafter, “Ref. Tristan”).
As per claim 6, the combination of Ref. Avasarala, Ref. Tahan and Ref. Khine as shown above teaches the apparatus of claim 1.
The combination of Ref. Avasarala, Ref. Tahan and Ref. Khine fails to explicitly teach: 
wherein the processor is configured to produce the indication of the set of features from the modified HTML file by providing as an input to a hash function each feature from the set of features in the modified HTML file to produce the indication of the set of features.
	However, Ref. Tristan teaches:
wherein the processor is configured to produce the indication of the set of features from the … file by providing as an input to a hash function each feature from the set of features in the … file to produce the indication of the set of features (Ref. Tristan page 11 (claim 2): “a given decision model in the plurality of decision models is configured to: for a given feature in the input data: hash the feature using a hash function associated with the given decision model to obtain an index in the feature vector; and modify the feature vector at the index to reflect …” and para. [0020]: “The hashing trick is a method that is used to make machine learning algorithms faster and use less memory. The hashing trick works by mapping the set of features in the input data into another set by using a hash function. The hashing thus reduces the size of the feature set.” teach that the set of features extracted from the input data (i.e. file) can be hashed using a hash function (i.e. input to a hash function). Here, the hashed feature set is the indication of the set of features.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Tristan’s Feature Hashing Models into Ref. Avasarala’s Method For Automated Machine-Learning, Zero-Day Malware Detection as modified by Ref. Tahan’s Automatic Malware Detection Using Common Segment and Ref. Khine’s Predictive Respiratory Monitor And System, with a motivation to reduce the complexity of the resulting decision system (e.g., decision trees) by using the hashing technique, which reduces computing resource requirements both during training and in the field (Ref Tristan para. [0006]).
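For illustration only, the hashing trick described in Ref. Tristan para. [0020] and claim 2 (hash each feature to obtain an index in a feature vector, then modify the vector at that index) can be sketched as follows. The choice of SHA-256, the vector size of 16 and the function name are assumptions made for this sketch and are not taken from the reference.

```python
import hashlib
from typing import Iterable, List


def hash_features(features: Iterable[str], dim: int = 16) -> List[int]:
    """Map an arbitrary set of string features into a fixed-size count vector."""
    vec = [0] * dim
    for feat in features:
        # Hash the feature to obtain an index in the feature vector.
        digest = hashlib.sha256(feat.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % dim
        # Modify the feature vector at that index to reflect the feature.
        vec[index] += 1
    return vec


vector = hash_features(["script", "eval", "iframe", "eval"])
print(vector)
```

The vector length stays fixed no matter how many distinct features occur, which is the reduction in feature-set size that the quoted passage describes.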
Claims 1-2 are also rejected under 35 U.S.C. 103 as being unpatentable over U.S. provisional application 62/483,102 (priority document for US 2018/0293381 A1) by Huihsin et al. (hereinafter, “Ref. Huihsin”), in view of Ref. Avasarala and further in view of Ref. Khine.
As per claim 1, Ref. Huihsin teaches an apparatus, comprising: a memory; and a processor operatively coupled to the memory, the processor configured to (Ref. Huihsin para. [00128] teaches processor and memory)
	the processor configured to remove a subtree of the … file to define a modified … file, the modified … file having a valid … format, (Ref. Huihsin para. [0008]-[0010] teach that the method analyzes the maliciousness of a file by analyzing its data blocks/segments (i.e. removed subtree) separately. Here, each block/data segment of the file can be defined as the modified file, which will have the same format associated with the file.)

the processor configured to extract a set of features from the modified … file (Ref. Huihsin para. [0010]: “To analyze each data segment, this method will turn each data segment, e.g., the first data segment of the Windows Portable Executable file that contains the program header, into decimal representation, generate a set of features, including n-gram, entropy and domain, based on the decimal representation, and feed the feature representations into a machine learning model, such as a random forest model, for prediction. The output of this model is a label of whether the data segment is malicious or benign” teaches that the method extracts a set of features from the data segment (i.e. modified file).)
the processor configured to provide an indication of the set of features as an input to the machine learning model to produce an output, (Ref. Huihsin para. [0010]: “To analyze each data segment, this method will turn each data segment, e.g., the first data segment of the Windows Portable Executable file that contains the program header, into decimal representation, generate a set of features, including n-gram, entropy and domain, based on the decimal representation, and feed the feature representations into a machine learning model, such as a random forest model, for prediction. The output of this model is a label of whether the data segment is malicious or benign” teaches that the method feeds the feature representations (i.e. indication of the set of features) into a machine learning model to generate an output. The output of this model is a label of whether the data segment is malicious or benign.)
the processor configured to identify an impact of the subtree of the … file on the malicious content classification of the … file based on the output, (Ref. Huihsin para. [0010] and [0085]-[0086] teach that the method feeds the features into a machine learning model to generate an output. The output of this model is a label of whether the data segment is malicious or benign. The malware detection engine (i.e. malicious content classification) determines whether the segment is malicious or benign using a certain malware/benign probability threshold. Here, identifying an impact is identifying the rule that, if the data segment is above the malware threshold, the segment is classified as malicious, which contributes to the file being detected as malware. This reasonably teaches that the malware detection engine (i.e. malicious content classification) classifies the segment (i.e. subtree) of the file based on the output from the features of the segments of the file.)
an indication of the impact as associated with the subtree of the … file (Ref. Huihsin para. [0010] and [0085]-[0086] teach that the method feeds the features into a machine learning model to generate an output. The output of this model is a label of whether the data segment is malicious or benign. The malware detection engine (i.e. malicious content classification) determines whether the segment is malicious or benign using a certain malware/benign probability threshold. Here, an indication of the impact is the identification that, if the data segment is above the malware threshold, the segment is classified as malicious, which contributes to the file being detected as malware.)
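For illustration only, the per-segment steps cited from Ref. Huihsin para. [0010] (decimal representation, n-gram features, and a thresholded malicious/benign label) can be sketched as follows. The machine learning model itself is not reproduced: the probability passed to label_segment is assumed to come from a separately trained model such as a random forest, and all function names and the 0.5 threshold are hypothetical.

```python
from collections import Counter
from typing import List


def to_decimal(segment: bytes) -> List[int]:
    """Decimal representation of a data segment: one integer (0-255) per byte."""
    return list(segment)


def ngram_counts(values: List[int], n: int = 2) -> Counter:
    """Counts of consecutive n-grams over the decimal representation."""
    return Counter(tuple(values[i:i + n]) for i in range(len(values) - n + 1))


def label_segment(malware_probability: float, threshold: float = 0.5) -> str:
    """Label a segment from a model-supplied probability (assumed input)."""
    return "malicious" if malware_probability > threshold else "benign"


decimal = to_decimal(b"MZMZ")
features = ngram_counts(decimal)
print(decimal)
print(features)
print(label_segment(0.9), label_segment(0.1))
```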

Ref. Huihsin fails to explicitly teach:
receive a Hypertext Markup Language (HTML) file for which a machine learning model has made a malicious content classification, 
However Ref. Avasarala teaches:
receive a Hypertext Markup Language (HTML) file for which a machine learning model has made a malicious content classification, (Ref. Avasarala fig. 7a teaches that machine learning model can take different kinds of files as input such as HTML file. See also, ref Avasarala abstract: “Improved systems and methods for automated machine learning, zero-day malware detection. Embodiments include a method for improved zero-day malware detection that receives a set of training files which are each known to be either malign or benign, partitions the set of training files into a plurality of categories, and trains category-specific classifiers that distinguish between malign and benign files in a category of files. The training may include selecting one of the plurality of categories of training files, identifying features present in the training files in the selected category of training files, evaluating the identified features to determine the identified features most effective at distinguishing between malign and benign files, and building a category specific classifier based on the evaluated features.” teaches that machine learning method (machine learning model) builds category-specific classifiers (a malicious content classification) for files. )
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Huihsin’s System 
	Combination of Huihsin and Avasarala fails to explicitly teach:
the processor configured to store, in a database, an indication 
However, Ref. Khine teaches:
the processor configured to store, in a database, an indication 
(Ref. Khine page 18 (claim 18): “the server being in communication with the database to store the indication of the adverse respiratory event” teaches that the indication can be stored in the database.)
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Khine’s Predictive Respiratory Monitor And System into Ref. Huihsin’s System and Method for Detecting Malware as modified by Ref. Avasarala’s Method For Automated Machine-Learning, Zero-Day Malware Detection, with a motivation to access the data efficiently. (Ref. Khine para. [0081], [0124]).
	As per claim 2, the combination of Ref. Huihsin, Ref. Avasarala and Ref. Khine as shown above teaches the apparatus of claim 1.
	Ref. Huihsin further teaches wherein the machine learning model is at least one of a neural network, a decision tree model, a random forest model or a deep neural network. (Ref. Huihsin para. [0040]: “the extracted feature representations are fed into a pre-generated machine learning (ML) model such as a random forest tree” teaches that the machine learning model can be a random forest tree (i.e. random forest model).)
Claims 7, 8, 10 and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Ref. Avasarala, in view of Ref. Tahan.
As per claim 7, Ref. Avasarala teaches a non-transitory processor-readable medium storing code representing instructions to be executed by a processor, the code comprising code to cause the processor to: (Ref. Avasarala page 11 (claim 13) teaches a non-transitory computer readable medium (i.e. a non-transitory processor-readable medium))
receive a structured file for which a machine learning model has made a malicious content classification; (Ref. Avasarala fig. 7a teaches that machine learning model can take different kinds of files as input such as HTML, PDF (i.e. structured file). See also, ref Avasarala abstract: “Improved systems and methods for automated machine learning, zero-day malware detection. Embodiments include a method for improved zero-day malware detection that receives a set of training files which are each known to be either malign or benign, partitions the set of training files into a plurality of categories, and trains category-specific classifiers that distinguish between malign and benign files in a category of files. The training may include selecting one of the plurality of categories of training files, identifying features present in the training files in the selected category of training files, evaluating the identified features to determine the identified features most effective at distinguishing between malign and benign files, and building a category specific classifier based on the evaluated features.” teaches that machine learning method (machine learning model) builds category-specific classifiers (a malicious content classification) for files. )
Ref. Avasarala fails to explicitly teach: 
 remove a portion of the structured file to define a modified structured file, the modified structured file following a format associated with a type of the structured file; 
extract a set of features from the modified structured file; 
provide an indication of the set of features as an input to the machine learning model to produce an output; 
and identify an impact of the portion of the structured file on the malicious content classification of the structured file based on the output.
	However, Ref. Tahan teaches:
remove a portion of the structured file to define a modified structured file, the modified structured file following a format associated with a type of the structured file; (Ref. Tahan page 955 teaches that the input for the Mal-ID method is an unclassified executable file (i.e. structured file) of any size. See also fig. 1 and page 955 [para 1], which teach breaking the file into segments, detecting the file segments that originated from the development platform or from a benign third-party library, and then disregarding those segments. Finally, the remaining segments (i.e. the file with the portion removed) are compared to determine their degree of resemblance to a collection of known malware. Here, the “remaining segments” can be defined as the modified structured file. As the “remaining segments” are a subset of the entire executable file, they would have the same format associated with a type of the executable file.)
extract a set of features from the modified structured file; (Ref. Tahan page 957 (section 2.2): “Three features can be derived for each segment: Spread, MFG, and Entropy.” teaches that three features will be derived (i.e. extract a set of features) from each segment; this reasonably teaches that features will be derived from the remaining segments (i.e. modified structured file) as well.)
provide an indication of the set of features as an input to the machine learning model to produce an output; (Ref. Tahan page 957 (section 2.2): “Three features can be derived for each segment: Spread, MFG, and Entropy” and Ref. Tahan page 957 (last para.): “Note that the features, as described above, are in fact meta-features as they are used to represent features of features (features of the basic 3-grams)” teach that the meta-features are the indication of the set of features. See also page 958 (section 2.3): “(a) Line 5. Calculate the entropy for the bytes within the segment. (b) Line 6. The algorithm gets two parameters EntropyLow and EntropyHigh… (d) Line 11. Using the CFL, calculate the CFL-MFG index… (f) Line 14. Using the TFL, calculate the TFL-MFG index” teaches that the meta-features are provided as an input to the Mal-ID algorithm (i.e. machine learning model). See also page 959: “Second level index aggregation—Count all segments that are found in malware and not in the CFL” teaches that the count of segments found in the malware and not in the CFL (i.e. output) is the output of the Mal-ID algorithm.)
	and identify an impact of the portion of the structured file on the malicious content classification of the structured file based on the output. (Ref. Tahan page 959: “3. Lines 28-30. Second level index aggregation—Count all segments that are found in malware … Classify—If there are at least X segments found in the malware train set (TFL) and not in the CFL then the file is malware; otherwise consider the file as benign. We have implemented Mal-Id with X set to 1” teaches that the count of segments found in the malware and not in the CFL (i.e. output) is the output of the Mal-ID algorithm. If the remaining segments (i.e. portion of the structured file) contain at least a preset threshold number of segments from the TFL, then the file is classified as malware. Here, identifying an impact is identifying the rule that, if the portion is classified as malicious, it contributes to the entire file being classified as malicious. This reasonably teaches that the Mal-ID algorithm (i.e. malicious content classification) classifies the executable file (i.e. structured file) as malicious based on the output from the features of the segments of the file.)
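For illustration only, the per-segment entropy feature cited above from Ref. Tahan page 958 (“Calculate the entropy for the bytes within the segment”) can be sketched as Shannon entropy over byte frequencies. How the EntropyLow and EntropyHigh parameters are applied is elided in the quotation, so the band check below is an assumption, and both function names are hypothetical.

```python
import math
from collections import Counter


def byte_entropy(segment: bytes) -> float:
    """Shannon entropy of a segment in bits per byte (0.0 to 8.0)."""
    if not segment:
        return 0.0
    total = len(segment)
    counts = Counter(segment)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def within_entropy_band(segment: bytes, low: float, high: float) -> bool:
    """Assumed use of the two thresholds: keep segments whose entropy lies
    between low and high, inclusive."""
    return low <= byte_entropy(segment) <= high


print(byte_entropy(b"aaaa"))            # a single repeated byte value
print(byte_entropy(b"abab"))            # two equally likely byte values
print(byte_entropy(bytes(range(256))))  # every byte value exactly once
```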
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Avasarala’s Method For Automated Machine-Learning, Zero-Day Malware Detection into Ref. Tahan’s Automatic Malware Detection Using Common Segment, with a motivation to save time and processing power by focusing specifically on those segments of code most likely to be malicious. (Ref Tahan Page 349(abstract)) 

	As per claim 8, the combination of Ref. Avasarala and Ref Tahan as shown above teaches the non-transitory processor-readable medium of claim 7.
	Avasarala further teaches:
wherein the structured file is at least one of a Hypertext Markup Language (HTML) file, an Extensible Markup Language (XML) file, a Portable Executable (PE) file, a document processing file, or a Portable Document Format (PDF) file. (Ref. Avasarala fig. 7a teaches that the machine learning model can take different kinds of files as input, such as HTML and PDF (i.e. structured file).)

	As per claim 10, the combination of Ref. Avasarala and Ref. Tahan as shown above teaches the non-transitory processor-readable medium of claim 7.
	Avasarala further teaches:
wherein the machine learning model is at least one of a neural network, a decision tree model, a random forest model or a deep neural network. (Ref. Avasarala para. [0037], [0097] teach machine learning model can be a boosted decision tree (i.e. decision tree))
	As per claim 13, the combination of Ref. Avasarala and Ref. Tahan as shown above teaches the non-transitory processor-readable medium of claim 7.
	Tahan further teaches:
further comprising code to cause the processor to: classify the portion of the structured file as malicious based on the impact. (Ref. Tahan page 959: “3. Lines 28-30. Second level index aggregation—Count all segments that are found in malware … Classify—If there are at least X segments found in the malware train set (TFL) and not in the CFL then the file is malware; otherwise consider the file as benign. We have implemented Mal-Id with X set to 1” teaches that when a segment of the file matches a segment from the TFL library, that portion is classified as malicious. It further teaches that the count of segments found in the malware and not in the CFL is the output of the Mal-ID algorithm. If the remaining segments (i.e. portion of the structured file) contain more than a preset threshold number of segments from the TFL library, then the file is classified as malware. Here, identifying an impact is to identify the rule that if the portion is classified as malicious then it contributes to the entire file being classified as malicious. This reasonably teaches that the Mal-ID algorithm classifies the remaining segments as malicious based on the rule (i.e. impact).)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Avasarala’s Method For Automated Machine-Learning, Zero-Day Malware Detection into Ref. Tahan’s Automatic Malware Detection Using Common Segment, with a motivation to save time and processing power by focusing specifically on those segments of code most likely to be malicious. (Ref Tahan Page 349(abstract)) 
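For illustration only, the decision rule quoted from Ref. Tahan page 959 (count the segments found in the TFL and not in the CFL, and classify the file as malware when at least X such segments exist) can be sketched as follows. This is a simplified sketch and not the Mal-ID implementation: it assumes exact byte-for-byte segment matching against two sets, whereas the reference describes 3-gram based indexes and entropy thresholds. The names `split_segments`, `classify_file`, `tfl` and `cfl` are hypothetical.

```python
def split_segments(data: bytes, length: int) -> list:
    """Divide a file's bytes into consecutive segments of the given length."""
    return [data[i:i + length] for i in range(0, len(data), length)]


def classify_file(segments, tfl, cfl, x=1):
    """Apply the quoted rule: malware if at least x segments are in the
    threat library (tfl) and not in the common library (cfl).

    Assumption: tfl and cfl are plain sets of segments (exact matching).
    """
    suspicious = [s for s in segments if s in tfl and s not in cfl]
    label = "malware" if len(suspicious) >= x else "benign"
    return label, suspicious


if __name__ == "__main__":
    segments = split_segments(b"AAAABBBBCCCC", 4)
    print(classify_file(segments, tfl={b"BBBB"}, cfl={b"AAAA"}))
```

The returned list of suspicious segments corresponds to the portions that, under the examiner's reading, contribute to the file being classified as malicious.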

Claim 9 is rejected under 35 U.S.C. 103 as being unpatentable over Ref. Avasarala, in view of Ref. Tahan as shown above, further in view of US 20150046850 A1 by Kurabayashi et al. (hereinafter Ref. Kurabayashi).
As per claim 9, the combination of Ref. Avasarala and Ref Tahan as shown above teaches the non-transitory processor-readable medium of claim 7.
Combination of Avasarala and Tahan fails to explicitly teach:
wherein the portion of structured file is a Hypertext Markup Language (HTML) subtree or an Extensible Markup Language (XML) subtree.
However Kurabayashi teaches:	
wherein the portion of structured file is a Hypertext Markup Language (HTML) subtree or an Extensible Markup Language (XML) subtree. (Ref. Kurabayashi para [0018, 0049] teaches that HTML documents (i.e. structured file) can be configured to extract an HTML subtree. This reasonably teaches that the portion of the structured file is an HTML subtree.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Kurabayashi’s system into Ref. Avasarala’s Method For Automated Machine-Learning, Zero-Day Malware Detection as modified by Ref. Tahan’s Automatic Malware Detection Using Common Segment, with a motivation to process subtrees instead of processing the entire tree (entire file), which potentially reduces processing power and time.

Claim 11 is rejected under 35 U.S.C. 103 as being unpatentable over Ref. Avasarala, in view of Ref. Tahan as shown above, further in view of Ref. Kang.
As per claim 11, the combination of Ref. Avasarala and Ref. Tahan as shown above teaches the non-transitory processor-readable medium of claim 7.
	Combination of Ref. Avasarala and Ref. Tahan fails to explicitly teach:
further comprising code to cause the processor to: identify an impact of the portion of the structured file on the input to the machine learning model by comparing a difference between the indication of the set of features extracted from the modified structured file and an indication of a set of features extracted from the structured file.
However Ref. Kang teaches:
Identify an impact of the portion of the … file on the input to the machine learning model (Ref. Kang fig. 1, fig. 2, and page 316 (section 1): “We propose a new malware classification method to reduce the overheads of malware analysis. By classifying malware …” and Kang page 320 (section 5): “we proposed a malware classification method based on major block analysis. Binary files in the same malware family have common functionalities and these common functionalities can be used to classify same malware family. In our proposed method, a binary file is divided into blocks and similarities of two binary files are calculated using the n-gram method” teach that the Major Block Comparison system (i.e. machine learning model) takes binary executable files (i.e. structured file) as input, breaks the file into major blocks and identifies whether the file is malware. Here, identifying an impact is to classify the file as malware by analyzing the major blocks (i.e. portion) of the file.)
by comparing a difference between the indication of the set of features extracted from [another]… file and an indication of a set of features extracted from the … file (Ref. Kang page 317 (section 3): “Our proposed method selects parts of malware, extracts binary features, and computes similarities of these feature values. Values of binary features are extracted from disassembled instructions of binary executable files” teaches that the malware classification method extracts binary feature values (i.e. indication of set of features). See also Ref. Kang fig. 2, pages 318-319 (section 3.2.4) and page 320 (section 5), which teach that the system compares the similarities between two files by comparing the major blocks’ binary feature values (i.e. indication of set of features) of one file with the major blocks’ binary feature values (i.e. an indication of a set of features) of another file. This reasonably teaches that if the system can compare the similarities of features of two files, the system identifies the differences as well.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Kang’s Malware Classification Method via Binary Content Comparison into Ref. Avasarala’s Method For Automated Machine-Learning, Zero-Day Malware Detection as modified by Ref. Tahan’s Automatic Malware Detection Using Common Segment, with a motivation to identify whether the portion (modified file) is malicious. Such feature comparison between the modified file and the original file helps to determine the maliciousness relationship between the files. 
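For illustration only, the comparison of feature values between two files discussed above can be sketched with byte n-grams. This is a minimal sketch under stated assumptions: the Jaccard coefficient is used here as a stand-in similarity measure and is not taken from Ref. Kang, which describes its own major-block selection and similarity computation. The names `ngrams`, `similarity` and `difference` are hypothetical.

```python
def ngrams(data: bytes, n: int = 3) -> set:
    """Collect the distinct byte n-grams of a block."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def similarity(a: bytes, b: bytes, n: int = 3) -> float:
    """Jaccard similarity of the two n-gram sets (assumed measure)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    if not ga and not gb:
        return 1.0  # two blocks with no n-grams are treated as identical
    return len(ga & gb) / len(ga | gb)


def difference(a: bytes, b: bytes, n: int = 3) -> float:
    """A difference score derived directly from the similarity score."""
    return 1.0 - similarity(a, b, n)


if __name__ == "__main__":
    original = b"abcdef"
    modified = b"abcxyz"
    print(similarity(original, modified), difference(original, modified))
```

The sketch shows why a system that computes a similarity between two feature sets also yields a difference: under this assumed measure, one is simply the complement of the other.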
Claim 12 is rejected under 35 U.S.C. 103 as being unpatentable over Ref. Avasarala, in view of Ref. Tahan as shown above, further in view of Ref. Tristan.
As per claim 12, the combination of Ref. Avasarala and Ref. Tahan as shown above teaches the non-transitory processor-readable medium of claim 7.
	Combination of Ref. Avasarala and Ref. Tahan fails to explicitly teach:
further comprising code to cause the processor to: define the indication of the set of features by providing as an input to a hash function each feature from the set of features to produce the indication of the set of features.


However Ref. Tristan teaches:
(Ref. Tristan page 11 (claim 2): “a given decision model in the plurality of decision models is configured to: for a given feature in the input data: hash the feature using a hash function associated with the given decision model to obtain an index in the feature vector; and modify the feature vector at the index to reflect the feature” and para [0020]: “The hashing trick is a method that is used to make machine learning algorithms faster and use less memory. The hashing trick works by mapping the set of features in the input data into another set by using a hash function. The hashing thus reduces the size of the feature set.” teach that the set of features extracted from the input data can be hashed using a hash function (i.e. input to a hash function). Here, hash function is the indication of the set of features)
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Tristan’s Feature Hashing Models into Ref. Avasarala’s Method For Automated Machine-Learning, Zero-Day Malware Detection as modified by Ref. Tahan’s Automatic Malware Detection Using Common Segment, with a motivation to reduce the complexity of the resulting decision system (e.g., decision trees) by using the hashing technique, which reduces computing resource requirements both during training and in the field (Ref Tristan para. [0006]).
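For illustration only, the hashing trick quoted from Ref. Tristan (hash each feature to obtain an index in the feature vector, then modify the vector at that index) can be sketched as follows. This is a minimal sketch, not the reference's implementation: the choice of SHA-256, the vector length of 16 and the name `hash_features` are assumptions made for the example.

```python
import hashlib


def hash_features(features, dim=16):
    """Map an arbitrary set of string features into a fixed-length vector.

    Each feature is hashed to an index, and the count at that index is
    incremented. Distinct features may collide on the same index.
    """
    vector = [0] * dim
    for feature in features:
        digest = hashlib.sha256(feature.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % dim
        vector[index] += 1
    return vector


if __name__ == "__main__":
    # Hypothetical features extracted from an HTML file.
    print(hash_features(["<script>", "<iframe>", "eval(", "<script>"]))
```

Regardless of how many distinct features are extracted, the resulting vector has a fixed length, which is the size reduction described in the quoted paragraph [0020].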
Claims 7-8 and 10 are also rejected under 35 U.S.C. 103 as being unpatentable over Ref. Huihsin, in view of Ref. Avasarala.
As per claim 7, Huihsin teaches a non-transitory processor-readable medium storing code representing instructions to be executed by a processor, the code comprising code to cause the processor to: (Ref. Huihsin para. [00128] teaches non-transitory medium and processor)
remove a portion of the structured file to define a modified structured file, the modified structured file following a format associated with a type of the structured file; (Ref. Huihsin para. [0008-0010] teaches that the method analyzes the maliciousness of a file by analyzing its data block/ segments (i.e. removed portion of the structured file) separately. Here each block / data segments of the file can be defined as the modified structured file which will have the same format associated with the file.)
extract a set of features from the modified structured file; (Ref. Huihsin para [0010]: “To analyze each data segment, this method will turn each data segment, e.g., the first data segment of the Windows Portable Executable file that contains the program header, into decimal representation, generate a set of features, including n-gram, entropy and domain, based on the decimal representation, and feed the feature representations into a machine learning model, such as a random forest model, for prediction. The output of this model is a label of whether the data segment is malicious or benign” teaches that the method extracts a set of features from the data segment (i.e. modified file))

provide an indication of the set of features as an input to the machine learning model to produce an output; (Ref. Huihsin para [0010]: “To analyze each data segment, this method will turn each data segment, e.g., the first data segment of the Windows Portable Executable file …” teaches that the method feeds the feature representations (i.e. indication of the set of features) into a machine learning model to generate output. The output of this model is a label of whether the data segment is malicious or benign.)
and identify an impact of the portion of the structured file on the malicious content classification of the structured file based on the output. (Ref. Huihsin para [0010] and [0085 - 0086] teaches that the method feeds the features into a machine learning model to generate output. The output of this model is a label of whether the data segment is malicious or benign. The malware detection engine (i.e. malicious content classification) makes a determination on whether the segment is malicious or benign using a certain malware/benign probability threshold. Here, identifying an impact is to identify the rule that if the data segment is above the malware threshold, then the segment is classified as malicious, which contributes to the file being detected as malware. This reasonably teaches that the malware detection engine (i.e. malicious content classification) classifies the segment (i.e. subtree) of the file based on the output from the features of the segments of the file.)
Ref. Huihsin fails to explicitly teach:
receive a structured file for which a machine learning model has made a malicious content classification, 
However Ref. Avasarala teaches:
(Ref. Avasarala fig. 7a teaches that machine learning model can take different kinds of files as input such as HTML file. See also, ref Avasarala abstract: “Improved systems and methods for automated machine learning, zero-day malware detection. Embodiments include a method for improved zero-day malware detection that receives a set of training files which are each known to be either malign or benign, partitions the set of training files into a plurality of categories, and trains category-specific classifiers that distinguish between malign and benign files in a category of files. The training may include selecting one of the plurality of categories of training files, identifying features present in the training files in the selected category of training files, evaluating the identified features to determine the identified features most effective at distinguishing between malign and benign files, and building a category specific classifier based on the evaluated features.” teaches that machine learning method (i.e. machine learning model) builds category-specific classifiers (i.e. a malicious content classification) for files. )
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Huihsin’s System and Method for Detecting Malware into Ref. Avasarala’s Method For Automated Machine-Learning, Zero-Day Malware Detection, with a motivation to take mitigation actions (such as blocking the file) in the system if a machine learning model classifies the file as malicious, thereby it saves the computer system from security threat of the malicious file. (Ref. Avasarala para [0109-0112])

As per claim 8, the combination of Ref Huihsin and Ref. Avasarala as shown above teaches the non-transitory processor-readable medium of claim 7.
	Ref. Huihsin further teaches:
wherein the structured file is at least one of a Hypertext Markup Language (HTML) file, an Extensible Markup Language (XML) file, a Portable Executable (PE) file, a document processing file, or a Portable Document Format (PDF) file. (Ref. Huihsin para. [0092]: “During the collection step, both malicious and benign file samples are collected. These file samples can include Windows Portable Executables (PE), Dynamic Loaded Libraries (DLL), Microsoft Office documents such as Word, Excel, and PowerPoint, Adobe PDF files, HTML files, JavaScript files, Java Archive (Jar) files, and others.” teaches that the structured file can be an HTML file.)
As per claim 10, the combination of Ref Huihsin and Ref. Avasarala as shown above teaches the non-transitory processor-readable medium of claim 7.
Ref Huihsin further teaches wherein the machine learning model is at least one of a neural network, a decision tree model, a random forest model or a deep neural network. (Ref. Huihsin para. [0040]: “the extracted feature representations are fed into a pre-generated machine learning (ML) model such as a random forest tree” teaches that the machine learning model can be random forest tree (i.e. random forest model))
Claims 14-15 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Ref. Avasarala, in view of Ref. Tahan, further in view of Rank vs. order by Cheplyaka et al. (hereinafter, Ref. Cheplyaka).
As per claim 14, Ref. Avasarala teaches A method, comprising: identifying, using a processor of a malicious content detection device, a set of structured portions within a (Ref. Avasarala fig. 7a teaches that machine learning model can take different kinds of files as input such as HTML, PDF (i.e. structured file). Here set of structured portions within a structured file could be the entire structured file. See also, ref Avasarala abstract: “Improved systems and methods for automated machine learning, zero-day malware detection. Embodiments include a method for improved zero-day malware detection that receives a set of training files which are each known to be either malign or benign, partitions the set of training files into a plurality of categories, and trains category-specific classifiers that distinguish between malign and benign files in a category of files. The training may include selecting one of the plurality of categories of training files, identifying features present in the training files in the selected category of training files, evaluating the identified features to determine the identified features most effective at distinguishing between malign and benign files, and building a category specific classifier based on the evaluated features.” teaches that machine learning method (machine learning model) builds category-specific classifiers (a malicious content classification) for files.)
	Avasarala fails to explicitly teach: 
for each structured portion from the set of structured portions: 
removing, using the processor, that structured portion from the structured file to define a modified structured file; 
extracting, using the processor, a set of features from the modified structured file;
providing, using the processor, an indication of the set of features as an input to the machine learning model to produce an output;

and storing, using the processor, an indication of the impact in a vector; 
and ranking, using the processor and based on the vector, the set of structured portions based on the indication of the impact of each structured portion from the set of structured portions.
	However Tahan teaches:
for each structured portion from the set of structured portions: (Ref. Tahan page 958 (sec. 2.3): “Divide file F into S segments of length L. All segments are inserted into a collection…For each segment in the collection…” teaches that the Mal-ID method analyzes each segment (i.e. structured portion) from the set of segments (i.e. set of structured portions) of that file)
removing, using the processor, that structured portion from the structured file to define a modified structured file; (Ref. Tahan page 955 teaches that the input for the Mal-ID method is an unclassified executable file (i.e. structured file) of any size. See also fig. 1 and page 955 [para 1] teach about breaking the file into segments. Then detecting the file segments that originated from the development platform or from a benign third party library and then disregard those segments. Finally, the remaining segments (i.e. removing structured portion) would be compared to determine their degree of resemblance to a collection of known malwares. Here the “remaining segments” can be defined as the modified structured file.)
extracting, using the processor, a set of features from the modified structured file; (Ref. Tahan page 957 (section 2.2): “Three features can be derived for each segment: Spread, MFG, and Entropy.” teaches that three features will be derived (i.e. extracting a set of features) from each segment. This reasonably teaches that features will be derived from the remaining segments (i.e. modified structured file) as well.)
providing, using the processor, an indication of the set of features as an input to the machine learning model to produce an output; (Ref. Tahan page 957 (section 2.2): “Three features can be derived for each segment: Spread, MFG, and Entropy” and Ref. Tahan page 957 (last para.): “Note that the features, as described above, are in fact meta-features as they are used to represent features of features (features of the basic 3-grams)” teach that the meta-features are the indication of the set of features. See also page 958 (section 2.3): “(a) Line 5. Calculate the entropy for the bytes within the segment. (b) Line 6. The algorithm gets two parameters EntropyLow and EntropyHigh… (d) Line 11. Using the CFL, calculate the CFL-MFG index… (f) Line 14. Using the TFL, calculate the TFL-MFG index” teaches that the meta-features are provided as an input to the Mal-ID algorithm (i.e. machine learning model). See also page 959: “Second level index aggregation—Count all segments that are found in malware and not in the CFL” teaches that the count of segments found in the malware and not in the CFL is the output of the Mal-ID algorithm.)
identifying, using the processor, an impact of that structured portion on the malicious content classification of the structured file based on the output; (Ref. Tahan page 959: “3. Lines 28-30. Second level index aggregation—Count all segments that are found in malware … Classify—If there are at least X segments found in the malware train set (TFL) and not in the CFL …” teaches that the count of segments found in the malware and not in the CFL is the output of the Mal-ID algorithm. If the remaining segments (i.e. portion of the structured file) contain more than a preset threshold number of segments from the TFL library, then the file is classified as malware. Here, identifying an impact is to identify the rule that if the portion is classified as malicious then it contributes to the entire file being classified as malicious. This reasonably teaches that the Mal-ID algorithm (i.e. malicious content classification) classifies the executable file (i.e. structured file) as malicious based on the output from the features of the segments of the file.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Avasarala’s Method For Automated Machine-Learning, Zero-Day Malware Detection into Ref. Tahan’s Automatic Malware Detection Using Common Segment, with a motivation to save time and processing power by focusing specifically on those segments of code most likely to be malicious. (Ref Tahan Page 349(abstract))  
The combination of the Ref. Avasarala and Ref Tahan fails to explicitly teach: 
and storing, using the processor, an indication of the impact in a vector; 
and ranking, using the processor and based on the vector, the set of structured portions based on the indication of the impact of each structured portion from the set of structured portions.
However Cheplyaka teaches:
and storing, using the processor, an indication of the impact in a vector; (Here, the broadest reasonable interpretation of an indication of the impact could be a number based on how malicious (impact) the portion is. If the impact is highly malicious then the indication can be a high number, and if the impact is low then the indication can be a lower number. Ref. Cheplyaka page 1 teaches that a vector can store elements such as integer numbers and that, using the rank function, the vector’s elements (i.e. indication of the impact) can be ranked.)
and ranking, using the processor and based on the vector, the set of structured portions based on the indication of the impact of each structured portion from the set of structured portions. (Ref. Cheplyaka page 1 teaches that a vector can store elements such as integer numbers and that, using the rank function, the vector’s elements (i.e. indication of the impact) can be ranked. For instance, in rank(c(10,30,20,40)), each number in the vector represents the indication of the impact of each structured portion from the set of structured portions. After ranking based on the vector and the vector elements (i.e. indication of the impact), the output would be 1, 3, 2, 4.)
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Cheplyaka’s Ranking into Ref. Avasarala’s Method For Automated Machine-Learning, Zero-Day Malware Detection as modified by Ref. Tahan’s Automatic Malware Detection Using Common Segment, with a motivation to efficiently and quickly access the indication. Ranked indication makes it easier to identify which portions of the file are most malicious.
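For illustration only, the rank(c(10,30,20,40)) example above can be reproduced with a short Python sketch. The function `rank` below is a hypothetical stand-in for the rank function cited from Ref. Cheplyaka; it assigns ranks 1..n in ascending order and, as a simplifying assumption, breaks ties by position rather than averaging tied ranks. The portion labels are likewise hypothetical.

```python
def rank(values):
    """Return the 1-based ascending rank of each element of the vector."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def portions_by_impact(portions, impacts):
    """List portions from highest to lowest impact indication."""
    return [p for p, _ in sorted(zip(portions, impacts),
                                 key=lambda pair: pair[1], reverse=True)]


if __name__ == "__main__":
    impacts = [10, 30, 20, 40]           # one impact indication per portion
    print(rank(impacts))                 # [1, 3, 2, 4]
    print(portions_by_impact(["p0", "p1", "p2", "p3"], impacts))
```

Under this reading, the portion with the highest stored impact indication receives the highest rank, which is what allows the most malicious portion to be picked out from the vector.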
As per claim 15, the combination of Ref. Avasarala, Ref Tahan and Ref. Cheplyaka as shown above teaches the method of claim 14.	
Avasarala further teaches:
wherein the structured file is at least one of a Hypertext Markup Language (HTML) file, an Extensible Markup Language (XML) file, a Portable Executable (PE) file, a document processing file, or a Portable Document Format (PDF) file. (Ref. Avasarala fig. 7a teaches that the machine learning model can take different kinds of files as input, such as HTML and PDF (i.e. structured file).)

As per claim 18, the combination of Ref. Avasarala, Ref Tahan and Ref. Cheplyaka as shown above teaches the method of claim 14.
Avasarala further teaches: 
wherein the identifying the impact of that structured portion includes comparing the output associated with [a]… file with an output of the machine learning model associated with the structured file. (Ref. Avasarala para. [0043] teaches that the system can create a malware classifier (i.e. the machine learning model) by executing malign files and observe the execution static data (i.e. output). Then the system can execute an unknown file and then compare the execution state data of the unknown file with the execution static data of malign files. If such comparisons show similarities or matches, this fact may be used to provide greater confidence that the unknown file is malign. This reasonably teaches that the system can compare the malware classifier’s execution static data (i.e. output) of two files. Here identifying the impact is to classify the unknown file malign/malware by analyzing the execution static data (i.e. output).) 
Ref. Avasarala does not explicitly teach the modified structured file.
However Ref Tahan teaches the modified structured file (Ref. Tahan page 955 teaches that the input for the Mal-ID method is an unclassified executable file (i.e. structured file) of any size. See also fig. 1 and page 955 [para 1] teach about breaking the file into segments. Then detecting the file segments that originated from the development platform or from a benign third party library and then disregard those segments. Finally, the remaining segments (i.e. removing structured portion) would be compared to determine their degree of resemblance to a collection of known malwares. Here the “remaining segments” can be defined as the modified structured file.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Avasarala’s Method For Automated Machine-Learning, Zero-Day Malware Detection into Ref. Tahan’s Automatic Malware Detection Using Common Segment, with a motivation to save time and processing power by focusing specifically on those segments of code (modified structured file) most likely to be malicious. (Ref Tahan Page 349(abstract))  



Claim 16 is rejected under 35 U.S.C. 103 as being unpatentable over Ref. Avasarala, in view of Ref. Tahan and Ref Cheplyaka as shown above, further in view of Ref. Kurabayashi. 
As per claim 16, the combination of Ref. Avasarala, Ref Tahan and Ref. Cheplyaka as shown above teaches the method of claim 14.
Combination of Ref. Avasarala, Ref Tahan and Ref. Cheplyaka fails to explicitly teach:
wherein the portion of structured file is a Hypertext Markup Language (HTML) subtree or an Extensible Markup Language (XML) subtree.
However Kurabayashi teaches:	
wherein the portion of structured file is a Hypertext Markup Language (HTML) subtree or an Extensible Markup Language (XML) subtree. (Ref. Kurabayashi para [0018, 0049] teaches that HTML documents (i.e. structured file) can be configured to extract an HTML subtree. This reasonably teaches that the portion of the structured file is an HTML subtree.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Kurabayashi’s system into Ref. Avasarala’s Method For Automated Machine-Learning, Zero-Day Malware Detection as modified by Ref. Tahan’s Automatic Malware Detection Using Common Segment, with a motivation to process subtrees instead of processing the entire tree (entire file), which potentially reduces processing power and time.

Claim 17 is rejected under 35 U.S.C. 103 as being unpatentable over Ref. Avasarala, in view of Ref. Tahan and Ref Cheplyaka as shown above, further in view of US 8635700 B2 by Richard et al. (hereinafter Ref. Richard). 
As per claim 17, the combination of Ref. Avasarala, Ref Tahan and Ref. Cheplyaka as shown above teaches the method of claim 14.
Combination of Ref. Avasarala, Ref Tahan and Ref. Cheplyaka fails to explicitly teach:
identifying, based on the ranking, at least one structured portion from the set of structured portions as a malicious structured portion.
However Ref. Richard teaches:
identifying, based on the ranking, at least one structured portion from the set of structured portions as a malicious structured portion. (Ref. Richard col 1 (lines 20-45) and col 8 (lines 25-45) teach analyzing a plurality of portions (i.e. the set of structured portions) of a file and ranking the portions based on the analysis. Based on the ranking, it determines which portions (i.e. structured portion) are malicious and lead to the file being malicious.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Richard’s Detecting malware using stored patterns into Ref. Avasarala’s Method For Automated Machine-Learning, Zero-Day Malware Detection as modified by Ref. Tahan’s Automatic Malware Detection Using Common Segment and Ref. Cheplyaka’s Ranking, with a motivation to determine effectively which portions from the plurality of the portions cause the file to be malicious. A portion having a higher ranking has more effect on the entire file being malicious than a lower-ranked portion. If a file contains a portion that ranks high relative to a threshold, remedial action (such as removing that portion from the file or further analysis) might be taken. Thus using the ranking of the portions, one of ordinary skill in the art can effectively identify 
Claim 19 is rejected under 35 U.S.C. 103 as being unpatentable over Ref. Avasarala, in view of Ref. Tahan and Ref Cheplyaka as shown above, further in view of Ref. Kang.
As per claim 19, the combination of Ref. Avasarala, Ref Tahan and Ref. Cheplyaka as shown above teaches the method of claim 14.
Ref. Tahan further teaches for each structured portion from the set of structured portions: (Ref. Tahan page 958 (sec. 2.3): “Divide file F into S segments of length L. All segments are inserted into a collection…For each segment in the collection…” teaches that the Mal-ID method analyzes each segment (i.e. structured portion) from the set of segments (i.e. set of structured portions) of that file)
Combination of Ref. Avasarala, Ref Tahan and Ref. Cheplyaka fails to explicitly teach:
calculating a difference between the indication of the set of features and an indication of a set of features extracted from the structured file.
	However Ref. Kang teaches:
calculating a difference between the indication of the set of features and an indication of a set of features extracted from the structured file. (Ref. Kang page 317 (section 3): “Our proposed method selects parts of malware, extracts binary features, and computes similarities of these feature values. Values of binary features are extracted from disassembled instructions of binary executable files” teaches that the malware classification method extract binary feature values (i.e. indication of set of features) from the file. See also ref. Kang fig.2, page 318-319 (section 3.2.4), 320(section 5) teach that the system compares the similarities between two files by comparing major block’s binary feature values (i.e. the indication of set of features) of one file with major block’s binary feature values (i.e. an indication of a set of features) of another file (i.e. the structured file). This reasonably teaches that if the system can compare the similarities of features of two files, the system identifies the differences between the features as well)
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Kang’s Malware Classification Method via Binary Content Comparison into Ref. Avasarala’s Method For Automated Machine-Learning, Zero-Day Malware Detection as modified by Ref. Tahan’s Automatic Malware Detection Using Common Segment and Ref. Cheplyaka’s Ranking, with a motivation to identify whether the portion (modified file) is malicious. Such feature comparison between the modified file and the original file helps to determine the maliciousness relationship between the files.
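For illustration only, a difference between two indications of feature sets (one from a modified file, one from the original structured file) can be sketched as below. The element-wise difference and the L1 distance are assumptions chosen for this sketch; Ref. Kang describes similarity comparison of binary feature values and does not prescribe this particular measure. All names and values are hypothetical.

```python
from typing import List, Sequence


def feature_difference(modified: Sequence[float], original: Sequence[float]) -> List[float]:
    """Element-wise difference between two equal-length feature vectors."""
    if len(modified) != len(original):
        raise ValueError("feature vectors must have the same length")
    return [m - o for m, o in zip(modified, original)]


def l1_distance(modified: Sequence[float], original: Sequence[float]) -> float:
    """Sum of absolute element-wise differences (0.0 means identical vectors)."""
    return sum(abs(d) for d in feature_difference(modified, original))


# Hypothetical feature values for the original file and a modified copy.
original_features = [3.0, 0.0, 5.0, 1.0]
modified_features = [3.0, 0.0, 2.0, 1.0]

diff = feature_difference(modified_features, original_features)
dist = l1_distance(modified_features, original_features)
print(diff, dist)
```

A distance of zero corresponds to identical feature values; a larger distance indicates that the two files differ more in the compared features.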

Claim 20 is rejected under 35 U.S.C. 103 as being unpatentable over Ref. Avasarala, in view of Ref. Tahan and Ref Cheplyaka as shown above, further in view of Ref. Tristan.
As per claim 20, the combination of Ref. Avasarala, Ref Tahan and Ref. Cheplyaka as shown above teaches the method of claim 14.
Combination of Ref. Avasarala, Ref Tahan and Ref. Cheplyaka fails to explicitly teach:

However Ref. Tristan teaches: 
(Ref. Tristan page 11 (claim 2): “a given decision model in the plurality of decision models is configured to: for a given feature in the input data: hash the feature using a hash function associated with the given decision model to obtain an index in the feature vector; and modify the feature vector at the index to reflect the feature” and para. [0020]: “The hashing trick is a method that is used to make machine learning algorithms faster and use less memory. The hashing trick works by mapping the set of features in the input data into another set by using a hash function. The hashing thus reduces the size of the feature set.” teach that the set of features extracted from the input data (i.e. file) can be hashed using a hash function (i.e. input to a hash function). Here, hash function is the indication of the set of features)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Tristan’s Feature Hashing Models into Ref. Avasarala’s Method For Automated Machine-Learning, Zero-Day Malware Detection as modified by Ref. Tahan’s Automatic Malware Detection Using Common Segment and Ref. Cheplyaka’s Ranking, with a motivation to reduce the complexity of the resulting decision system (e.g., decision trees) by using the hashing technique, which reduces computing resource requirements both during training and in the field (Ref Tristan para. [0006]).
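For illustration only, the hashing trick described in the quoted passages of Ref. Tristan (hash each feature to obtain an index in a feature vector, then modify the vector at that index) can be sketched as follows. The use of MD5, the vector size of 8, the count-increment update, and the sample tokens are assumptions for this sketch and are not taken from the reference.

```python
import hashlib
from typing import Iterable, List


def hashed_feature_vector(features: Iterable[str], size: int = 8) -> List[int]:
    """Map an arbitrary set of string features into a fixed-size count vector."""
    vector = [0] * size
    for feature in features:
        # A stable hash is used so that the same feature always maps to the same index.
        digest = hashlib.md5(feature.encode("utf-8")).digest()
        index = int.from_bytes(digest[:4], "big") % size
        vector[index] += 1  # modify the vector at the index to reflect the feature
    return vector


# Hypothetical features extracted from an HTML file.
tokens = ["<script>", "eval(", "<iframe>", "eval("]
vec = hashed_feature_vector(tokens, size=8)
print(vec)
```

The vector length stays fixed regardless of how many distinct features occur, which is the size reduction the quoted paragraph refers to; distinct features may collide at the same index.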
Claims 14 and 15 are also rejected under 35 U.S.C. 103 as being unpatentable over Ref. Huihsin, in view of Ref. Avasarala, further in view of Rank vs. order by Cheplyaka et al. (hereinafter, Ref. Cheplyaka).
	As per claim 14, Ref. Huihsin teaches
for each structured portion from the set of structured portions: ( Ref. Huihsin para. [0008,0010] teaches that the method analyze maliciousness of file by analyzing its data block/ segments separately. See also para [0009]: “If the method cannot be certain that the data contains malicious content, it will request the next block of data and then perform the same analysis on the new data to determine maliciousness. This process is repeated until the end of the file is reached.” teaches that the method analyzes each data segments (i.e. structured portions) until it finds any malicious block. Here the set of structured portions is the analyzed data segments)
removing, using the processor, that structured portion from the structured file to define a modified structured file; (Ref. Huihsin para. [0008-0010] teaches that the method analyze maliciousness of file by analyzing its data block/ segments (i.e. removed portion of the structured file) separately. Here each block / data segments of the file can be defined as the modified structured file which will have the same format associated with the file.)

extracting, using the processor, a set of features from the modified structured file; (Ref. Huihsin para [0010]: “To analyze each data segment, this method will turn each data segment, e.g., the first data segment of the Windows Portable Executable file that contains the program header, into decimal representation, generate a set of features, including n-gram, entropy and domain, based on the decimal representation…” teaches that the method extracts a set of features from the data segment (i.e. modified file))
providing, using the processor, an indication of the set of features as an input to the machine learning model to produce an output; (Ref. Huihsin para [0010]: “To analyze each data segment, this method will turn each data segment, e.g., the first data segment of the Windows Portable Executable file that contains the program header, into decimal representation, generate a set of features, including n-gram, entropy and domain, based on the decimal representation, and feed the feature representations into a machine learning model, such as a random forest model, for prediction. The output of this model is a label of whether the data segment is malicious or benign” teaches that the method feeds the feature representations (i.e. indication of the set of features) into a machine learning model to generate output. The output of this model is a label of whether the data segment is malicious or benign.)
identifying, using the processor, an impact of that structured portion on the malicious content classification of the structured file based on the output; (Ref. Huihsin para [0010] and [0085 - 0086] teaches that the method feeds the feature into a machine learning model to generate output. The output of this model is a label of whether the data segment is malicious or benign. The malware detection engine (i.e. malicious content classification) makes a determination on whether the segment is malicious or benign using certain malware/benign probability threshold. Here identifying an impact is to identify the rule if the data segment is above the malware threshold, then it classifies the segment as malicious which it contributes to file being detected as malware. This reasonably teaches that the malware detection engine (i.e. malicious content classification) classifies the segment (i.e. subtree) of the file based on the output from the features of the segments of the file.)
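For illustration only, the per-segment analysis mapped above (byte values of a data segment, n-gram and entropy features, and a threshold applied to a model output) can be sketched as follows. The function names, the 2-gram size, the 0.5 threshold, and the sample segment are assumptions for this sketch; the score passed to label_segment stands in for the output of a trained model, which is not reproduced here.

```python
import math
from collections import Counter


def byte_entropy(segment: bytes) -> float:
    """Shannon entropy, in bits per byte, of the byte values in a segment."""
    if not segment:
        return 0.0
    total = len(segment)
    counts = Counter(segment)  # iterating bytes yields integer (decimal) values
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def ngram_counts(segment: bytes, n: int = 2) -> Counter:
    """Counts of each run of n consecutive bytes in the segment."""
    return Counter(segment[i:i + n] for i in range(len(segment) - n + 1))


def label_segment(score: float, threshold: float = 0.5) -> str:
    """Label a segment from a hypothetical model score using a threshold."""
    return "malicious" if score >= threshold else "benign"


segment = b"abab"  # hypothetical data segment
entropy = byte_entropy(segment)
grams = ngram_counts(segment)
print(entropy, dict(grams), label_segment(0.7))
```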
Ref. Huihsin fails to explicitly teach:
receive a structured file for which a machine learning model has made a malicious content classification, 
and storing, using the processor, an indication of the impact in a vector; 
and ranking, using the processor and based on the vector, the set of structured portions based on the indication of the impact of each structured portion from the set of structured portions.
However Ref. Avasarala teaches:
A method, comprising: identifying, using a processor of a malicious content detection device, a set of structured portions within a structured file for which a machine learning model has made a malicious content classification; (Ref. Avasarala fig. 7a teaches that machine learning model can take different kinds of files as input such as HTML file. See also, Ref. Avasarala abstract: “Improved systems and methods for automated machine learning, zero-day malware detection. Embodiments include a method for improved zero-day malware detection that receives a set of training files which are each known to be either malign or benign, partitions the set of training files into a plurality of categories, and trains category-specific classifiers that distinguish between malign and benign files in a category of files. The training may include selecting one of the plurality of categories of training files, identifying features…” teaches that machine learning method (i.e. machine learning model) builds category-specific classifiers (i.e. a malicious content classification) for files.)
Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Huihsin’s System and Method for Detecting Malware into Ref. Avasarala’s Method For Automated Machine-Learning, Zero-Day Malware Detection, with a motivation to take mitigation actions (such as blocking the file) in the system if a machine learning model classifies the file as malicious, thereby it saves the computer system from security threat of the malicious file. (Ref. Avasarala para [0109-0112])

The combination of the Ref. Huihsin and Ref. Avasarala fails to explicitly teach: 
and storing, using the processor, an indication of the impact in a vector; 
and ranking, using the processor and based on the vector, the set of structured portions based on the indication of the impact of each structured portion from the set of structured portions.
However Cheplyaka teaches:
and storing, using the processor, an indication of the impact in a vector; (Here the broadest reasonable interpretation of an indication of the impact could be a number based on how malicious (impact) the portion is. If the impact is highly malicious then the indication can be a high number, and if the impact is low then the indication can be a lower number. Ref. Cheplyaka page 1 teaches that a vector can store elements such as integer numbers and that, using the rank function, the vector’s elements (i.e. indication of the impact) can be ranked.)
and ranking, using the processor and based on the vector, the set of structured portions based on the indication of the impact of each structured portion from the set of structured portions. (Ref. Cheplyaka page 1 teaches that vector can store element such as integer number in the vector and using the rank function we can rank the vector’s element (i.e. indication of the impact). For instance, rank(c(10,30,20,40)) - each number in the vector represents the indication of the impact of the each structured portion from the set of structured portions. After ranking based on the vector and the number / vector elements (i.e. indication of the impact), output would be – 1,3,2,4. )
	Therefore, it would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to combine the teachings of Ref. Cheplyaka’s Ranking into Ref. Huihsin’s System and Method for Detecting Malware as modified by Ref. Avasarala’s Method For Automated Machine-Learning, Zero-Day Malware Detection, with a motivation to efficiently and quickly access the indication. Ranked indication makes it easier to identify which portions of the file are most malicious.
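For illustration only, the ranking example discussed above, rank(c(10,30,20,40)), can be reproduced with the following sketch. The function and variable names are hypothetical, and ties are broken by position here, which is an assumption of this sketch (R's rank function averages tied ranks by default).

```python
from typing import List, Sequence


def rank(values: Sequence[float]) -> List[int]:
    """Return the 1-based ascending rank of each element, in original order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


# Each element stands for the indication of impact of one structured portion.
impact_vector = [10, 30, 20, 40]
ranks = rank(impact_vector)

# Index of the portion with the largest stored impact.
most_impactful = max(range(len(impact_vector)), key=lambda i: impact_vector[i])
print(ranks, most_impactful)
```

With the values above, the ranks are 1, 3, 2, 4, matching the output stated in the mapping, and the fourth portion (index 3) has the highest rank.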

As per claim 15, the combination of Ref. Huihsin, Ref. Avasarala and Ref. Cheplyaka as shown above teaches the method of claim 14.

Ref. Huihsin further teaches:
wherein the structured file is at least one of a Hypertext Markup Language (HTML) file, an Extensible Markup Language (XML) file, a Portable Executable (PE) file, a document processing file, or a Portable Document Format (PDF) file. (Ref. Huihsin para. [0092]: “During the collection step, both malicious and benign file samples are collected. These file samples can include Windows Portable Executables (PE), Dynamic Loaded Libraries (DLL), Microsoft Office documents such as Word, Excel, and PowerPoint, Adobe PDF files, HTML files, JavaScript files, Java Archive (Jar) files, and others.” teaches the structured file can be an HTML file.)

Allowable Subject Matter
Claim 5 would be allowable if rewritten to overcome the rejection(s) under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA ), 2nd paragraph, set forth in this Office action and to include all of the limitations of the base claim and any intervening claims.

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant's disclosure.
Caspi (US 10193902 B1) teaches a method for detecting malware by using a malware detector comprising a deep learning algorithm, which comprises converting a file into a vector.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to RAIA N M AZAD whose telephone number is (571)272-8232. The examiner can normally be reached on 8:30-5:30 (Mon-Thurs and 2nd Fri of the pay week).
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kamran Afshar can be reached on 571-272-7796.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.


/RAIA N M AZAD/Examiner, Art Unit 2125
/KAMRAN AFSHAR/Supervisory Patent Examiner, Art Unit 2125