DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Election/Restrictions
Applicant’s election without traverse of Invention I in the reply filed on 03/16/2022 is acknowledged.
Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 6-11, 15 and 21-23 are rejected under 35 U.S.C. 103 as being unpatentable over Kolter et al. (“Learning to Detect Malicious Executables in the Wild”) in view of Jin et al. (“A Scalable Search Index for Binary Files”).
Regarding claims 6 and 21, Kolter discloses a method and corresponding software product, the method comprising:
	generating a data structure that specifies byte sequences from a corpus of files (Section 3 – Data Collection);
	creating training data using pre-featured data from the data structure, the pre-featured data including features in a string portion of the pre-featured data (Section 3 – Data Collection); and 
	training a machine learning (ML) model using the training data to generate a trained ML model (i.e., We used the n-grams extracted from executables to form training examples by viewing each n-gram as a binary attribute that is either present in or absent from the executable) (Section 4 – Classification Methodology), wherein the ML model includes a first feature associated with a first weight and a second feature associated with a second weight (i.e., For the jth n-gram of the ith executable, the method computes the weight wij) (Section 4.2 – The TFIDF Classifier).
	Kolter does not disclose utilizing an inverted index. Jin discloses utilizing an inverted index that specifies byte sequences from a corpus of files (Abstract; Introduction). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify Kolter’s method to utilize an inverted index, as taught by Jin. The motivation for doing so would have been to reduce storage costs while maintaining reasonable performance.
Regarding claims 7 and 22, Kolter further discloses applying the trained ML model to search a test sample; and determining that a decision value for the trained ML model is within a confidence interval (Section 5 – Experimental Design).  Accordingly, the test sample is part of the inverted index.
Regarding claim 8, accordingly the combined method of Kolter and Jin would lead to applying the trained ML model to search the inverted index.
Regarding claim 9, Kolter further discloses returning the search result (Section 6 – Experimental Results). Kolter does not disclose validating the search results with the trained ML model. Jin discloses validating search results (i.e., the total search time is the sum of the index query time and the time to verify the results) (Section VI-A – Design Decisions). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Kolter’s method to validate the search results, as taught by Jin, to provide more accurate results.
Regarding claim 10, accordingly the combined method of Kolter and Jin would lead to expressing the trained ML model as a query language to automatically perform a search query on the inverted index since a query is required to search the inverted index (Jin, page 98, Querying).
Regarding claim 11, Kolter further discloses that the decision value is based at least in part on a ratio of false positives to total searches (Figures 1 & 2).
Regarding claim 15, Kolter further discloses that the corpus of files includes a corpus of malware files (Section 3 – Data Collection).
Regarding claim 23, accordingly the combined method of Kolter and Jin would lead to applying the trained ML model to search the inverted index. Kolter further discloses returning the search result (Section 6 – Experimental Results). Kolter does not disclose validating the search results with the trained ML model. Jin discloses validating search results (i.e., the total search time is the sum of the index query time and the time to verify the results) (Section VI-A – Design Decisions). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to further modify Kolter’s method to validate the search results, as taught by Jin, to provide more accurate results.
Claims 12-14 and 24 are rejected under 35 U.S.C. 103 as being unpatentable over Kolter in view of Jin as applied to claims 11 and 22 above, and further in view of Tseng et al. (US 2018/0293381 A1) and Huang et al. (US 2020/0210575 A1).
Regarding claims 12-13, Kolter and Jin do not disclose determining that the decision value is outside of a second confidence interval; and training the trained ML model using the training data and the false positives to generate a second trained ML model and repeating the training for the second trained ML model until the decision value is within the second confidence interval.
	Tseng discloses determining that the decision value is outside of a second confidence interval; training the trained ML model using the training data to generate a second trained ML model and repeating the training for the second trained ML model until the decision value is within the second confidence interval (Claim 16). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined method of Kolter and Jin to determine that the decision value is outside of a second confidence interval; train the trained ML model using the training data to generate a second trained ML model and repeat the training for the second trained ML model until the decision value is within the second confidence interval, as taught by Tseng, to reduce the false positive rate to a desirable rate.
	Tseng does not disclose using the false positives as part of training the trained ML model. Huang discloses using false positives as part of training a trained ML model (para. [0033]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined method of Kolter and Jin to use the false positives as part of training the trained ML model, as taught by Huang, to improve accuracy.
Regarding claim 14, Tseng does not explicitly disclose that repeating the training includes adjusting at least one of the first weight or the second weight; however, this feature is deemed inherent in order to reduce the false positive rate to a desirable rate.
Regarding claim 24, Kolter further discloses that the decision value is based at least in part on a ratio of false positives to total searches (Figures 1 & 2). Kolter and Jin do not disclose determining that the decision value is outside of a second confidence interval; and training the trained ML model using the training data and the false positives to generate a second trained ML model and repeating the training for the second trained ML model until the decision value is within the second confidence interval.
	Tseng discloses determining that the decision value is outside of a second confidence interval; training the trained ML model using the training data to generate a second trained ML model and repeating the training for the second trained ML model until the decision value is within the second confidence interval (Claim 16). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined method of Kolter and Jin to determine that the decision value is outside of a second confidence interval; train the trained ML model using the training data to generate a second trained ML model and repeat the training for the second trained ML model until the decision value is within the second confidence interval, as taught by Tseng, to reduce the false positive rate to a desirable rate.
	Tseng does not disclose using the false positives as part of training the trained ML model. Huang discloses using false positives as part of training a trained ML model (para. [0033]). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined method of Kolter and Jin to use the false positives as part of training the trained ML model, as taught by Huang, to improve accuracy.
Claim 25 is rejected under 35 U.S.C. 103 as being unpatentable over Kolter in view of Jin as applied to claim 21 above, and further in view of Kraft et al. (US 6,633,867 B1).
Regarding claim 25, Kolter further discloses adding one or more new files to the corpus of files (i.e., Users interact with MECS through a command line. They can add new executables to the collection) (page 471, left column, 3rd paragraph). Accordingly, the new files will be added to the inverted index. Jin further discloses that the inverted index specifies byte sequences of a fixed length; and determining a plurality of byte sequences of the fixed length, the plurality of byte sequences corresponding to a set of search strings (page 95, Querying). Accordingly, the trained ML model and the inverted index agree on a specific n-gram(s). Kolter does not disclose initiating a search query in response to addition of one or more new files. Kraft discloses initiating a search query in response to addition of one or more new files (i.e., the search result is updated automatically in almost real-time, when new information arrives) (Abstract). It would have been obvious to one of ordinary skill in the art before the effective filing date of the claimed invention to modify the combined method of Kolter and Jin to initiate a search query in response to addition of one or more new files, as taught by Kraft, so that search results could be updated automatically in response to new files being added.
Allowable Subject Matter
Claims 1-5 are allowed over the prior art of record.

Any inquiry concerning this communication or earlier communications from the examiner should be directed to MINH DINH whose telephone number is (571)272-3802. The examiner can normally be reached Mon-Fri: 9 AM - 5:30 PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Jeffrey Nickerson, can be reached at 469-295-9235. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/MINH DINH/Primary Examiner, Art Unit 2432