Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.





Detailed Action

Abstract Objection

The Abstract is objected to because the language does not comply with MPEP § 608.01(b). Applicant is reminded of the proper language and format for an abstract of the disclosure.
The abstract should be in narrative form and generally limited to a single paragraph on a separate sheet within the range of 50 to 150 words in length. The abstract should describe the disclosure sufficiently to assist readers in deciding whether there is a need for consulting the full patent text for details.
The language should be clear and concise and should not repeat information given in the title. It should avoid using phrases which can be implied, such as, “The disclosure concerns,” “The disclosure defined by this invention,” “The disclosure describes,” etc.  In addition, the form and legal phraseology often used in patent claims, such as “means” and “said,” should be avoided. See MPEP § 608.01(b) for guidelines for the preparation of patent abstracts.



Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows: 
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.

1. 	Claims 1 – 3 are rejected under 35 U.S.C. 101 because the claimed invention is directed to non-statutory subject matter.

The independent claim recites a “malicious file detection technology…”; however, Applicant’s claimed technology is not limited to statutory embodiments and is therefore construed to be entirely software. Accordingly, Applicant’s claim language is directed toward software, and software is not a statutory category.
The Examiner suggests that Applicant review the memorandum entitled “Subject matter Eligibility of Computer readable media” issued on January 26, 2010 by the Under Secretary of Commerce for Intellectual Property and Director of the United States Patent and Trademark Office, David J. Kappos (http://www.uspto.gov/patents/law/notices/101_crm_20100127.pdf) and consider amending the claim to clearly limit the application to be embodied on a memory/disk.
Any amendment to the claim should be commensurate with its corresponding disclosure.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.




Claims 1 – 3 are rejected under 35 U.S.C. 103 as being unpatentable over Ducau (US Pub. No. 2020/0364338) in view of Huang (US Pub. No. 2020/0210575 A1).

Per claim 1, Ducau suggests a malicious file detection technology based on a random forest algorithm, wherein the technology comprises the steps of constructing 9 types of behavior features by collecting behavior information (reads on extracting any suitable characteristics/values/features associated with the file and determined by behavior analysis, activity analysis, dynamic analysis, static analysis or another suitable manner from one or more input files, see Ducau para 0005, 0029, 0034 – 0037. The Examiner asserts it would have been obvious to one of ordinary skill in the art to construct 9 types of behavior features based on Ducau’s teaching of extracting any suitable characteristics/values/features because any finite number is reasonably scoped as any suitable characteristics/values/features as taught by Ducau) such as file information, network information, registry information and process information (reads on extracting any suitable characteristics/values/features associated with the file and determined by behavior analysis, activity analysis, dynamic analysis, static analysis or another suitable manner from one or more input files, see Ducau para 0005, 0029, 0034 – 0037) of a malicious file (reads on one or more input files categorized as malicious, see Ducau para 0034 and 0046) and a normal file (reads on one or more input files categorized as benign, see Ducau para 0034 and 0046) to form a feature vector (reads on the feature vector may be formed from the extracted file features, see Ducau para 0034, 0037 and 0048); the feature vector serves as input data of a machine learning algorithm (reads on the machine learning model may be configured to receive a feature vector associated with the file, see Ducau para 0038), a random forest of an integrated algorithm is selected (reads on the machine learning model may be a random forest model, see Ducau para 0038), and a supervised detection model is established (reads on the random forest model receives a feature vector and outputs an analysis result, see Ducau para 0038); and when behavior data of a new file is generated (reads on a target file is processed by a feature extractor and file features are generated and processed by the trained model, see Ducau para 0029 and 0049), the model can accurately and effectively identify whether the file is malicious or not (reads on detecting malware in a target file that is processed by a feature extractor and a trained model, see Ducau para 0029 and 0049). Ducau does not explicitly state collecting behavior information of a file in a sandbox to form a feature vector.
[0005] In general, in an aspect, a method for machine learning recognition of artifacts as malware may include providing training data comprising features of artifacts and an attribute indicator for the artifacts, the attribute indicator comprising a type of artifact, training a machine learning model using the training data to detect malware, and using the trained machine learning model to recognize malware by providing features of an artifact as input and providing both a threat score and an attribute indicator of the type of artifact as output. 
[0026] All documents mentioned herein are hereby incorporated by reference in their entirety. References to items in the singular should be understood to include items in the plural, and vice versa, unless explicitly stated otherwise or clear from the text. Grammatical conjunctions are intended to express any and all disjunctive and conjunctive combinations of conjoined clauses, sentences, words, and the like, unless otherwise stated or clear from the context. Thus, the term “or” should generally be understood to mean “and/or” and so forth.
[0029] In some implementations, a machine learning model, such as a neural network or other suitable model, may be trained for a security recognition task using training data. Security recognition tasks may include but are not limited to the recognition of malware or other security threat, suspiciousness, behavior detection, or any other relevant analysis result. For example, the security recognition tasks may include detection of malware or a determination of a threat score. The object of recognition tasks may be any suitable artifact, for example, files (e.g., Portable Executable (PE) files), documents, processes, network flows, memory extracts, or any other suitable analysis object. Recognition tasks may be applied, for example, to features determined by static analysis, dynamic analysis, behavior analysis, activity analysis, or any other suitable features. In addition to features of an object of analysis, context information also may be included in training data for improved performance. In various implementations, contextual information may include an attribute indicator that may indicate a family or type of malware. The use of the attribute indicator improves the performance of machine learning recognition tasks and provides information that may be used to better understand and address the identified malware.
[0033] The processor 110 may include a feature extractor 112, and a machine learning model 114. Each of the feature extractor 112 and the machine learning model 114 may be implemented as software stored in memory 120 and executed by processor 110 (e.g., code to cause the processor 110 to execute the feature extractor 112 and the machine learning model 114 may be stored in the memory 120) and/or a hardware-based device such as, for example, an ASIC, an FPGA, a CPLD, a PLA, a PLC, an IC, and/or the like.
[0034] The feature extractor 112 may be configured to receive an artifact as an analysis object (e.g., one or more of a file, a memory image, a network stream, behavior information, etc.) as an input and output a feature vector associated with the analysis object. In other words, the feature extractor 112 may extract features from the analysis object and form a feature vector including indications of these features. For example, in some exemplary implementations in which the analysis object is an executable file or script, the feature extractor 112 may identify static features in a file (for example, headers, variable definitions, routines, sub-routines, strings, elements, subtrees, tags, and/or the like). A representation of these features may be used to define a feature vector. For example, in some implementations, the feature extractor 112 may normalize each feature and/or input each feature to a hash function to produce a hash value. The feature extractor 112, using the hash values, may form a feature vector (e.g., of pre-determined length and/or of variable length). For example, the hash value of each feature may identify a position and/or bucket in the feature vector and a value at that position and/or bucket in the feature vector may be incremented each time a hash value for a feature identifies that position and/or bucket. As another example, in other implementations, a value associated with that feature may be included in the feature vector at that position and/or bucket. In some instances, the positions and/or buckets to which each feature can potentially hash may be determined based on the length and/or size of that feature. For example, strings having a length within a first range can potentially hash to a first set of positions and/or buckets while strings having a length within a second range can potentially hash to a second set of positions and/or buckets. The resulting feature vector may be indicative of the features of the structured file.
[0035] For example, the feature extractor 112 may receive a PE file and identify features within that file (e.g., strings, elements, subtrees, tags, function calls, etc.). The feature extractor 112 may then provide each feature as an input to a hash function to generate a hash value for that feature. The feature extractor 112 may use the hash values to form a feature vector representative of and/or indicative of the features in the file. Similar to a PE file, the feature extractor 112 may receive a HTML file, an XML file, or a document file, and identify features (e.g., strings, elements, subtrees, tags, function calls, etc.) within that file. The feature vector may be provided as an input to the machine learning model 114.
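For illustration only, the hashing approach described in paragraphs [0034] and [0035] can be sketched as follows. This is a minimal sketch and not Ducau's implementation: the function name `hash_features`, the vector length of 16, the choice of SHA-256 as the hash function, and the sample feature strings are all assumptions introduced here.

```python
import hashlib

def hash_features(features, length=16):
    """Map string features into a fixed-length count vector (hashing trick)."""
    vector = [0] * length
    for feature in features:
        # Normalize each feature, then hash it; a stable digest keeps
        # bucket assignments reproducible across runs.
        normalized = feature.strip().lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % length
        # Increment the bucket each time a feature hashes to it.
        vector[bucket] += 1
    return vector

# Hypothetical extracted features (strings and function calls).
features = ["CreateFileW", "RegSetValueExW", "createfilew ", "http://example.test"]
vector = hash_features(features)
```

Because the first and third sample features normalize to the same string, they land in the same bucket, which is the incrementing behavior the quoted paragraph describes.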

[0036] In various implementations, any suitable processes, characteristics and/or values can be used to define the feature vector and/or set of values associated with the file. For example, in some implementations, the feature extractor 112 may hash or map n-grams or n-gram representations to the same feature vector. In some implementations, the feature extractor 112 may hash or map n-grams of representations to a portion and/or buckets within a feature vector. In some implementations, the feature extractor 112 may be configured to hash one or more n-gram representations to portions of the feature vector.
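As a hedged illustration of hashing n-grams into buckets of a feature vector, as paragraph [0036] mentions, the sketch below uses byte 3-grams and CRC-32. The n-gram size, the bucket count of 32, the use of CRC-32, and the sample bytes are assumptions made for this example and are not taken from Ducau.

```python
import zlib

def ngram_vector(data: bytes, n=3, length=32):
    """Hash every byte n-gram of `data` into one of `length` buckets."""
    vector = [0] * length
    # Slide a window of n bytes over the input; inputs shorter than n
    # produce no n-grams and leave the vector all zeros.
    for i in range(len(data) - n + 1):
        gram = data[i:i + n]
        vector[zlib.crc32(gram) % length] += 1
    return vector

# Illustrative bytes only; not a real file header.
sample = b"MZ\x90\x00\x03\x00\x00\x00"
vec = ngram_vector(sample)
```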

[0037] In some implementations, the feature vector may be formed from extracted features based on a lookup table, a data map, an associative array, and/or any other data structure and/or function. Such a function can be used instead of or in addition to a hash function. For another example, any other data extracted and/or calculated from the file such as string length values associated with strings within the file, a variance of string length values associated with strings within the file, informational entropy values associated with the file (e.g., calculated based on a frequency of byte values, sequences and/or patterns within one or more byte windows of the file), byte values within the file, values computed based on byte values within the file (e.g., byte value ranges within the file, a standard deviation associated with byte values in the file, etc.), a length of the file, an author of the file, a publisher of the file, a compilation date of the file, data pertaining to whether a valid signature is included with the file, other information that can be parsed from a Portable Executable (PE) file (including but not limited to the size of the header and/or the size of components of the file, such as image sizes and/or the size of the code, versions of operating systems configured to run and/or open the file, section names, entry points, symbol table information, and/or similar information), images and/or representation of images associated with the file, and/or the like, can be used to define the feature vector and/or set of values associated with the file. Additional detail regarding such data extracted and/or calculated from the file can be found in U.S. patent application Ser. No. 15/228,728 filed Aug. 4, 2016 and titled “Methods and Apparatus for Machine Learning Based Malware Detection,” now U.S. Pat. No. 9,690,938, and U.S. patent application Ser. No. 15/343,844 filed Nov. 4, 2016 and titled “Methods and Apparatus for Detecting Malware Samples with Similar Image Sets,” now U.S. Pat. No. 9,672,358, each of which is incorporated herein by reference in its entirety.
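Some of the calculated values listed in paragraph [0037] (informational entropy from byte frequencies, string length values and their variance, file length) can be sketched as below. This is an illustrative sketch under stated assumptions: the helper names, the definition of a "string" as a run of at least four printable ASCII bytes, and the sample data are hypothetical and are not drawn from the reference.

```python
import math
import re
import statistics
from collections import Counter

def byte_entropy(data: bytes) -> float:
    """Shannon entropy in bits per byte (0.0 for empty input)."""
    if not data:
        return 0.0
    total = len(data)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(data).values())

def string_length_stats(data: bytes, min_len=4):
    """Mean and population variance of printable-ASCII run lengths."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_len
    lengths = [len(m) for m in re.findall(pattern, data)]
    if not lengths:
        return 0.0, 0.0
    return statistics.fmean(lengths), statistics.pvariance(lengths)

# Illustrative data: two strings of length 4 and 6, plus a short run
# ("xy") that falls below the assumed minimum length.
data = b"\x00abcd\x00abcdef\x01xy"
mean_len, var_len = string_length_stats(data)
file_features = [byte_entropy(data), mean_len, var_len, float(len(data))]
```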

[0038] The machine learning model 114 may be any suitable type of machine learning model such as, for example, a neural network, a decision tree model, a gradient boosted tree model, a random forest model, a deep neural network, or other suitable model. The machine learning model 114 may be configured to receive a feature vector associated with an analysis object, and output an analysis result, such as a score indicating whether the analysis object is, for example, potentially malicious, and a descriptive indictor, such as a family or type of malware. The machine learning model may provide an output indicating a threat classification. The threat classification may indicate an evaluation of the likelihood that the analysis object is a threat. For example, the threat classification may classify an analysis object into different categories such as, for example, benign, potentially malicious, malicious, type of malicious content/activity, class of malicious content/activity, malware family and/or the like.
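To make concrete how a random forest model can receive a feature vector and output a score, the following is a small pure-Python sketch of the general technique (bootstrap sampling, a random feature subset per split, majority voting). It is not the model of Ducau or of the application; the function names, the toy two-feature data, the label convention (1 = malicious, 0 = benign), and all parameters are assumptions. A practical system would more likely rely on a maintained machine learning library.

```python
import random
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels, feature_ids):
    """Return the (feature, threshold) that lowers weighted Gini most, or None."""
    best, best_score = None, gini(labels)
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, rng, n_sub, depth=0, max_depth=4):
    majority = Counter(labels).most_common(1)[0][0]
    if depth >= max_depth or len(set(labels)) == 1:
        return majority  # leaf: a class label
    # Consider only a random subset of features at this split.
    split = best_split(rows, labels, rng.sample(range(len(rows[0])), n_sub))
    if split is None:
        return majority
    f, t = split
    li = [i for i, r in enumerate(rows) if r[f] <= t]
    ri = [i for i, r in enumerate(rows) if r[f] > t]
    return (f, t,
            build_tree([rows[i] for i in li], [labels[i] for i in li],
                       rng, n_sub, depth + 1, max_depth),
            build_tree([rows[i] for i in ri], [labels[i] for i in ri],
                       rng, n_sub, depth + 1, max_depth))

def predict_tree(node, row):
    while isinstance(node, tuple):  # internal nodes are tuples
        f, t, left, right = node
        node = left if row[f] <= t else right
    return node

def train_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    n_sub = max(1, int(len(rows[0]) ** 0.5))
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(build_tree([rows[i] for i in idx],
                                 [labels[i] for i in idx], rng, n_sub))
    return forest

def malicious_score(forest, row):
    """Fraction of trees voting 'malicious' (label 1)."""
    votes = [predict_tree(tree, row) for tree in forest]
    return votes.count(1) / len(votes)

# Toy, hypothetical training data: two features per file.
rows = [[0, 1], [1, 0], [2, 2], [1, 1], [8, 9], [9, 8], [10, 10], [9, 9]]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = train_forest(rows, labels)
```

Calling `malicious_score(forest, feature_vector)` returns a value between 0 and 1 that plays the role of the score or threat classification output discussed in the quoted paragraph.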

[0046] Referring to FIG. 2, a machine learning training engine 200 may include training data 206. Training data 206 may include data used to train a detection model 202. In some instances, training data 206 can include multiple sets of data. Each set of data may contain at least one set of input information and an associated desired output value or label, and typically includes a large number of sets. The input information may include analysis objects and descriptive information for the analysis objects. In some implementations, the training data may include input files pre-categorized into categories such as, for example, malicious files and benign files. In some implementations, the training data may include input files with associated threat scores. In some implementations, the training data may include descriptive information, such as a family or type of malware. In some implementations, the input information may include feature vectors for files and context information for the files, such as attribute indicators for the files. In some implementations, the input information may include threat scores for files.

[0047] The training data 206 may be used to train the detection model 202 to perform security recognition tasks.

[0048] Referring to FIG. 3, an exemplary embodiment 300 is shown. In model training 322, a training data database 322 of binary files includes associated detection names. An exemplary binary file 324 is processed 326 by an exemplary feature extractor to generate file features 328. A detection name or names 332 associated with the binary file 324 may be distilled 334 to provide tags (labels) 336. The file features 328 and the tags 336 are provided to a learning algorithm 330 to generate a trained model. It should be understood that there may be a plurality of features extracted, a plurality of distillation processes and a variety or plurality of labels or tags provided to the learning algorithm 330.

[0049] In model deployment 340, a target binary file 342 is processed by a feature extractor and file features 348 are generated. The features are processed by the trained model 350, and the trained model is then capable of detecting threats and also providing information about the type of threat that has been detected. This model may address an information gap between conventional machine learning and signature-based detection methods by providing a machine-learning based tagging model that generates human-interpretable semantic descriptions of malicious software (e.g., file-infector, coinminer). These descriptions provide potentially more useful and flexible information than malware family names.

[0174] It should further be appreciated that the methods above are provided by way of example. Absent an explicit indication to the contrary, the disclosed steps may be modified, supplemented, omitted, and/or re-ordered without departing from the scope of this disclosure.


Huang suggests collecting behavior information of a file (reads on observing features including behavior and action aspects of each program in the sandbox, see Huang para 0019, 0020, 0027, 0035 and Figure 4) in a sandbox (see Huang Figure 4 block 106 and para 0019 – 0020) to form a feature vector (reads on forming a feature representation of the features observed in the sandbox, see Huang para 0019, 0020, 0027, 0035 and Figure 4 block 114).

[0019] To determine the features of each program 102 to be classified by the example adversarial malware detector 100, the adversarial malware detector 100 of FIG. 1 includes an example sandbox 106, and an example feature extractor 108. The example sandbox 106 of FIG. 1 is a testing environment in which the execution, operation and processes of each program 102 is not affected by other running programs (e.g., machine executable instructions executing on a processor that implements the adversarial malware detector 100), and do not affect other running programs. The example sandbox 106 allows suspicious software and files potentially containing malware or malicious code to be safely observed, tested, evaluated, etc.
[0020] The example feature extractor 108 of FIG. 1 identifies the features of each program 102 as the program 102 executes in the sandbox 106. The feature extractor 108 observes features that are useful to a machine learning engine 110 for classifying the program 102 using, for example, static analysis, dynamic analysis, etc. Example observed features include aspects of the program 102, such as behaviors, actions, function calls, API calls, data accesses, URL calls, etc. FIG. 2 is an example log file containing a listing of example features 200 identified by the example feature extractor 108 for a program of the DREBIN test set having a SHA5 value of 00c8de6b31090c32b65f8c30d7227488d2bce5353b31bedf5461419ff463072d.

[0026] As shown in FIG. 4, the coefficients of the classification model 120 implemented by the example machine learning engine 110 can be trained using supervised learning and a set of programs having known classifications (e.g., the DREBIN test set 402). The DREBIN test set has 41129 benign programs 404, and 1870 malware programs 406. Each program of the DREBIN test set 402 is processed through the sandbox 106, the feature extractor 108, and the feature vector former 114 to form a feature representation 112. Each feature representation 112 is passed through the machine learning engine 110, and resultant classifications made by the under training machine learning engine 110 are compared to a known classification 408 corresponding to the feature representation 112. The machine learning engine 110 uses, for example, backpropagation, to update the coefficients of the classification model 120 based on whether the classification probabilities 116, 118 correspond with the known classifications 408.

[0027] During use of the adversarial malware detector 100 to classify programs 102, programs 102 are processed through the sandbox 106, the feature extractor 108, and the feature vector former 114 to form a feature representation 112. Each feature representation 112 is input to the machine learning engine 110 to obtain the classification probabilities 116, 118. The decider 122 classifies the associated program 102 based on its classification probabilities 116, 118. Once trained, the DREBIN test set 402 can be classified by the machine learning engine 110 implementing the model 120 and its results tabulated, as shown in the example table 500 of FIG. 5. As shown, the example machine learning engine 110 of FIG. 1 can be used to classify programs 102 of the DREBIN test set 402 with approximately ninety-nine percent (99%) accuracy.
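As a hedged illustration of turning behavior observed in a sandbox into a feature vector, the sketch below counts logged events per behavior category. The four categories are the ones named in claim 1 (file, network, registry and process information); the log format, the event details, and the function name `behavior_vector` are hypothetical and are not taken from Huang, Ducau, or the application.

```python
from collections import Counter

# Behavior categories named in claim 1; their order fixes the vector layout.
CATEGORIES = ("file", "network", "registry", "process")

def behavior_vector(events):
    """Count events per category; events in unknown categories are ignored."""
    counts = Counter(category for category, _detail in events
                     if category in CATEGORIES)
    return [counts[c] for c in CATEGORIES]

# Hypothetical sandbox log as (category, detail) pairs.
log = [
    ("file", "write /tmp/a.tmp"),
    ("registry", "set HKCU/Software/Run"),
    ("network", "connect 203.0.113.5:443"),
    ("file", "delete /tmp/a.tmp"),
    ("mutex", "create Global/x"),
]
vector = behavior_vector(log)  # -> [2, 1, 1, 0]
```

A vector of this kind is the sort of input that a trained classifier would receive for a to-be-detected sample.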

[0035] While an example manner of implementing the adversarial malware detector 100 is illustrated in FIG. 1, one or more of the elements, processes and/or devices illustrated in FIG. 4 may be combined, divided, re-arranged, omitted, eliminated and/or implemented in any other way. Further, the example collector 104, the example sandbox 106, the example feature extractor 108, the example machine learning engine 110, the example feature vector former 114, the example decider 122, the example feature perturber 128, the controller 130 and/or, more generally, the example adversarial malware detector 100 of FIG. 1 may be implemented by hardware, software, firmware and/or any combination of hardware, software and/or firmware. Thus, for example, any of the example collector 104, the example sandbox 106, the example feature extractor 108, the example machine learning engine 110, the example feature vector former 114, the example decider 122, the example feature perturber 128, the controller 130 and/or, more generally, the example adversarial malware detector 100 of FIG. 1 could be implemented by one or more analog or digital circuit(s), logic circuits, programmable processor(s), programmable controller(s), graphics processing unit(s) (GPU(s)), digital signal processor(s) (DSP(s)), application specific integrated circuit(s) (ASIC(s)), programmable logic device(s) (PLD(s)), field programmable logic device(s) (FPLD(s)), and/or field programmable gate array(s) (FPGA(s)). 
When reading any of the apparatus or system claims of this patent to cover a purely software and/or firmware implementation, at least one of the example collector 104, the example sandbox 106, the example feature extractor 108, the example machine learning engine 110, the example feature vector former 114, the example decider 122, the example feature perturber 128, the controller 130 and/or, the example adversarial malware detector 100 is/are hereby expressly defined to include a non-transitory computer-readable storage device or storage disk such as a memory, a digital versatile disk (DVD), a compact disc (CD), a Blu-ray disk, etc. including the software and/or firmware. Further still, the example adversarial malware detector 100 may include one or more elements, processes and/or devices in addition to, or instead of, those illustrated in FIG. 1, and/or may include more than one of any or all of the illustrated elements, processes and devices. As used herein, the phrase “in communication,” including variations thereof, encompasses direct communication and/or indirect communication through one or more intermediary components, and does not require direct physical (e.g., wired) communication and/or constant communication, but rather additionally includes selective communication at periodic intervals, scheduled intervals, aperiodic intervals, and/or one-time events.


[Figure: media_image1.png]



Before the effective filing date of the invention it would have been obvious to one of ordinary skill in the art to modify the behavior collection teachings of the prior art of record (reads on extracting any suitable characteristics/values/features associated with the file and determined by behavior analysis, activity analysis, dynamic analysis, static analysis or another suitable manner from one or more input files, see Ducau para 0005, 0029, 0034 – 0037) by integrating the behavior collection of Huang (reads on forming a feature representation/vector of the features, including behavior and action aspects of each program, observed in the sandbox, see Huang para 0019, 0020, 0027, 0035 and Figure 4 block 114) to realize the instant limitation. One or more of the underpinning rationale(s), as discussed in KSR International Co. v. Teleflex Inc., et al., 550 U.S. 398 (2007), 82 U.S.P.Q.2d 1385 (also see MPEP § 2141(III)), are used to support this conclusion of obviousness. Accordingly, one of ordinary skill in the art would have recognized that applying the known technique of Huang would have yielded predictable results and resulted in an improved system where features are extracted in a sandbox that does not harm any other aspect of the system (see Huang para 0019), and because the level of ordinary skill in the art demonstrated by the references applied shows the ability to incorporate such behavior collection features into similar systems, resulting in an improved system that utilizes all available techniques known in the art to make the collection.

Per claim 2, the prior art of record further suggests wherein constructing and installing a sandbox module (The Examiner construes this to be a necessary, if not obvious, limitation of the disclosure of the prior art of record, because one of ordinary skill in the art would know that in order for the sandbox of the prior art to be used it must first be constructed and installed in some manner, see Huang Figure 4 block 106), collecting all behavior information generated by (reads on extracting any suitable characteristics/values/features associated with the file while being observed in the sandbox and determined by behavior analysis, activity analysis, dynamic analysis, static analysis or another suitable manner from one or more input files, see Ducau para 0005, 0029, 0034 – 0037 and see Huang para 0019, 0020, 0027, 0035 and Figure 4) the malicious sample (reads on the combination of one or more input files categorized as malicious, see Ducau para 0034 and 0046 and malware programs/files, see Huang Figure 4 block 406 and para 0026) and the normal sample (reads on the combination of one or more input files categorized as benign, see Ducau para 0034 and 0046 and benign programs/files, see Huang Figure 4 block 404 and para 0026) in the sandbox (see Huang Figure 4 block 106 and para 0019 – 0020), and processing the information into 9 types of behavior feature vectors (reads on extracting any suitable characteristics/values/features associated with the file while being observed in the sandbox and determined by behavior analysis, activity analysis, dynamic analysis, static analysis or another suitable manner from one or more input files, see Ducau para 0005, 0029, 0034 – 0037 and see Huang para 0019, 0020, 0027, 0035 and Figure 4. The Examiner asserts it would have been obvious to one of ordinary skill in the art to calculate any number of types of behavior features based on Ducau’s teaching of extracting any suitable characteristics/values/features because any finite number is reasonably scoped as any suitable characteristics/values/features as taught by Ducau) to serve as a training sample feature vector (reads on representation of these features are used to define a feature vector for the training data of a machine learning model, see Ducau para 0005, 0034 – 0035 and 0047 – 0048).
Per claim 3, the prior art of record further suggests wherein inputting (reads on the machine learning model may be configured to receive a feature vector associated with the file, see Ducau para 0038) the processed training sample feature vector (reads on the feature vector may be formed from the extracted file features, see Ducau para 0034, 0037 and 0048) to the random forest algorithm (reads on the machine learning model may be a random forest model, see Ducau para 0038), learning a supervised classifier (reads on detecting malware in a target file that is processed by a feature extractor and a trained model, see Ducau para 0029 and 0049), and calculating 9 types of behavior features of (reads on extracting any suitable characteristics/values/features associated with the file and determined by behavior analysis, activity analysis, dynamic analysis, static analysis or another suitable manner from one or more input files, see Ducau para 0005, 0029, 0034 – 0037. The Examiner asserts it would have been obvious to one of ordinary skill in the art to calculate any number of types of behavior features based on Ducau’s teaching of extracting any suitable characteristics/values/features because any finite number is reasonably scoped as any suitable characteristics/values/features as taught by Ducau) a to-be-detected sample to construct a to-be-detected feature vector (reads on a target file is processed by a feature extractor and file features are generated and processed by the trained model, see Ducau para 0029 and 0049).



Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Brian Shaw whose telephone number is (571)270-5191.  The examiner can normally be reached on Mon-Thurs from 6:00 AM-3:30 PM.
If attempts to reach the examiner by telephone are unsuccessful, the examiner's Supervisor, Jorge L. Ortiz Criado can be reached on (571) 272-7624.  The fax phone number for the organization where this application or proceeding is assigned is 703-872-9306.  Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see http://pair-direct.uspto.gov. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free).
/BRIAN F SHAW/
Primary Examiner, Art Unit 2496