DETAILED ACTION
This communication is in response to applicant’s amendments filed on September 21, 2022. Claims 1, 4-11, 14-22, 24, and 25 are pending.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's arguments filed on 09/21/2022 have been fully considered but they are not persuasive for the following reasons:
Applicant’s Argument:
“It is maintained that two of the references forming part of this rejection are not analogous art, namely Chechik and Porter and that non-analogous art should not form part of the obviousness rejection (and thus deflating the position that the previously proffered arguments cannot show obviousness by attacking references individually) . 
It is also respectfully submitted that the office action fails to take into account the correct standard when making the analogous art rejection as clarified in Donner Technology, LLC v. Pro Stage Gear, LLC, No. 2020-1104, slip op. at 2 (Fed. Cir. Nov. 9, 2020). It appears that the office action concedes that the two references are not in the same field of endeavor as the claimed subject matter given that the comments on page 6 of the office action. In the Donner case, the Federal Circuit held that when addressing the analogous art inquiry under a reasonable- pertinence theory, the problems to which both relate must be identified and compared. Despite the conclusions on page 6 of the office action, both of Chechik and Porter are not reasonably pertinent to the problem addressed in the claims. In particular, the identification of malicious software code (which can form parts of documents) is different than the suggestion of queries to enter into a search engine based on documents visited by a user as in Chechik. Further with regard to Porter, context-based utterance recognition to assist with automated speech recognition (ASR).  
The Federal Circuit in Donner stated that the reasonable-pertinence analysis must be carried out through the lens of a PHOSITA (person having ordinary skill in the art) who is considering turning to art outside her field of endeavor. Id. at 10. Here, the PHOSITA in cybersecurity seeking to implement a system for detecting malicious files would not have accessed search engine technology nor ASR technology as alleged absent an inventive step (thus support non-obviousness) and thus the Porter and Chechik references should not form part of the non-obviousness rejection.” (Applicant’s response filed on 09/21/2022, pages 10-11).
Examiner’s Response:
The examiner respectfully disagrees. As explained in the examiner’s response in the Office action mailed on 08/30/2022, the primary reference Moskovitch disclosed using n-grams for detecting malicious op-code, while Porter disclosed how n-grams can be specifically implemented, i.e., using n-grams for parsing and analyzing text (Porter, e.g., col. 6, lines 57-62). Porter is therefore pertinent to the problem the instant application is trying to solve, namely determining whether a file is malicious based on n-gram processing of sequential data. Similarly, in addition to using n-grams for detecting malicious op-code, Moskovitch disclosed that a bag of words representation has been known to be used for representing a textual file as a vector space model (Moskovitch, section 3.1); note that malicious op-code is just a feature being identified using n-gram analysis. Chechik disclosed using n-gram sequences for identifying document content features and disclosed in detail generating a weighted feature vector using a bag of words approach (Chechik, col. 8, line 62 – col. 9, line 1, “the system extracts features that correspond to the vocabulary 420. A document can be represented as a term vector using a conventional bag of words approach, in a vector that may include words or n-gram sequences of words, terms, characters or other selected tokens. Again, the term vector can be binary, feature counts or weighted feature counts”), i.e., feature extraction based on n-gram analysis. Therefore, Chechik is analogous art to Moskovitch, as both address feature extraction based on n-gram analysis, and Chechik is clearly pertinent to the problem the claimed invention is trying to solve.

Claim Rejections - 35 USC § 112
The following is a quotation of the first paragraph of 35 U.S.C. 112(a):
(a) IN GENERAL.—The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor or joint inventor of carrying out the invention.

The following is a quotation of the first paragraph of pre-AIA  35 U.S.C. 112:
The specification shall contain a written description of the invention, and of the manner and process of making and using it, in such full, clear, concise, and exact terms as to enable any person skilled in the art to which it pertains, or with which it is most nearly connected, to make and use the same, and shall set forth the best mode contemplated by the inventor of carrying out his invention.

Claims 11, 14-18 and 25 are rejected under 35 U.S.C. 112(a) or 35 U.S.C. 112 (pre-AIA ), first paragraph, as failing to comply with the written description requirement. The claim(s) contains subject matter which was not described in the specification in such a way as to reasonably convey to one skilled in the relevant art that the inventor or a joint inventor, or for applications subject to pre-AIA  35 U.S.C. 112, the inventor(s), at the time the application was filed, had possession of the claimed invention. 
Amended claim 11 recites the limitation “the file being a portable executable file, the discrete tokens comprising both C++ tokens and JAVASCRIPT tokens”, which does not appear to be supported by the original specification. As recited in claim 11, the discrete tokens are extracted from a single file; the specification does not appear to disclose a portable executable file containing discrete tokens comprising both C++ tokens and JAVASCRIPT tokens.
Claim 25 is rejected under the same rationale as claim 11, as the specification does not disclose a portable executable file containing each of (i) Nullsoft scriptable install system (NSIS) opcodes, (ii) C++ tokens, and (iii) JAVASCRIPT tokens.
The dependent claims included in the statement of rejection but not specifically addressed in the body of the rejection have inherited the deficiencies of their parent claim and have not resolved the deficiencies. Therefore, they are rejected based on the same rationale as applied to their parent claims above.

The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


Claims 11, 14-18 and 25 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.
Amended claim 11 recites the limitation “the file being a portable executable file, the discrete tokens comprising both C++ tokens and JAVASCRIPT tokens”. The scope of this limitation is not clear: it is not clear how a portable executable file can have both C++ and JAVASCRIPT opcodes and still be executable. For the following rejection, this limitation is interpreted as “the file being a portable executable file, the discrete tokens comprising one of C++ tokens and JAVASCRIPT tokens”.
Claim 25 is rejected under the same rationale as claim 11, as it is not clear how a portable executable file can contain each of (i) Nullsoft scriptable install system (NSIS) opcodes, (ii) C++ tokens, and (iii) JAVASCRIPT tokens and still be executable. For the following rejection, this limitation is interpreted as one of (i) Nullsoft scriptable install system (NSIS) opcodes, (ii) C++ tokens, and (iii) JAVASCRIPT tokens.
The dependent claims included in the statement of rejection but not specifically addressed in the body of the rejection have inherited the deficiencies of their parent claim and have not resolved the deficiencies. Therefore, they are rejected based on the same rationale as applied to their parent claims above.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 4-11, 14-22, and 24-25 are rejected under 35 U.S.C. 103 as being unpatentable over “Unknown Malcode Detection Using OPCODE Representation” to Moskovitch et al. (Conference paper from EuroISI 2008, LNCS 5396, pp.204-215, 2008, hereinafter Moskovitch) in view of US Pat. No. 9,633,653 B1 to Porter (hereinafter Porter), US Pat. No. 9,594,851 B1 to Chechik (hereinafter Chechik), and US Pat. No. 10,133,865 B1 to Feinman et al. (hereinafter Feinman).
As per claim 1, Moskovitch disclosed a method for protecting a device from a malicious file, the method being implemented by one or more data processors forming part of at least one computing device (Moskovitch, Abstract, method for detection of unknown malicious code based on OpCode evaluation using text categorization concepts; one or more data processors is implicitly disclosed, as the processing of executable files clearly involves a computer processor or processors) and comprising: 
extracting from the file, by at least one data processor from a plurality of entry points in the file, sequential data comprising discrete tokens (Moskovitch, page 209, lines 5-18, streamlining executable (i.e. file) by extracting sequences of OpCodes “in the same logical order in which the OpCodes appear in the executable”, here OpCodes correspond to the claimed “tokens”, in light of Applicant’s specification, par 0024, tokens are interpreted as units of code / instructions; each location corresponding to each extracted sequence of OpCode corresponds to an entry point), the file being a portable executable file (Moskovitch, page 208, section 3.2, “...The files in the benign set, including executable and DLL (Dynamic Linked Library) files, were gathered from machines running the Windows XP operating system on our campus.”, portable executable is the format used in executable and DLL files), the discrete tokens being opcodes associated with malware including two or more different types of tokens (Moskovitch, page 204, Abstract, and page 207, section 2.2, detection of malicious code is based on analysis of extracted Opcodes, i.e., the opcodes are associated with malware, and opcodes in an executable have different types of operations. Examiner’s Note: Portable Executable is also disclosed by Feinman as detailed below); 
generating, by at least one data processor, n-grams of the discrete tokens (Moskovitch, p. 209, lines 24-28, extract OpCode-n-grams); 
generating, by at least one data processor, a vector of weights based on respective frequencies of the n-grams (Moskovitch, p. 209, lines 24-28, extract OpCode-n-grams and generates TF, and p. 208, section 3.1, vector of terms created “such that each index in the vector represents the term frequency (TF)”, and used for term weighting); 
determining, by at least one data processor and based on a statistical analysis of the vector of weights, that the file is likely to be malicious (Moskovitch, p. 211, section 5.3.1, classification algorithms used are statistical analysis);
Moskovitch does not explicitly disclose that the discrete tokens respectively comprise syllables of machine language instructions within the operation code; however, in an analogous art in n-gram document analysis, Porter disclosed the concept that n-gram analysis may be based on items of a variety of sizes such as syllables (Porter, col. 5, lines 48-56, n-gram items may be words, phonemes, syllables, letters, or base pairs); it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention, to modify the system of Moskovitch to further incorporate the implementation of n-gram items based on different sizes such as syllables or letters as disclosed by Porter, in order to identify a model that yields the best classification result;
Moskovitch does not explicitly disclose generating the weight vector using a bag of words algorithm; however, in an analogous art in n-gram analysis, Chechik disclosed generating a weighted feature vector using a bag of words approach (Chechik, col. 8, line 62 – col. 9, line 1, “the system extracts features that correspond to the vocabulary 420. A document can be represented as a term vector using a conventional bag of words approach, in a vector that may include words or n-gram sequences of words, terms, characters or other selected tokens. Again, the term vector can be binary, feature counts or weighted feature counts”); it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention, to modify the system of Moskovitch to incorporate the generating of a feature vector using a bag of words approach as disclosed by Chechik, in order to obtain a weighted n-gram feature vector suitable for machine learning, because a bag of words approach is a known, conventional term vector feature extraction technique as disclosed by Chechik (Chechik, col. 8, line 62 – col. 9, line 1);
Moskovitch further does not explicitly disclose initiating, by at least one data processor and responsive to determining that the file is likely to be malicious, a corrective action; however, in an analogous art in malware detection, Feinman disclosed detecting malware based on opcode-n-grams derived from opcode sequences, and initiating a corrective action responsive to determining that a file is likely to be malicious (Feinman, col. 11, lines 28-41, the security action based on the classifying result may include a variety of actions such as deleting or halting and/or blocking the execution of the potentially malicious program), and the executable file being a portable executable file (Feinman, col. 3, lines 39-40, file with Portable Executable headers); it would have been obvious to one of ordinary skill in the art before the effective filing date of the invention, to combine the systems of Moskovitch and Feinman, and implement the remedial security actions as disclosed by Feinman, in order to ensure system security by implementing security measures to protect system resources from being infected by potentially malicious files.
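
For illustration only, the n-gram and weight-vector steps discussed for claim 1 can be sketched as follows. This is a minimal sketch and not code from Moskovitch, Porter, Chechik, or Feinman: the function names, the sample opcode sequence, and the vocabulary are hypothetical, and normalizing each count by the count of the most frequent n-gram is an assumed form of term frequency.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every contiguous n-gram (as a tuple) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def term_frequency_vector(tokens, n, vocabulary):
    """Bag-of-n-grams vector: one weight per vocabulary entry.

    Assumption: each weight is the n-gram count divided by the count of
    the most frequent n-gram in the same sequence.
    """
    counts = Counter(ngrams(tokens, n))
    peak = max(counts.values(), default=0)
    if peak == 0:
        # Sequence shorter than n: no n-grams, so every weight is zero.
        return [0.0] * len(vocabulary)
    return [counts[gram] / peak for gram in vocabulary]

# Hypothetical opcode sequence and 2-gram vocabulary, for illustration only.
opcodes = ["push", "mov", "call", "push", "mov", "ret"]
vocab = [("push", "mov"), ("mov", "call"), ("call", "ret")]
weights = term_frequency_vector(opcodes, 2, vocab)
print(weights)  # [1.0, 0.5, 0.0]
```

The order of n-grams within the sequence is discarded once the counts are taken, which is what makes the representation a bag of words rather than a sequence model.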

As per claim 4, Moskovitch-Porter-Chechik-Feinman disclosed the method of claim 1, wherein generating the vector of weights comprises: determining, by at least one data processor, a term frequency of each of the n-grams among the other n-grams (Moskovitch, p. 208, section 3.1, TF, equation (1)).

As per claim 5, Moskovitch-Porter-Chechik-Feinman disclosed the method of claim 4, wherein generating the vector of weights further comprises: determining, by at least one data processor, an inverse document frequency of each of the n-grams within a corpus (Moskovitch, p. 208, section 3.1, IDF).

As per claim 6, Moskovitch-Porter-Chechik-Feinman disclosed the method of claim 5, wherein generating the vector of weights further comprises: generating, by at least one data processor, a dot product of the term frequency and the inverse document frequency for each of the n-grams (Moskovitch, p. 208, section 3.1, TFIDF, equation (2)).
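
For illustration only, the term frequency, inverse document frequency, and per-n-gram product addressed in claims 4-6 can be sketched as follows. Equations (1) and (2) of Moskovitch are not reproduced in this action, so the formulas used here (TF normalized by the most frequent term, IDF = log(N/n)) are assumptions based on the conventional definitions, and the function names and the small corpus are hypothetical.

```python
import math
from collections import Counter

def tf(doc):
    """Term frequency of each n-gram, normalized by the most frequent one (assumed form)."""
    counts = Counter(doc)
    if not counts:
        return {}
    peak = max(counts.values())
    return {term: count / peak for term, count in counts.items()}

def idf(corpus):
    """Inverse document frequency: log(N / n), with N documents and n containing the term."""
    n_docs = len(corpus)
    doc_freq = Counter(term for doc in corpus for term in set(doc))
    return {term: math.log(n_docs / df) for term, df in doc_freq.items()}

def tfidf(doc, idf_table):
    """Product of TF and IDF for each n-gram in the document."""
    return {term: w * idf_table.get(term, 0.0) for term, w in tf(doc).items()}

# Hypothetical corpus: each document is a list of opcode 2-grams written as strings.
corpus = [
    ["push mov", "mov call", "push mov"],
    ["push mov", "mov ret"],
    ["xor xor", "mov ret"],
]
table = idf(corpus)
weights = tfidf(corpus[0], table)
```

In this sketch an n-gram that appears in every document of the corpus receives an IDF of zero, so it contributes nothing to the weight vector regardless of how often it occurs in the file.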

As per claim 7, Moskovitch-Porter-Chechik-Feinman disclosed the method of claim 1, wherein the statistical analysis of the vector of weights comprises performing a logistic regression on the vector of weights (Moskovitch, p. 210, section 3.4, Artificial Neural Networks (ANN), training data using ANN involves performing a logistic regression on the vector of weights; also Feinman, Fig. 5, ref #400, 404, 406, 408, and col. 8, lines 46-60, backpropagation).
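
For illustration only, the scoring step of a logistic regression applied to a vector of weights can be sketched as follows. A single output unit with a sigmoid activation computes this same function. The coefficients, bias, and threshold below are made-up values, not trained parameters and not values taken from Moskovitch or Feinman, and the function names are hypothetical.

```python
import math

def logistic_score(weights, coefficients, bias):
    """Return sigmoid(bias + weights . coefficients), a value in (0, 1)."""
    z = bias + sum(w * c for w, c in zip(weights, coefficients))
    # Numerically stable sigmoid: avoid exp() overflow for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def likely_malicious(weights, coefficients, bias, threshold=0.5):
    """Assumed decision rule: flag the file when the score exceeds the threshold."""
    return logistic_score(weights, coefficients, bias) > threshold

# Hypothetical weight vector and hypothetical (untrained) model parameters.
vector = [1.0, 0.5, 0.0]
coefs = [2.0, -1.0, 0.5]
score = logistic_score(vector, coefs, bias=-1.0)
flagged = likely_malicious(vector, coefs, bias=-1.0)
```

Only inference is shown; fitting the coefficients to labeled benign and malicious files is a separate training step that this sketch omits.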

As per claim 8, Moskovitch-Porter-Chechik-Feinman disclosed the method of claim 1, wherein the statistical analysis of the vector of weights comprises inputting the vector of weights to a machine learning model (Moskovitch, p. 210, section 3.4, Artificial Neural Networks (ANN), Decision Trees (DT), etc., e.g., inputting vector of weights is part of ANN training process; also Feinman, Fig. 5, ref #400, 404, 406, 408, and col. 8, lines 46-60).

As per claim 9, Moskovitch-Porter-Chechik-Feinman disclosed the method of claim 8, wherein the machine learning model is selected from the group consisting of generalized linear models, ordinary least squares, ridge regression, lasso, multi-task lasso, elastic net, multi-task elastic net, least angle regression, LARS lasso, orthogonal matching pursuit, Bayesian regression, naive Bayesian, logistic regression, stochastic gradient descent, neural networks, Perceptron, passive aggressive algorithms, robustness regression, Huber regression, polynomial regression, linear and quadratic discriminant analysis, kernel ridge regression, support vector machines, stochastic gradient descent, nearest neighbor, Gaussian processes, cross-decomposition, decision trees, random forest, and ensemble methods (Moskovitch, p. 210, section 3.4, Artificial Neural Networks (ANN), Decision Trees (DT), etc.).

As per claim 10, Moskovitch-Porter-Chechik-Feinman disclosed the method of claim 1, wherein the corrective action is selected from the group consisting of quarantining the file, stopping execution of the file, notifying the user that the file likely is malicious, flagging the file, storing the file, generating a hash of the file, transmitting the file or a hash of the file, and reverting to an earlier version of the file or device software (Feinman, col. 11, lines 28-41, “the security action may include deleting the potentially malicious program, halting and/or blocking the execution of the potentially malicious program, quarantining and/or sandboxing the potentially malicious program, alerting a user, an administrator, and/or a security vendor of the potentially malicious program, removing permissions from the potentially malicious program, blocking behaviors of the potentially malicious program, and/or performing additional security scans and/or analyses of the potentially malicious program”; the reasons of obviousness have been noted in the rejection of claim 1 above and applicable herein).

Claims 11 and 14-16 recite substantially the same limitations as claims 1 and 4-6, respectively, in the form of a system implementing the corresponding method; therefore, they are rejected under the same rationale. Regarding the limitation “the discrete tokens comprising both C++ tokens and JAVASCRIPT tokens”, which is interpreted as the discrete tokens comprising one of C++ tokens and JAVASCRIPT tokens: although Moskovitch does not explicitly disclose C++ tokens, Moskovitch disclosed that the files were gathered from machines running the Windows XP operating system (Moskovitch, page 208, section 3.2). Examiner takes official notice that C++ executables and JAVASCRIPT were known to be used on machines running Windows XP before the effective filing date of the invention; therefore, the executables disclosed by Moskovitch include C++ or JAVASCRIPT files and therefore include C++ and JAVASCRIPT tokens.

As per claim 17, Moskovitch-Porter-Chechik-Feinman disclosed the system of claim 11, wherein the statistical analysis of the vector of weights comprises performing a logistic regression on the vector of weights (Moskovitch, p. 210, section 3.4, Artificial Neural Networks (ANN), training data using ANN involves performing a logistic regression on the vector of weights; also Feinman, Fig. 5, ref #400, 404, 406, 408, and col. 8, lines 46-60, backpropagation) and inputting the vector of weights to a machine learning model (Moskovitch, p. 210, section 3.4, Artificial Neural Networks (ANN), Decision Trees (DT), etc., e.g., inputting vector of weights is part of ANN training process; also Feinman, Fig. 5, ref #400, 404, 406, 408, and col. 8, lines 46-60).

As per claim 18, Moskovitch-Porter-Chechik-Feinman disclosed the system of claim 17, wherein the machine learning model is selected from the group consisting of generalized linear models, ordinary least squares, ridge regression, lasso, multi-task lasso, elastic net, multi-task elastic net, least angle regression, LARS lasso, orthogonal matching pursuit, Bayesian regression, naive Bayesian, logistic regression, stochastic gradient descent, neural networks, Perceptron, passive aggressive algorithms, robustness regression, Huber regression, polynomial regression, linear and quadratic discriminant analysis, kernel ridge regression, support vector machines, stochastic gradient descent, nearest neighbor, Gaussian processes, cross-decomposition, decision trees, random forest, and ensemble methods (Moskovitch, p. 210, section 3.4, Artificial Neural Networks (ANN), Decision Trees (DT), etc.).

Claims 19-22 and 24 recite substantially the same limitations as claims 1, 4-6 and 9, respectively, in the form of a system implementing the corresponding method; therefore, they are rejected under the same rationale. Regarding the limitation “opcodes associated with malware including one or more of (i) Nullsoft scriptable install system (NSIS) opcodes, (ii) C++ tokens, or (iii) JAVASCRIPT tokens”: although Moskovitch does not explicitly disclose C++ tokens, Moskovitch disclosed that the files were gathered from machines running the Windows XP operating system (Moskovitch, page 208, section 3.2). Examiner takes official notice that C++ executables and JAVASCRIPT were known to be used on machines running Windows XP before the effective filing date of the invention; therefore, the executables disclosed by Moskovitch include C++ or JAVASCRIPT files and therefore include C++ and JAVASCRIPT tokens.
Claim 25 is interpreted as “...discrete tokens include one of (i) Nullsoft scriptable install system (NSIS) opcodes, (ii) C++ tokens, and (iii) JAVASCRIPT tokens”; therefore, it is rejected under the same rationale as claim 19 above.

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to Linglan Edwards whose telephone number is (571)270-5440. The examiner can normally be reached 9:00am - 5:00pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ashok B Patel, can be reached at (571)272-3972. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/LINGLAN EDWARDS/Primary Examiner, Art Unit 2491