DETAILED ACTION
The applicant’s request for continued examination regarding application number 15/873,673, filed January 17, 2018 has been entered.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.

Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection. Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114. Applicant's submission filed on August 25, 2021 has been entered. 

Response to Amendments




The amendment filed August 25, 2021 (referencing the applicant’s Reply to Office Action filed July 20, 2021) has been entered. Examiner acknowledges receipt of Amendments to Application 15/873,673, which include: Amendments to the Claims pp.2-8, and Remarks pp.9-17 (containing applicant’s amendments). 
Regarding applicant’s amendments to the Claims on pp.2-8, examiner has acknowledged Claims 1, 4, 12, and 15 have been amended. Examiner has acknowledged 
Regarding applicant’s Remarks on p.9, examiner acknowledges applicant’s cancellation of Claims 9 and 20 containing the §112(a) lack of written description issues, and as such, the respective §112(a) rejections previously set forth in the Final Office Action mailed June 11, 2021 for Claims 9 and 20 are withdrawn. 

Response to Arguments
Examiner acknowledges receipt of Arguments to Application 15/873,673, which include: Remarks pp.9-17 (containing applicant’s arguments). 
Applicant’s arguments on pp.9-13 under 35 U.S.C. 101 have been fully considered, but they are not persuasive. The existing 35 U.S.C. §101 rejections are maintained, with the amended claim language from applicant’s amended claims incorporated in the sections indicated below. 
Regarding applicant’s Remarks on pp.11-12:
“The current subject matter is directed to computer-implemented techniques for creating training sets for machine learning models which uses a dictionary-based technique for deduplicating samples that are useful in connection with the identification of malicious code. These, machine learning models, when trained, are configured to identify the presence of malicious code in one or more data samples. It is submitted that this type of subject matter, namely the specifically recited techniques for generating a training set for use in training a machine learning model directed to malicious code identification, is clearly not a mental process in that it is clearly linked with a technology environment and not something a human being can practically perform in his or her mind. Moreover, the rejection also fails to adequately place the weight on complex operations which take place such as in claim 3 which relates to a complex technique for feature reduction, namely random projection of a sparse vector. As such, at least on this basis, the claimed subject matter should be not be characterized as a mental process according to page 7 of the October Update). As provided by the October Update, a claim with limitation(s) that cannot practically be performed in the human mind does not recite a mental process. Therefore, the current subject matter fails to meet prong one of step A of the 2019 Revised Guidance.”
Examiner has fully considered this argument, and has found the applicant’s arguments to be not persuasive. Examiner notes that the applicant’s prior recited claims failed to actively claim that the generation of training sets is specifically targeted to identify (or classify) malicious code in one or more data samples, and hence applicant’s argument that the prior §101 analysis fails to address those non-existent claim limitations is not relevant. 
Applicant is reminded that the 35 U.S.C. §101 analysis for applicant’s prior Claims 1-20 identified certain claim limitations as reciting abstract ideas (i.e., mental steps/processes implementable in the human mind) according to guidance found in the latest version of the MPEP (June 2020 R-10.2019), which incorporates the guidance documented in the 2019 PEG. As indicated in the prior office action, the fact that a claim limitation recites an abstract idea (i.e., mental step/process implementable in the human mind, which includes mathematical calculations, mathematical relationships, as well as observations, judgments, evaluations, opinions) that requires the use of a generic computer as a tool to perform the mental step/judicial exception does not automatically exclude that claim limitation from being a mental step/judicial exception. See MPEP 2106.04(a)(2)(III-C). The prior claim amendments were recited to broadly apply to the generation of training sets, which involved extracting features, generating feature vectors and their respective binary representations, and performing reduced dimensional techniques to reduce the number of feature vectors to a representative set. As for the alleged “complex” feature reduction techniques, applicant fails to explain how the “feature reduction techniques” recited in the prior claims are complex such that they cannot be performed in the human mind. According to the prior §101 analysis, the technique of 
Regarding applicant’s Remarks on pp.12-13:
“The subject matter recited in the claims is applied in a meaningful way beyond generally linking the use of the judicial exception to a particular technological environment. The pending claims provide advanced techniques for generating training sets which in turn are used to train machine learning models for identifying the presence of malicious code in one or more data samples. 
By reducing duplicates (i.e., identical, similar, substantially similar samples), the computing resources required to train a model are substantially reduced. Such a problem is particularly important in connection with the detection of malware (which has been emphasized with the current amendments) given the large number of identical / similar files that are potentially used to train machine learning models. The arrangements recited in the claims comprise a meaningful advance over the alleged judicial exception given the complexity of generating machine learning model training data sets. Additionally, the claimed steps cannot even be practically performed by a human. Thus, using a computer to implement the claimed approach is a practical means that is meaningful for achieving the steps as described. Even if the claimed subject matter were directed to an abstract idea, it is implemented in a practical application (e.g., the use of a computer to achieve something unachievable by a human). 
For at least these reasons, claims 1-20 cannot be performed by a human and are integrated into a practical application of the alleged judicial exception, passing prong 2 of the step 2A analysis. Being integrated into a practical application of any exception, the claims are not directed to a judicial exception. Therefore, claims qualify as patent eligible subject matter under 35 U.S.C. § 101.”
Examiner has fully considered this argument, and has found the applicant’s arguments to be not persuasive. 
Examiner notes that the applicant’s prior recited claims failed to actively claim that the generation of training sets is specifically targeted to identify (or classify) malicious code in one or more data samples, and hence applicant’s argument that the prior §101 analysis fails to address those non-existent claim limitations is not relevant. 
The prior Claims 1 and 12 recite machine learning models, but they are indicated in an intended use/field-of-use context, and are not affirmatively recited as being actively trained by the generated training sets. For example, the existing limitation found in prior Claim 1 of “creating a training set for use in training a machine learning model …” recites an intended use capable of training a machine learning model. This phrasing in the prior claims does not demonstrate that the training set is fully integrated with the rest of the invention to actively train a machine learning model. 
Similar issues exist with the pending claim limitations provided in the recent amendment. The pending claim limitations found in Claim 1 of “receiving a plurality of samples at a processing node that each comprise one or more files, the plurality of samples being selected to train a machine learning model to identify a presence of malicious code in one or more data samples, …” and “creating a training set for use in training a machine learning model … the machine learning model, when trained using the training set, being configured to identify the presence of malicious code in one or more data samples” both recite intended use/field-of-use and hence do not demonstrate an integration of the claim limitation into a practical application. In the former claim limitation, the samples are actively received at a processing node (interpreted as a generic computer component), with the samples capable of training a machine learning model to perform an identification of malicious code, but there is no direct integration of the samples with a training set, and no active recitation of the training set being provided as input into a machine learning model to directly produce a classification or prediction of malicious code. According to MPEP 2106.05(h), claim limitations that recite a field of use or generally link to a technological environment do not demonstrate an integration into a practical application. In the latter claim limitation, the training (of a machine learning model) is described within a contingent clause “when trained using the training set”. This contingent clause in a method claim means that the subsequent claim language (“the machine learning model … being configured to identify the presence of malicious code in one or more data samples”) is not required to be performed, because the condition precedent “when trained using the training set” is not required to be met, and the claimed invention can be practiced without the condition occurring. See MPEP 2111.04(II). 
Due to the presence of these above issues, these pending claim limitations 
As indicated in the prior office action, the number of training data being analyzed and de-duplicated is not a limiting factor as to deciding whether an abstract idea/mental step is implementable in a human mind, as the cited mental steps/processes can still be iteratively performed within a human mind. In general, a person having ordinary skill in the art would still be able to mentally formulate and generate a plurality of training data sets by removing duplicates for a simple machine learning model via pen and paper (even if it takes a long time to perform that mental process), with each sub-step within that formulation and generation comprising a series of mental steps/processes involving observations, judgments, evaluations, or opinions, or mathematical calculations. A person having ordinary skill in the art may decide to use a generic computer as a tool to help perform one or more claim limitations that were identified as mental sub-steps/processes to help facilitate or speed-up the mental process, but that does not automatically exclude those claim limitations being identified as mental steps. According to MPEP 2106.05(f), claims that merely recite the implementation of mental steps through instructions executed by a processor (or reciting computer hardware configured to perform operations that comprise of mental steps) are directed towards instructions to apply the mental step/judicial exception, and hence do not further integrate into a practical application, or add significantly more than the judicial exception, alone or in combination with other claim elements. Hence, the existing §101 rejections for the associated claims involving duplicate identification are still maintained.
Applicant's arguments regarding examiner’s 35 U.S.C. §103 rejections have been fully considered, but they are not persuasive. Hence the existing 35 U.S.C. §103 rejections are maintained, with the amended claim language from applicant’s amended claims incorporated in the sections indicated below. 
Regarding applicant’s Remarks on p.15:
“It is respectfully submitted that the skilled artisan would not have resulted in the subject matter recited in the claims as currently presented using the art of record. In particular, none of the references suggest the specifically recited operations directed to malicious code identification in the current claims. As previously noted, the Schreter reference, which is directed to database management describes a dictionary which maps values in a database table to value IDs such that each unique value in the dictionary is associated with one unique value ID. No suggestion is made herein that binary representations of reduced dimension vectors that include features directed to malicious code identification are added to the dictionary if not already present within such dictionary. Such an arrangement differs and is more complex than the mapping of values in a database table with that of value IDS as in Schreter.”
Examiner has fully considered this argument, and has found the applicant’s arguments to be not persuasive. The Schreter reference teaches the concept of comparing elements and adding unique elements into a dictionary structure. The Schreter reference is not used to teach binary representations of reduced dimension vectors (which was taught in the Soni reference), and as indicated earlier, applicant’s prior recited claims did not recite that the features present in the reduced dimension vectors are directed to malicious code, and hence applicant’s argument that the prior §103 analysis fails to address those non-existent claim limitations is not relevant. 
Regarding applicant’s argument concerning the complexity of binary representations versus the dictionary index representation used in the Schreter reference, applicant fails to indicate how an arrangement of binary representations stored in dictionary elements is more complex than, and different from, an arrangement of dictionary indices (representing the dictionary vector) stored in the dictionary elements in Schreter. As indicated in the prior office action, storing and retrieving information in memory is a well-known, understood, routine, and conventional activity in computer science. Paragraph [0019] of the applicant’s specification defines a dictionary structure as “A dictionary structure can be an organized data structure that can include a unique entry for each binary representation and that can allow addition of further unique entries corresponding to new binary representations as well as modification and/or deletion of the entries.” Based on this definition, and under its broadest reasonable interpretation, the identified binary representations are mere types of values stored within an element in a dictionary structure (which is interpreted as a data structure having a key_identifier and an associated value), and the comparison, adding, and removal operations are performed on elements of a dictionary structure, and the flow of these operations is not changed by the different value types being stored in a dictionary element. On a computer system, any value type can be treated as a binary representation, as digits and characters can be expressed in terms of hexadecimal or binary digits at the data storage level and processor level. The Schreter reference uses the dictionary index as the value representing the actual dictionary vector, with the dictionary index being a different type value (i.e., either a hash map or a tree-based map based on the actual dictionary value). 
A person having ordinary skill in the art would be able to identify ways of comparing the values stored in two elements based on the value types being used in the dictionary elements at the programmatic level (i.e., performing direct digit value comparison, string text comparison, hash map or tree-based map comparison), but in actuality the comparison, adding, and removal operations are performed at a binary representation level at the data storage and processor level. Applicant’s specification does not indicate any further complexity regarding the dictionary structure and its operations of comparing, adding, and removing entries that would prevent the dictionary structure indicated in Schreter from being used, and therefore, the existing §103 rejections are maintained.
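For illustration only, the following is a minimal sketch of the add-if-absent flow discussed above. It is not taken from Schreter, Soni, or the applicant's specification; the function name `add_if_absent`, the type-tagged byte key, and the integer value IDs are assumptions made for this example. It shows that the control flow (compare, then add when no equal entry exists) stays the same whether the stored value is a byte string, a text string, or a number.

```python
def add_if_absent(dictionary, value):
    """Add `value` under the next free ID unless an equal entry already exists.

    Hypothetical helper: every value is reduced to a type-tagged byte string,
    so the comparison is carried out on a binary representation regardless
    of the original value type.
    """
    raw = value if isinstance(value, bytes) else repr(value).encode("utf-8")
    key = type(value).__name__.encode("utf-8") + b":" + raw
    if key in dictionary:
        return dictionary[key], False   # equal entry found, nothing added
    dictionary[key] = len(dictionary)   # assign the next value ID
    return dictionary[key], True


dictionary = {}
values = (b"\x01\x00\x01", "abc", 42, b"\x01\x00\x01", "abc")
results = [add_if_absent(dictionary, v) for v in values]
```

In this sketch the repeated byte string and the repeated text string are both detected as already present, and only three entries are stored.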
Regarding applicant’s Remarks on p.15:
“Moreover, the Soni reference fails to suggest that a training data set is created using a dictionary such that the data set that includes one vector whose binary representation corresponds to each of a plurality of elements (which in turn relate to a plurality of binary representations) in the dictionary structure. Simply stating that parameters associated with a dimensionality reduced label space are trained using training data fails to suggest the specifically recited technique for creating the training data set.”
Examiner has fully considered this argument, and has found the applicant’s arguments to be not persuasive. Applicant's above arguments are directed to the new claim limitations introduced in the amended Claims 1 and 12, which requires further analysis and re-examination of the amended and related original claims, which will be provided in the context of the updated claim mappings in the relevant sections indicated below. 

Claim Objections
Claims 1 and 12 are objected to because of the following informality: Both claims contain the limitation “creating a training set for use in training a machine learning model, the training set comprising a plurality of vectors that each have a binary representation that corresponds to a different one a plurality of elements in the dictionary structure, …”. The phrase “a different one a plurality of elements” should be revised to indicate that each binary representation corresponds to a different (i.e., unique binary representation) element in the dictionary structure. Appropriate correction is required. 

Claim Rejections - 35 USC § 112
The following is a quotation of 35 U.S.C. 112(b):
(b)  CONCLUSION.—The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the inventor or a joint inventor regards as the invention.


The following is a quotation of 35 U.S.C. 112 (pre-AIA ), second paragraph:
The specification shall conclude with one or more claims particularly pointing out and distinctly claiming the subject matter which the applicant regards as his invention.


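Claims 21 and 22 are rejected under 35 U.S.C. 112(b) or 35 U.S.C. 112 (pre-AIA), second paragraph, as being indefinite for failing to particularly point out and distinctly claim the subject matter which the inventor or a joint inventor (or for applications subject to pre-AIA 35 U.S.C. 112, the applicant), regards as the invention.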
Regarding new Claims 21 and 22, 
Claims 21 and 22 recite the limitation "training the machine learning model using the training data set" in line 2. There is insufficient antecedent basis for this limitation in the claim, since there is no earlier reference to “a training data set” in their respective independent Claims 1 or 12. For the purposes of examination, this claim limitation will be interpreted as “training the machine learning model using the training set”.

Claim Rejections - 35 USC § 101









35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.


When considering subject matter eligibility under 35 U.S.C. 101, it must be determined whether the claim is directed to one of the four statutory categories of invention, i.e., process, machine, manufacture, or composition of matter (Step 1). If the claim does fall within one of the statutory categories, the second step in the analysis is to determine whether the claim is directed to a judicial exception (Step 2A). The Step 2A analysis is broken into two prongs. In the first prong (Step 2A, Prong 1), it is determined whether or not the claims recite a judicial exception (e.g., mathematical concepts, mental processes, certain methods of organizing human activity). If it is determined in Step 2A, Prong 1 that the claims recite a judicial exception, the analysis proceeds to the second prong (Step 2A, Prong 2), where it is determined whether or not the claims integrate the judicial exception into a practical application. If it is determined at step 2A, Prong 2 that the claims do not integrate the judicial exception into a practical application, the analysis proceeds to determining whether the claim is a patent-eligible 
Claims 1-8 and 10-19 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more than the abstract idea itself, and hence is not patent-eligible subject matter. 
Regarding amended Claim 1, 
Step 1: The claim recites a (computer-implemented) method, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: This claim recites the following mental processes:
extracting features from each of the samples to result in a corresponding feature set, at least a portion of the extracted features being associated with malicious code (Under its broadest reasonable interpretation, this claim element recites a mental process, as this feature extraction can be done by parsing and grouping/identifying elements (such as the binary format of a file) from an executable or document file (which can either have benign or malicious code) and compiling these grouped elements into a feature set, where the feature set extracted from a file containing malicious code will have elements (i.e., extracted features) associated with malicious code. The identification of these grouped elements in a file represents a decision-making process, as parsing, grouping, and identifying these elements involves observations, judgments, evaluations, and opinions, which can be practically implementable in the human mind, with aid of pen and paper. See MPEP 2106.04(a)(2)(III).);
vectorizing, for each of the plurality of samples, the corresponding feature set extracted from a sample, the vectorizing resulting in a sparse vector (Under its broadest reasonable interpretation, this claim element recites a mental process, as this vectorizing step is ;
generating, for each sparse vector, a reduced dimension vector representing the sparse vector (Under its broadest reasonable interpretation, this claim element recites a mental process, as this generating step is a form of eliminating entries from the vector and storing it in another format (observations, judgments, evaluations, opinions), which can be practically implementable in the human mind, with aid of pen and paper. See MPEP 2106.04(a)(2)(III).);
creating, for each reduced dimension vector, a binary representation vector of the reduced dimension vector, the creating comprising converting each value of a plurality of values in the reduced dimension vector to a binary representation (Under its broadest reasonable interpretation, this claim element recites a mental process, as this creating step is a form of mapping existing data and storing it in another format (observations, judgments, evaluations, opinions), which can be practically implementable in the human mind, with aid of pen and paper. See MPEP 2106.04(a)(2)(III).); …
adding each of the binary representation vector as a new element in a dictionary structure when the corresponding binary representation is not equal to an existing element in the dictionary structure (Under its broadest reasonable interpretation, this claim element recites a mental process, as this adding step is a form of organizing information and manipulating information through mathematical correlations, which can be practically implementable in the human mind, with aid of pen and paper. See MPEP 2106.04(a)(2)(III).) …
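For reference, the sequence of claim elements analyzed above (extracting, vectorizing, generating, creating, adding) can be sketched as follows. This is a minimal illustration under stated assumptions and is not the applicant's implementation: the byte 2-gram features, the vector size `dim`, the output size `k`, the ±1 random projection matrix, and the sign-bit binarization are all hypothetical choices made for this example.

```python
import random


def extract_features(sample: bytes) -> set:
    # Hypothetical feature set: every 2-byte sequence that occurs in the file.
    return {sample[i:i + 2] for i in range(len(sample) - 1)}


def vectorize(features, dim):
    # Sparse vector stored as {index: count}; absent indices are zero.
    sparse = {}
    for feature in features:
        index = int.from_bytes(feature, "big") % dim
        sparse[index] = sparse.get(index, 0.0) + 1.0
    return sparse


def build_training_set(samples, dim=1024, k=16, seed=0):
    rng = random.Random(seed)
    # Assumed reduction: projection onto k random +/-1 directions.
    matrix = [[rng.choice((-1.0, 1.0)) for _ in range(dim)] for _ in range(k)]
    dictionary = {}
    for sample in samples:
        sparse = vectorize(extract_features(sample), dim)
        reduced = [sum(row[i] * v for i, v in sparse.items()) for row in matrix]
        # Assumed binarization: one bit per value (1 if non-negative, else 0).
        bits = tuple(1 if x >= 0 else 0 for x in reduced)
        if bits not in dictionary:       # add only when no equal entry exists
            dictionary[bits] = reduced
    # One vector per dictionary element.
    return list(dictionary.values())


training_set = build_training_set([b"hello world", b"hello world", b"another file"])
duplicates_only = build_training_set([b"abc", b"abc"])
```

In this sketch, two identical samples always map to the same dictionary entry, so only one of them contributes a vector to the resulting set.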
Step 2A Prong 2: This claim further recites:
receiving a plurality of samples at a processing node that each comprise one or more files, the plurality of samples being selected to train a machine learning model to identify a presence of malicious code in one or more data samples, at least a portion of the plurality of samples comprising malicious code (Even though a machine learning model is recited in the context of this limitation, the claim limitation (“receiving a plurality of samples at a processing node that each comprise one or more files”) is a form of data gathering on a general purpose computer, and thus is directed to pre-solution activity in a claimed process, with an intended use/field of use being identified as “the plurality of samples being selected to train a machine learning model to identify a presence of malicious code in one or more data samples, at least a portion of the plurality of samples comprising malicious code”. The data gathering aspect of this claim limitation does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). The field of use/general linking to a technological environment aspect of this claim limitation also does not add a meaningful limitation to the claim, and hence also does not integrate the judicial exception into a practical application. See MPEP 2106.05(h).); …
creating a training set for use in training a machine learning model, the training set comprising a plurality of vectors that each have a binary representation that corresponds to a different one a plurality of elements in the dictionary structure, the trained machine learning model, when trained using the training set, being configured to identify the presence of malicious code in one or more data samples (Even though a machine learning model is recited in the context of this limitation, it is not recited in the affirmative case within the context of the claim. In other words, the machine learning model is not recited as being actively trained with the input samples, and thus is treated as an intended use of the claimed invention or a field of use limitation. Furthermore, the training of a machine learning model is modified by a contingent clause “… when trained using the training set …” in a method claim, and thus the training is not required to occur in order for the invention to be practiced. Hence, this claim limitation (“creating a training set … the training set comprising a plurality of vectors that each have a binary representation…”) is considered a form of storing information on a general purpose computer (associated with a mental process that ensures the dictionary data structures contain unique entries: “… that corresponds to a different one a plurality of elements in the dictionary structure …”), and thus is directed to post-solution activity for use in a claimed process, with an intended use/field of use being identified as: “… for use in training a machine learning model, …”, followed by additional description to further describe the intended use: “… the trained machine learning model, when trained using the training set, being configured to identify the presence of malicious code in one or more data samples”. The storing information aspect of this claim limitation does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application. See MPEP 2106.05(g). 
The field of use/general linking to a technological environment aspect of this claim limitation also does not add a meaningful limitation to the claim, and hence also does not integrate the judicial exception into a practical application. See MPEP 2106.05(h).).
Step 2B: This claim further recites:
receiving a plurality of samples at a processing node that each comprise one or more files, the plurality of samples being selected to train a machine learning model to identify a presence of malicious code in one or more data samples, at least a portion of the plurality of samples comprising malicious code (The data gathering aspect of this limitation is directed to receiving or transmitting data over a network, which is recognized as a well-understood, routine, conventional activity, and does not add significantly more than the judicial exception, alone or in combination with other elements. See MPEP 2106.05(d)(II), list 1, example i. As analyzed in Step 2A Prong 2, a general linking to a technological environment does not add a meaningful limitation to the claim. See MPEP 2106.05(h). Hence, this aspect does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.); …
creating a training set for use in training a machine learning model, the training set comprising a plurality of vectors that each have a binary representation that corresponds to a different one a plurality of elements in the dictionary structure, the trained machine learning model, when trained using the training set, being configured to identify the presence of malicious code in one or more data samples (This claim element is directed to storing and retrieving information in memory on a general purpose computer, which is recognized as a well-understood, routine, conventional activity, and does not add significantly more than the judicial exception, alone or in combination with other elements. See MPEP 2106.05(d)(II), list 1, example iv. As analyzed in Step 2A Prong 2, a general linking to a technological environment does not add a meaningful limitation to the claim. See MPEP 2106.05(h). Hence, this aspect does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.).
Regarding previously presented Claim 2, 
Step 1: The claim recites a method, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: Claim 2 is a dependent claim of Claim 1, and hence inherits the same mental processes mentioned above. 
Step 2A Prong 2: This claim further recites:
wherein the sample is derived from a file that has a portable executable format, a document format, a file format, an executable format, a script format, an image format, a video format, an audio format, or any combination thereof (This claim element places an additional limitation by defining the different types of sample data, as well as generally linking the method to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).).
Step 2B: This claim further recites:
wherein the sample is derived from a file that has a portable executable format, a document format, a file format, an executable format, a script format, an image format, a video format, an audio format, or any combination thereof (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.).
Regarding original Claim 3, 
Step 1: The claim recites a method, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: Claim 3 is a dependent claim of Claim 1, and hence inherits the same mental processes mentioned above. This claim further recites the following mental processes:
wherein the generating comprises: randomly projecting the sparse vector into a key space (This claim element recites the same generating step from Claim 1, which is understood to be a recitation of a mental process. Random projection is a mathematical technique that takes an input vector and reduces the dimensionality of the vector, which is a form of eliminating entries from a vector and storing it in another format. As discussed in Claim 1, the generating step of eliminating entries from a vector and storing it in another format is a recitation of a mental process (observations, judgments, evaluations, opinions), as it can be practically implementable in the human mind, with aid of pen and paper. See MPEP 2106.04(a)(2)(I-A) and 2106.04(a)(2)(III).), …
Step 2A Prong 2: This claim further recites:
wherein the reduced dimension vector comprises a randomly projected vector (This claim element places an additional limitation by defining the reduced dimension vector as a randomly projected vector, as well as generally linking the method to a technological environment. One form of dimensional reduction involves performing random projections, which .
Step 2B: This claim further recites:
wherein the reduced dimension vector comprises a randomly projected vector (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.).
Regarding amended Claim 4, 
Step 1: The claim recites a method, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: Claim 4 is a dependent claim of Claim 3, and hence inherits the same mental processes mentioned above. The claim further recites the following mental process:
wherein the randomly projected vector is generated by applying a random projection to the sparse vector (This claim element recites the same limitation as in Claim 3, with a different re-phrasing of the claim language, defining the result of a random projection as a randomly projected vector. As indicated in Claim 3 and in corresponding independent Claim 1, the generating step is understood to be a recitation of a mental process. Random projection is a mathematical technique that takes an input vector and reduces the dimensionality of the vector, which is a form of eliminating entries from a vector and storing it in another format. As discussed in Claim 1, the generating step of eliminating entries from a vector and storing it in another ; …
Step 2A Prong 2: This claim further recites:
wherein the random projection preserves all s between  (This claim element merely recites the intended result of a random projection (which is to preserve pairwise distances between features), and hence amounts to insignificant extra-solution activity for use in a claimed process. This additional element does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application. See MPEP 2106.05(g).).
Step 2B: This claim further recites:
wherein the random projection preserves all s between  (This claim element is a well-known, understood, routine, and conventional activity of random projection [Arriaga et al., An algorithmic theory of learning: robust concepts and random projection, March 28 2006; p.162 cites an observation resulting from a random projection, where “random projection (approximately) preserves key properties of a set of points, e.g., the distances between pairs of points (Johnson & Lindenstrauss, 1984); this has led to efficient algorithms in several other contexts (Kleinberg, 1997; Linial, et al., 1994; Vempala, 2004).”; Arriaga p.165 further cites the observation from the same Johnson & Lindenstrauss reference that indicates that pairwise distances are preserved (within a reasonable factor): “It has been shown that if R is a random orthonormal matrix, i.e., the columns of R are random unit vectors and they are pairwise orthogonal, then the projection preserves all pairwise distances to within a factor of (1 + ϵ) for a surprisingly small value of k of about log n/ϵ² (Johnson & Lindenstrauss, 1984)”; Arriaga pp.181-182 lists references, several of which indicate analysis of Johnson & Lindenstrauss’ results, further indicating that this observation is well-known, understood, routine, and conventional], which does not add significantly more than the judicial exception, alone or in combination with other elements in the claim. See MPEP 2106.05(d); Berkheimer v. HP, Inc., 881 F.3d 1360, 1368, 125 USPQ2d 1649, 1654 (Fed. Cir. 2018).).
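For illustration, the distance-preserving behavior described in the Claim 4 analysis can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the applicant's implementation and not the construction quoted from the cited reference: it uses a Gaussian random matrix (a common stand-in for the random orthonormal matrix mentioned in the quotation), and the dimensions `d` and `k`, the 5% density, and all function names are hypothetical choices made only for the example.

```python
import math
import random

def random_projection_matrix(d, k, rng):
    """Return a k x d matrix with independent N(0, 1/k) entries (assumed construction)."""
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]

def project(matrix, vector):
    """Multiply the k x d matrix by a d-dimensional vector, giving a k-dimensional vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

rng = random.Random(0)
d, k = 1000, 200  # hypothetical original and reduced dimensions

# Hypothetical sparse binary feature vectors: roughly 5% of entries are 1.
samples = [[1.0 if rng.random() < 0.05 else 0.0 for _ in range(d)] for _ in range(4)]

R = random_projection_matrix(d, k, rng)
projected = [project(R, s) for s in samples]

# Ratio of projected distance to original distance for every pair of samples.
ratios = []
for i in range(len(samples)):
    for j in range(i + 1, len(samples)):
        ratios.append(distance(projected[i], projected[j]) / distance(samples[i], samples[j]))
```

Under these assumptions each entry of `ratios` is expected to stay close to 1, which is the approximate preservation of pairwise distances that the cited passage attributes to Johnson & Lindenstrauss.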
Regarding original Claim 5, 
Step 1: The claim recites a method, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: Claim 5 is a dependent claim of Claim 4, and hence inherits the same mental processes mentioned above.  
Step 2A Prong 2: This claim further recites:
wherein the random projection has a predetermined size (This claim element places an additional limitation by indicating that the random projection has a predetermined size. A dimension reduction using random projection inherently has a predetermined size when the d-dimensional vector is reduced to a lower k-dimensional vector of predetermined size k, and hence this claim element is considered as further defining the type of vector, as well as generally linking the method to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).).
Step 2B: This claim further recites:
wherein the random projection has a predetermined size (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.).
Regarding original Claim 6, 
Step 1: The claim recites a method, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: Claim 6 is a dependent claim of Claim 1, and hence inherits the same mental processes mentioned above.
Step 2A Prong 2: This claim further recites:
wherein each value in the plurality of values in the randomly projected vector corresponds to at least one of the following: a positive value, a negative value, and a zero value (This claim limitation places an additional limitation by indicating that a randomly projected vector has positive or negative values, or a zero value, which are interpreted as types of numerical values, as well as generally linking the method to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).).
Step 2B: This claim further recites:
wherein each value in the plurality of values in the randomly projected vector corresponds to at least one of the following: a positive value, a negative value, and a zero value (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.).
Regarding original Claim 7, 
Step 1: The claim recites a method, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: Claim 7 is a dependent claim of Claim 6, and hence inherits the same mental processes mentioned above.
Step 2A Prong 2: This claim further recites:
wherein each binary representation is generated by mapping the predetermined value to at least one of the following: 1 and 0 (This claim limitation places an additional limitation by defining a binary representation vector as a vector that contains numerical values of either 1 or ; and
wherein the positive value is mapped to 1; the negative value is mapped to 0; and zero value is mapped to 0 (This claim limitation places an additional limitation by further defining a mapping of predetermined numerical values to binary values, as well as generally linking the method to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).).
Step 2B: This claim further recites:
wherein each binary representation is generated by mapping the predetermined value to at least one of the following: 1 and 0 (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.); and
wherein the positive value is mapped to 1; the negative value is mapped to 0; and zero value is mapped to 0 (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.).
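For illustration, the mapping recited in Claim 7 (positive values to 1; negative and zero values to 0) can be sketched as follows. The function name and the example input values are hypothetical and are not taken from the application.

```python
def to_binary_representation(projected):
    """Map each positive value to 1 and each negative or zero value to 0."""
    return [1 if value > 0 else 0 for value in projected]

# Hypothetical randomly projected vector holding positive, negative, and zero values.
example = [0.7, -1.2, 0.0, 3.4]
bits = to_binary_representation(example)
```

Here `bits` holds one binary digit per entry of the projected vector, so the binary representation has the same length as the reduced dimension vector.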
Regarding previously presented Claim 8, 
Step 1: The claim recites a method, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: Claim 8 is a dependent claim of Claim 1, and hence inherits the same mental processes mentioned above. The claim further recites the following additional mental processes:
wherein the adding further comprises:
comparing the binary representation to the plurality of existing binary representations in the dictionary structure (This claim element further limits the adding step from Claim 1, which is understood to be a recitation of a mental process. Under its broadest reasonable interpretation, this claim element further recites a mental process, as this comparing step is a form of observation, evaluation, judgment, and opinion, which can be practically implementable in the human mind, with aid of pen and paper. See MPEP 2106.04(a)(2)(III).);
determining, based on the comparing, another binary representation in the plurality of binary representations being a duplicate of the binary representation (This claim element further limits the adding step from Claim 1, which is understood to be a recitation of a mental process. Under its broadest reasonable interpretation, this claim element further recites a mental process, as this determining step is a form of observation, evaluation, judgment, and opinion, which can be practically implementable in the human mind, with aid of pen and paper. See MPEP 2106.04(a)(2)(III).); and
selecting the determined other binary representation for creating the training set (This claim element further limits the adding step from Claim 1, which is understood to be a recitation of a mental process. Under its broadest reasonable interpretation, this claim element further recites a mental process, as this selecting step is a form of observation, evaluation, judgment, and opinion, which can be practically implementable in the human mind, with aid of pen and paper. See MPEP 2106.04(a)(2)(III).).
Step 2A Prong 2: This claim does not recite any additional elements to be further analyzed at this step.
Step 2B: This claim does not recite any additional elements to be further analyzed at this step.
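For illustration, the comparing, determining, and selecting steps analyzed for Claim 8 can be sketched with an ordinary Python dictionary keyed by the binary representation. This is a minimal sketch under stated assumptions: the function name, the sample identifiers, and the choice to keep the first-stored representation when a duplicate is found are hypothetical and are not drawn from the application.

```python
def add_to_dictionary(dictionary, bits, sample_id):
    """Add a binary representation unless an identical one is already stored.

    Returns the identifier of the sample whose representation is kept.
    """
    key = tuple(bits)  # lists are not hashable, so a tuple serves as the key
    if key in dictionary:
        # A duplicate exists: select the already-stored representation.
        return dictionary[key]
    dictionary[key] = sample_id
    return sample_id

store = {}
incoming = [("a", [1, 0, 1]), ("b", [0, 1, 1]), ("c", [1, 0, 1])]
kept = [add_to_dictionary(store, bits, name) for name, bits in incoming]

# Identifiers of the distinct representations retained for the training set.
training_ids = sorted(set(kept))
```

In this sketch sample "c" duplicates sample "a", so only one of the two representations is retained.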
Regarding previously presented Claim 10, 
Step 1: The claim recites a method, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: Claim 10 is a dependent claim of Claim 8, and hence inherits the same mental processes mentioned above. The claim further recites the following additional mental process:
performing, upon determination that the determined other binary representation is a duplicate of the binary representation, at least one of replacing the determined other binary representation with the binary representation, and discarding one of the binary representation and the determined other binary representation. (Under its broadest reasonable interpretation, this claim element further recites a mental process, as this performing step (consisting of replacing and/or discarding a binary representation) is a form of observation, evaluation, judgment, and opinion, which can be practically implementable in the human mind. See MPEP 2106.04(a)(2)(III).)
Step 2A Prong 2: This claim does not recite any additional elements to be further analyzed at this step.
Step 2B: This claim does not recite any additional elements to be further analyzed at this step.
Regarding original Claim 11, 
Step 1: The claim recites a method, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: Claim 11 is a dependent claim of Claim 1, and hence inherits the same mental processes mentioned above. The claim further recites the following additional mental process:
wherein at least one of 
the vectorizing [is performed by at least one processor of at least one computing system] (This claim element further limits the vectorizing step from Claim 1, which is understood to be a recitation of a mental process, by including at least one processor of at least one computing system to perform the vectorizing step. Under its broadest reasonable interpretation, this claim element further recites a mental process (observations, judgments, evaluations, opinions). Claim limitations that merely use a computer as a tool to perform a mental process are still considered mental processes that are practically implementable in the human mind. See MPEP 2106.04(a)(2)(III-C).) …
the generating [is performed by at least one processor of at least one computing system] (This claim element further limits the generating step from Claim 1, which is understood to be a recitation of a mental process, by including at least one processor of at least one computing system to perform the generating step. Under its broadest reasonable interpretation, this claim element further recites a mental process (observations, judgments, evaluations, opinions). Claim limitations that merely use a computer as a tool to perform a mental process are still considered mental processes that are practically implementable in the human mind. See MPEP 2106.04(a)(2)(III-C).) …
the creating the binary representation [is performed by at least one processor of at least one computing system] (This claim element further limits the creating the binary representation step from Claim 1, which is understood to be a recitation of a mental process, by including at least one processor of at least one computing system to perform the creating of the binary representation. Under its broadest reasonable interpretation, this claim element further recites a mental process (observations, judgments, evaluations, opinions). Claim limitations that merely use a computer as a tool to perform a mental process are still considered mental processes that are practically implementable in the human mind. See MPEP 2106.04(a)(2)(III-C).)
the adding [is performed by at least one processor of at least one computing system] (This claim element further limits the adding step from Claim 1, which is understood to be a recitation of a mental process, by including at least one processor of at least one computing system to perform the adding step. Under its broadest reasonable interpretation, this claim element further recites a mental process (observations, judgments, evaluations, opinions). Claim limitations that merely use a computer as a tool to perform a mental process are still considered mental processes that are practically implementable in the human mind. See MPEP 2106.04(a)(2)(III-C).) …
Step 2A Prong 2: This claim further recites:
the creating the training set is performed by at least one processor of at least one computing system (This claim element further limits the creating the training set step from Claim 1, which is understood to be a recitation of a form of storing information on a general purpose computer, by including at least one processor of at least one computing system to perform the creating of the training set. Under its broadest reasonable interpretation, this claim element further recites a form of storing information on a general purpose computer, and thus is directed to post-solution activity for use in a claimed process. This additional element of storing information does not add a meaningful limitation to the claim, and hence does not integrate the judicial exception into a practical application.), and
wherein the computing system comprises at least one of the following: a software component, a hardware component, and any combination thereof (This claim element places an additional limitation by defining the components of a computing system. A computing system inherently includes a software component, a hardware component, and any combination thereof, and hence this claim element is considered as defining the type of computing system, as well as generally linking the method to a technological environment. Type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h).).
Step 2B: This claim further recites:
the creating the training set is performed by at least one processor of at least one computing system (This claim element is directed to storing information in memory on a general purpose computer, which is recognized as a well-understood, routine, conventional activity, and does not add significantly more than the judicial exception, alone or in combination with other elements. See MPEP 2106.05(d)(II), list 1, example iv.), and
wherein the computing system comprises at least one of the following: a software component, a hardware component, and any combination thereof (As analyzed in Step 2A Prong 2, type definitions and a general association to a technological environment do not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.).
Regarding amended Claim 12, 
Step 1: The claim recites a system, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: The body of this claim is the same as cited in amended Claim 1, and hence the same Step 2A Prong 1 analysis from amended Claim 1 applies here.
Step 2A Prong 2: The body of this claim is the same as cited in amended Claim 1, and hence the same Step 2A Prong 2 analysis from amended Claim 1 applies here. 
The additional claim element “computer hardware configured to perform operations” only restricts the claim limitation to a technological environment and does not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). This claim element is also further directed to mere instructions for applying a judicial exception, which also does not further integrate the judicial exception into a practical application. See MPEP 2106.05(f).
Step 2B: The body of this claim is the same as cited in amended Claim 1, and hence the same Step 2B analysis from amended Claim 1 applies here.
The additional claim element “computer hardware configured to perform operations” only restricts the claim limitation to a technological environment and does not further integrate the judicial exception into a practical application. See MPEP 2106.05(h). As analyzed in Step 2A Prong 2, this claim element is also further directed to mere instructions for applying a judicial exception, which also does not further integrate the judicial exception into a practical application. See MPEP 2106.05(f). Hence this claim element does not add significantly more than the judicial exception, alone or in combination with other elements in the claim.
Regarding previously presented Claim 13, 
Step 1: The claim recites a system, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: Claim 13 is a dependent claim of Claim 12. Furthermore, the body of this claim is the same as cited in Claim 2, and hence the same Step 2A Prong 1 analysis from Claim 2 applies here.
Step 2A Prong 2: The body of this claim is the same as cited in Claim 2, and hence the same Step 2A Prong 2 analysis from Claim 2 applies here.
Step 2B: The body of this claim is the same as cited in Claim 2, and hence the same Step 2B analysis from Claim 2 applies here.
Regarding original Claim 14, 
Step 1: The claim recites a system, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: Claim 14 is a dependent claim of Claim 12. Furthermore, the body of this claim is the same as cited in Claim 3, and hence the same Step 2A Prong 1 analysis from Claim 3 applies here.
Step 2A Prong 2: The body of this claim is the same as cited in Claim 3, and hence the same Step 2A Prong 2 analysis from Claim 3 applies here.
Step 2B: The body of this claim is the same as cited in Claim 3, and hence the same Step 2B analysis from Claim 3 applies here.
Regarding amended Claim 15, 
Step 1: The claim recites a system, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: Claim 15 is a dependent claim of Claim 14. Furthermore, the body of this claim is the same as cited in Claim 4, and hence the same Step 2A Prong 1 analysis from Claim 4 applies here.
Step 2A Prong 2: The body of this claim is the same as cited in Claim 4, and hence the same Step 2A Prong 2 analysis from Claim 4 applies here.
Step 2B: The body of this claim is the same as cited in Claim 4, and hence the same Step 2B analysis from Claim 4 applies here.
Regarding original Claim 16, 
Step 1: The claim recites a system, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: Claim 16 is a dependent claim of Claim 15. Furthermore, the body of this claim is the same as cited in Claim 5, and hence the same Step 2A Prong 1 analysis from Claim 5 applies here.
Step 2A Prong 2: The body of this claim is the same as cited in Claim 5, and hence the same Step 2A Prong 2 analysis from Claim 5 applies here.
Step 2B: The body of this claim is the same as cited in Claim 5, and hence the same Step 2B analysis from Claim 5 applies here.
Regarding original Claim 17, 
Step 1: The claim recites a system, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: Claim 17 is a dependent claim of Claim 12. Furthermore, the body of this claim is the same as cited in Claim 6, and hence the same Step 2A Prong 1 analysis from Claim 6 applies here.
Step 2A Prong 2: The body of this claim is the same as cited in Claim 6, and hence the same Step 2A Prong 2 analysis from Claim 6 applies here.
Step 2B: The body of this claim is the same as cited in Claim 6, and hence the same Step 2B analysis from Claim 6 applies here.
Regarding original Claim 18, 
Step 1: The claim recites a system, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: Claim 18 is a dependent claim of Claim 17. Furthermore, the body of this claim is the same as cited in Claim 7, and hence the same Step 2A Prong 1 analysis from Claim 7 applies here.
Step 2A Prong 2: The body of this claim is the same as cited in Claim 7, and hence the same Step 2A Prong 2 analysis from Claim 7 applies here.
Step 2B: The body of this claim is the same as cited in Claim 7, and hence the same Step 2B analysis from Claim 7 applies here.
Regarding previously presented Claim 19, 
Step 1: The claim recites a system, therefore it falls into one of the four statutory categories (i.e., process, machine, article of manufacture, or composition of matter).
Step 2A Prong 1: Claim 19 is a dependent claim of Claim 12. Furthermore, the body of this claim is the same as cited in Claim 8, and hence the same Step 2A Prong 1 analysis from Claim 8 applies here.
Step 2A Prong 2: The body of this claim is the same as cited in Claim 8, and hence the same Step 2A Prong 2 analysis from Claim 8 applies here.
Step 2B: The body of this claim is the same as cited in Claim 8, and hence the same Step 2B analysis from Claim 8 applies here.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1-3, 6, 8, 11-14, 17, 19, and 21-22 are rejected under 35 U.S.C. 103 as being unpatentable over Avasarala et al., U.S. PGPUB 2014/0090061, published 3/27/2014 [hereafter referred to as Avasarala] in view of Durand et al., Using Randomized Projection Techniques to Aid in Detecting High-Dimensional Malicious Applications, 49th ACM Southeast Conference, March 24-26, 2011 ACM 978-1-4503-0686-07/11/03, pp.166-172 [hereafter referred to as Durand], in further view of Schreter, Ivan, U.S. PGPUB 2015/0324480 (filed 5/8/2014) [hereafter referred to as Schreter].
Regarding amended Claim 1, Avasarala teaches 
A computer-implemented method for identifying a presence of malicious code in one or more data samples (Avasarala paragraph [0021]: “… a method for improved zero-day malware detection that receives a set of training files which are each known to be either malign or benign, partitions the set of training files into a plurality of categories … identifying features present in the training files in the selected category of training files … ”), the method comprising: 
receiving a plurality of samples at a processing node that each comprise one or more files, the plurality of samples being selected to train a machine learning model to identify a presence of malicious code in one or more data samples, at least a portion of the plurality of samples comprising malicious code (Avasarala Figure 3, elements 300, 312: examiner’s note: Under its broadest reasonable interpretation, the term “plurality of samples” is interpreted as the training files containing either benign or malicious “data samples”, which are interpreted as the identified features extracted from the training files. A malware detection system receives a repository of files known to be malign or benign, where the malware detection system identifies and analyzes features present in the training files to determine whether they are good indicators of malware (corresponding to “receiving a plurality of samples at a processing node that each comprise one or more files, the plurality of samples being selected to train a machine learning model to identify a presence of malicious code in one or more data samples, at least a portion of the plurality of samples comprising malicious code”) (Avasarala paragraph [0061]: “ … an embodiment of a system 300 for automated machine-learning, zero-day malware detection that relies on a basic machine-learning system. System 300, which may also be referred to as a malware detector pipeline, includes a training component 310 and an execution component 320. Training component 310 builds classifier from a set of training files 312 (e.g., training repository 102 from FIG.1). Training files 312 include known malware files and known benign files.” and Avasarala paragraphs [0039]-[0040]: “A machine-learning program, application, … (a “machine-learning trainer 104”) may be executed on the repository of files to identify features that are indicative of a file being malware. 
Such features may be n-grams … Using the n-grams, the machine-learning trainer 104 creates binary feature vector representations of each file in the training repository. The machine-learning trainer 104 evaluates the features of the entire training collection to identify a subset of those that are the most effective at distinguishing between malign and benign files.”).);
extracting features from each of the samples to result in a corresponding feature set, at least a portion of the extracted features being associated with malicious code (Avasarala Figure 3, element 314; Figure 4A, element 404: examiner’s note: The machine-learning system incorporating an extended feature vector generator (EFVG) that can contain one or more feature vector generators FVG to perform feature extraction from benign or malicious training files (Avasarala paragraph [0057]: “… A feature vector generator (FVG) derives features … from an object and populates the feature vector with those features”) and producing an extended feature vector containing a set of feature vectors (corresponding to “extracting features from each of the samples to result in a corresponding feature set, at least a portion of the extracted features being associated with malicious code”) (Avasarala paragraph [0060]: “Embodiments also introduce an “extended feature vector” (EFV) that comprises the features of an object (e.g., a training or target file) that correspond to these different attribute classes. An EFV may be a concatenation of a number of feature vectors corresponding to different types of features (e.g., in embodiments, n-grams, pdf-objects, pe32 objects, etc.). … the EFVG draws upon individual FVGs to generate feature-type-specific feature vectors and then concatenates these feature-type-specific feature vectors into an EFV.”).);
vectorizing, for each of the plurality of samples, the corresponding feature set extracted from the sample, the vectorizing resulting in a sparse vector (Avasarala Figure 3, element 314; Figure 4A, element 404: examiner’s note: An EFV produced by an EFVG (Avasarala paragraph [0060]), where each extracted n-gram feature is represented as a sequence of 1s and 0s indicating presence or absence of an n-gram within the training file. Under its broadest reasonable interpretation, a sparse vector is a vector that contains zero values, and hence this n-gram feature vector representing the training file represents a sparse vector, thus corresponding to “vectorizing, for each of the plurality of samples, the corresponding feature set extracted from the sample, the vectorizing resulting in a sparse vector” (Avasarala paragraph [0050]: “… the features are n-grams, ordered sequence of entities (grams) of length n and a gram is a byte of binary data. The feature vector is an ordered list of ones and zeros indicating either the presence, or absence, of an n-gram within the file’s binary representation.”).); …
… adding each of the binary representation vectors as a new element in a dictionary structure (Avasarala Figure 4A, element 404: examiner’s note: Under its broadest reasonable interpretation, a binary representation vector is a representation that consists of 0s and 1s, which can be represented as digits or even as a string of characters (where each character is expressible as a set of 0s and 1s stored in memory). In the Avasarala reference, a feature vector containing 0s and 1s represents a binary representation vector, and a dictionary structure contains a set of items in a data structure represented by their respective key_identifier-value pairs. An EFVG manages the addition and removal of attributes and attribute classes in a machine learning system (Avasarala paragraph [0054]: “An EFVG facilitates and manages the addition or removal of attributes, attribute classes, and corresponding feature derivation methods in a machine-learning system.”), where an attribute class represents categories for the corresponding attribute values (Avasarala paragraph [0059]). An EFVG manages these attribute classes and corresponding values by parsing a feature set description file and creating a data structure representing each attribute class, the attributes, and their associated values as key_identifier-value pairs (corresponding to a “dictionary structure”), where this data structure holds the corresponding attribute values (Avasarala paragraphs [0062]-[0065]: “With reference now to FIGS. 4A and 4B, shown is an embodiment of an improved system 400 for automated machine-learning, zero-day malware detection that incorporates an EFVG. Embodiments of system 400 operate by adding two components to the basic machine-learning system: 1. A supplementary feature set description file 402 that, in no particular order, lists the semantic label or descriptive representation of an attribute and a specified computer represented attribute class to which it belongs …  2. 
An extensible feature vector generator superclass 404 (for any object-oriented programming language) that provides a method for: a. Parsing the supplementary feature set description file 402 and creating a data structure comprising, for each attribute class, the attributes and their associated values as key-value pairs; …” and Avasarala paragraphs [0071]: “…an EFVG adds a comment to each line of each attribute, denoting the “attribute class, i.e., the type of feature to which the attribute pertains. … This comment field may be used by the EFVG to identify the mechanism to be used to calculate the value (i.e., the feature) corresponding to this attribute. By including this comment, the attribute-relational file can be parsed to create a data structure that holds key-value pairs of attribute classes and sets of attributes comprising that class. Once this data structure is constructed, all feature vectors can be generated consistently.”).) …
creating a training set for use in training a machine learning model, the training set comprising a plurality of vectors that each have a (Avasarala Figure 4A, elements 410, 412, 402, 404, 414, 418, 426: examiner’s note: The training component receiving the training files 412 (containing malign and benign files) and for each training file, uses the extended feature vector generator EFVG to generate identifier-value based data structures based on a feature set description file. This data structure stores the Avasarala paragraph [0066]: “As shown in FIG. 4A, training component 410 receives training files 412 and supplementary feature set description file 402 and uses EFVG 404 to build training EFVs 414. The EFVG 404 includes attribute class-type FVG 416, including a naïve n-gram-specific FVG, … EFVG 404 concatenates the type-specific feature vectors 413 into the training EFVs 414. Trainer 418 builds classifier 426 from training EFVs 414.”).) … , 
the machine learning model, when trained using the training set, being configured to identify the presence of malicious code in one or more data samples ([Examiner’s note: Under its broadest reasonable interpretation, this claim limitation in a method claim recites a contingent clause, so the subsequent claim language need not be performed because the condition precedent (“when trained using the training set”) is not required to be met, and the claimed invention can be practiced without the condition occurring. See MPEP 2111.04(II). Applicant is advised to amend the claim to positively recite the condition as being fulfilled, since no patentable weight is given to the claim language following a contingent clause that does not require the condition to be fulfilled for practicing the claimed invention. However, for the purposes of examination, this contingent clause will be treated as if the condition were fulfilled.] [Avasarala Figure 4B, elements 420, 422, 404, 413, 426, 427: examiner’s note: The execution component uses the EFVG and the trained classifier from the training phase shown in Figure 4A to produce a prediction output label indicating whether a target file provided to the execution component is benign or malicious (corresponding to “the machine learning model, when trained using the training set, being configured to identify the presence of malicious code in one or more data samples”) (Avasarala paragraph [0066]: “… With reference to FIG. 4B, execution component 420 utilizes EFVG 404 to analyze target file 422. If multiple feature vectors are generated for a target file … EFVG 404 concatenates the type-specific feature vectors 413 and classifier 426 analyzes this concatenated feature and outputs a benign or malicious label 427 for the target file based on this comparison. As described above with reference to FIG. 
1, this output may include a calculated percentage likelihood or confidence level that target file is malicious (or benign). The EFVG 404 may be re-used during testing and prediction (classifying).”).]).
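For illustration only, the n-gram presence/absence feature vector and the concatenation into an extended feature vector described in the quoted Avasarala passages can be sketched as follows. This is a minimal sketch, not Avasarala's implementation: the function names, the three-entry n-gram vocabulary, and the sample bytes are all hypothetical.

```python
from typing import List

def ngram_presence_vector(data: bytes, vocabulary: List[bytes]) -> List[int]:
    """Return an ordered list of 1s and 0s marking which vocabulary
    n-grams occur in the byte string.

    Assumes a non-empty vocabulary whose entries all have the same length n.
    """
    n = len(vocabulary[0])
    # Collect every n-byte window that occurs in the data.
    present = {data[i:i + n] for i in range(len(data) - n + 1)}
    return [1 if gram in present else 0 for gram in vocabulary]

def extended_feature_vector(parts: List[List[int]]) -> List[int]:
    """Concatenate feature-type-specific vectors into one longer vector."""
    efv: List[int] = []
    for part in parts:
        efv.extend(part)
    return efv

# Hypothetical 2-gram vocabulary and sample bytes.
vocabulary = [b"\x4d\x5a", b"\x90\x90", b"\x00\xff"]
sample = b"\x4d\x5a\x90\x00\x03"

vec = ngram_presence_vector(sample, vocabulary)
# The second part stands in for a vector from some other feature type.
efv = extended_feature_vector([vec, [0, 1]])
```

Because most vocabulary n-grams are absent from any given file, most entries of such a vector are zero, which is the sense in which the mapping above treats it as a sparse vector.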
While Avasarala teaches selecting relevant features from a training file according to high information-gain (which will indirectly reduce the dimensionality of the examined feature space for each training file, Avasarala paragraph [0058]), Avasarala does not explicitly teach
… generating, for each sparse vector, a reduced dimension vector representing the sparse vector; 
creating, for each reduced dimension vector, a binary representation vector of the reduced dimension vector, the creating comprising converting each value of a plurality of values in the reduced dimension vector to a binary representation; …
Durand teaches
… generating, for each sparse vector, a reduced dimension vector representing the sparse vector (Examiner’s note: The similarity software program used for detecting malicious applications (Durand p.168 col.1 Section 3. Experiment 1st paragraph) generates an m-dimensional feature vector consisting of m possible n-grams for each file (where the presence or absence of an n-gram is represented as a 1 or 0 respectively, thus making the m-dimensional feature vector a sparse vector), and applies a randomized projection technique based on Linial-London-Rabinovich (LLR) algorithm to perform a k dimension reduction on the original m-dimensional feature vector where k < “n” (i.e., n=m as it is the m-dimensional vector being reduced), thus corresponding to “generating for each sparse vector, a reduced dimension vector representing the sparse vector” (Durand p.168 col.1 Section 3.1 Similarity Software 1st paragraph – p.168 col.2 3rd paragraph: “The software created for this experiment provides functionality to ingest Windows formatted, binary executables and creates an m-dimensional data space containing vectors that represent those applications. In these experiments, m is the number of total possible n-grams that can be extracted from the ingested applications, one dimension for each possible n-gram. … The information stored in each of the dimensions can take on one of several possible values: the binary values of ‘1’ if the application contains the particular n-gram or ‘0’ if it does not. Once the m-dimensional vectors have been created, we can then apply the randomized project technique … via the extended Linial-London-Rabinovich (LLR) algorithm … LLR randomized projection or, as we refer to it, random set projection, is based on the LLR algorithm, which is an extension of the Johnson-Lindenstrauss [28] and Bourgain [33] algorithms. It is described as follows: For each cardinality k < n which is a power of 2, randomly pick O(log n) sets A ⊂ V(G) of cardinality k. 
Map every vertex x to the vector (d(x, A)) (where d(x, A) = min{d(x, y)|y ∈ A}) with one coordinate for each A selected [11]. In short, the algorithm randomly selects k = O(log n) subsets of the original data set, and uses the minimum distances from each vector to each subset as coordinates to create a k-dimensional vector projection.”).); 
creating, for each reduced dimension vector, a binary representation vector of the reduced dimension vector, the creating comprising converting each value of a plurality of values in the reduced dimension vector to a binary representation (Examiner’s note: The similarity software program used for detecting malicious applications (Durand p.168 col.1 Section 3. Experiment 1st paragraph) generates an m-dimensional feature vector consisting of m possible n-grams for each file (where the presence or absence of an n-gram is represented as a 1 or 0 respectively, thus making the m-dimensional feature vector a sparse vector), and applies a randomized projection technique based on Linial-London-Rabinovich (LLR) algorithm to perform a k dimension reduction on the original m-dimensional feature vector where k < “n” (i.e., n=m as  the original contents of these selected k subsets of the vectors are still binary representations (e.g., representations containing 0s and 1s) of their n-grams that were already converted as binary vectors, thus corresponding to “creating, for each reduced dimension vector, a binary representation vector of the reduced dimension vector, the creating comprising converting each value of a plurality of values in the reduced dimension vector to a binary representation” (Durand p.168 col.1 Section 3.1 Similarity Software 1st paragraph – p.168 col.2 3rd paragraph: “The software created for this experiment provides functionality to ingest Windows formatted, binary executables and creates an m-dimensional data space containing vectors that represent those applications. In these experiments, m is the number of total possible n-grams that can be extracted from the ingested applications, one dimension for each possible n-gram. … The information stored in each of the dimensions can take on one of several possible values: the binary values of ‘1’ if the application contains the particular n-gram or ‘0’ if it does not. 
Once the m-dimensional vectors have been created, we can then apply the randomized project technique … via the extended Linial-London-Rabinovich (LLR) algorithm … LLR randomized projection or, as we refer to it, random set projection, is based on the LLR algorithm, which is an extension of the Johnson-Lindenstrauss [28] and Bourgain [33] algorithms. It is described as follows: For each cardinality k < n which is a power of 2, randomly pick O(log n) sets A ⊂ V(G) of cardinality k. Map every vertex x to the vector (d(x, A)) (where d(x, A) = min{d(x, y)|y ∈ A}) with one coordinate for each A selected [11]. In short, the algorithm randomly selects k = O(log n) subsets of the original data set, and uses the minimum distances from each vector to each subset as coordinates to create a k-dimensional vector projection.”).); …

It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the extended feature vector generator taught in Avasarala and enhance it by performing dimension reduction using random projections as taught in Durand, as a way to reduce the dimensionality of the extended feature vectors being analyzed for malicious code detection. The motivation to combine is taught in Durand as a way to mitigate the computational impact of high-dimensional data: in general, incorporating dimension reduction and random projection techniques provides significant computational savings without sacrificing accuracy, and these techniques can be implemented on non-specialized computer hardware, thus making the systems that use these techniques more computationally efficient (Durand p.166 col.2 1st paragraph (Section 1 Introduction); Durand p.167 col.2 Section 2.2 Randomized Projections, 2nd paragraph: “Researchers have used randomized projection in several different applications [12, 14, 15] to reduce the dimensionality of high-dimensional data. … The purpose of their work was to show that compared to other more traditional dimensionality reduction techniques, such as principal component analysis or singular value decomposition, randomized projections offered a greater detail of accuracy. The authors were also able to show a significant computation saving by using randomized projections over other feature extraction techniques, such as principal component analysis.”; and Durand p.168 col.1 Section 3 1st paragraph: “… It is very significant that these experiments could be completed on commodity hardware. 
It shows that large specialized machines are not needed to perform malicious application detection and that this work can be broadly applied across almost any level of architecture that researchers/developers may have and still gain the significantly positive results that were obtained and discussed below. In addition, this software and the methods that it supports can easily take advantage of commodity cluster hardware for substantial gains in performance”).
Avasarala in view of Durand does not explicitly teach
adding … a new element in a dictionary structure when the corresponding binary representation is not equal to an existing element in the dictionary structure; …
… a plurality of vectors that each have a binary representation that corresponds to a different one of a plurality of elements in the dictionary structure, …
Schreter teaches
adding … a new element in a dictionary structure when the corresponding binary representation is not equal to an existing element in the dictionary structure ([Examiner’s note: Under its broadest reasonable interpretation, this claim limitation in a method claim recites a contingent clause, so the subsequent claim language need not be performed because the condition precedent (“when the corresponding binary representation is not equal to an existing element in the dictionary structure”) is not required to be met, and the claimed invention can be practiced without the condition occurring. See MPEP 2111.04(II). Applicant is advised to amend the claim to positively recite the condition as being fulfilled, since no patentable weight is given to the claim language following a contingent clause that does not require the condition to be fulfilled for practicing the claimed invention. However, for the purposes of examination, this contingent clause will be treated as if the condition were fulfilled.] [Examiner’s note: Under its broadest reasonable interpretation, the “corresponding binary representation” is referencing the type of value in a new element that is being added to the dictionary structure, where it is the value that is compared to other existing values of elements already present in the dictionary structure. As identified earlier, under its broadest reasonable interpretation, a binary representation vector is a representation that consists of 0s and 1s, which can be represented as digits or even as a string of characters (where each character is expressible as a set of 0s and 1s stored in memory). A filter condition value check on the dictionary index is performed on a new element before it is added to the dictionary, where the filter condition value check looks for uniqueness in the dictionary index value before the element is added to the dictionary. 
A person having ordinary skill in the art would understand that performing a filter condition value check on a dictionary index does not restrict the actual value itself to a particular type of value, since filter condition checks can be performed on all types of values (in the case of Schreter, it is performed on a hashed value of the actual value, which under its broadest reasonable interpretation, can also be thought of as a binary representation stored in memory, thus corresponding to “adding … a new element in a dictionary structure when the corresponding binary representation is not equal to an existing element in the dictionary structure”) (Schreter paragraphs [0002]-[0003]: “The dictionary can be represented as a vector or radix tree of values … A secondary structure, or dictionary index, may be used to check for duplicates. The dictionary index may be, for example, a hash map or tree-based map from value to value ID. Before adding a new value to the dictionary, it must be ensured that the new value is not already present in the dictionary…the dictionary index is checked for the existence of the value and if found, its value ID is returned. If the value is not found in the dictionary index, the value is inserted into the dictionary vector...”).]); …
… a plurality of vectors that each have a binary representation that corresponds to a different one of a plurality of elements in the dictionary structure (Examiner’s note: Under its broadest reasonable interpretation, a “binary representation” is referencing the type of value in an existing element in the dictionary structure, where it is the value that is compared to other existing values of elements already present in the dictionary structure. A filter condition value check on the “dictionary index” is performed on a new element before it is added to the dictionary, where the filter condition value check looks for uniqueness in the dictionary index value before the element is added to the dictionary. A person having ordinary skill in the art would understand that performing a filter condition value check on a dictionary index does not restrict the actual value itself to a particular type of value, since filter condition checks can be performed on all types of values (in the case of Schreter, it is performed on a hashed value of the actual value, which, under its broadest reasonable interpretation, can also consist of the text characters 0 and 1, thus producing a binary representation), and it is this filter condition value check on the dictionary index that ensures that no duplicate elements are added into the dictionary (thereby ensuring that each element in the dictionary data structure has a different value, thus corresponding to “ … a plurality of vectors that each have a binary representation that corresponds to a different one of a plurality of elements in the dictionary structure”) (Schreter paragraphs [0002]-[0003]: “The dictionary can be represented as a vector or radix tree of values…Before adding a new value to the dictionary, it must be ensured that the new value is not already present in the dictionary…the dictionary index is checked for the existence of the value and if found, its value ID is returned. 
If the value is not found in the dictionary index, the value is inserted into the dictionary vector...”).) …
Avasarala in view of Durand and Schreter are analogous art as both are in the field of computer science and both utilize a dictionary data structure to store data representations (with Avasarala in view of Durand using a dictionary data structure to store elements containing values of 0s and 1s to produce a binary representation, and with Schreter using a dictionary data structure to store elements with text string values, which can contain a string of the characters 0 and 1 to produce a binary representation).
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the dictionary structure adding algorithm involving binary representation elements taught in Avasarala in view of Durand and enhance it with the condition check for identifying duplicate elements in a dictionary structure taught in Schreter as a way to determine and store unique data representation elements (either in binary representation or text representation). The motivation to combine is taught in Schreter, as using dictionary structures for storage optimizes database storage (and hence reduces computer memory storage), and maintains sort order of a dictionary, which can improve data search performance (Schreter paragraph [0001]: “Database tables include several values for each database record. Storage of these values typically consumes large amounts of memory (e.g., disk-based and/or Random Access Memory). The memory required to store the values may be reduced by storing smaller value IDs instead of the values themselves. In order to facilitate such storage, a dictionary is used which maps values into value IDs. Each unique value in the dictionary is associated with one unique value ID. Therefore, when a particular value is to be stored in a database record, the value ID for the value is determined from the dictionary and the value ID is stored in the record instead. ” and Schreter paragraph [0043]: “The insertion position in the dictionary index depends upon the value...The value is to be inserted at a position which maintains the sort order of the dictionary index.”).
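For illustration only, the duplicate check described in the quoted Schreter passages (consult a dictionary index before inserting a value, return the existing value ID if found, insert otherwise) can be sketched as follows. This is a minimal sketch under stated assumptions, not Schreter's implementation: the class and method names are hypothetical, a Python dict stands in for the hash-map dictionary index, and the string values of 0s and 1s are made-up examples.

```python
class ValueDictionary:
    """Toy dictionary: a vector of unique values plus an index for lookups."""

    def __init__(self):
        self.values = []   # "dictionary vector": position (value ID) -> value
        self.index = {}    # "dictionary index": value -> value ID

    def get_or_add(self, value):
        """Return (value_id, added).

        The index is checked first. An existing value is never inserted
        again, so every element of self.values stays unique.
        """
        value_id = self.index.get(value)
        if value_id is not None:
            return value_id, False
        value_id = len(self.values)
        self.values.append(value)
        self.index[value] = value_id
        return value_id, True

# Hypothetical binary representations stored as strings of 0s and 1s.
dictionary = ValueDictionary()
results = [dictionary.get_or_add(v) for v in ("0110", "1001", "0110")]
```

The third call finds "0110" already present and returns its existing ID without inserting it, so the dictionary vector ends up with two elements rather than three.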
Regarding previously presented Claim 2, Avasarala in view of Durand, in further view of Schreter teaches 
(Previously Presented) The method according to claim 1, 
wherein the sample is derived from a file that has a portable executable format, a document format, a file format, an executable format, a script format, an image format, a video format, an audio format, or any combination thereof (Avasarala Figure 1, element 102; Figure 3, element 312: examiner’s note: Training files in Figure 3 (with each file corresponding to the “sample is derived from a file”) represent the training repository in Figure 1, which contain a variety of file types including executable (.exe) and document (.doc), thus corresponding to “wherein the sample is derived from a file that has … a document format, … an executable format, …” (Avasarala paragraph [0064]: “… Training component 310 builds classifier from a set of training files 312 (e.g., training repository 102 from FIG.1). Training files 312 include known malware files and known benign files.” and Avasarala paragraph [0039]: “The embodiment includes a repository of files 102 known to be malign (malware) and benign (e.g., a “training repository'). Such a repository 102 may include a variety of file types, e.g., .pdf, .exe, .doc, etc.”).).
Regarding original Claim 3, Avasarala in view of Durand, in further view of Schreter teaches 
The method according to claim 1, wherein the generating comprises: randomly projecting the sparse vector into a key space (Examiner’s note: Under its broadest reasonable interpretation, “key space” represents the dimensional space; in the context of an m-dimensional feature vector the key space is the m-dimensional feature space itself. The similarity software program generates an m-dimensional feature vector consisting of m possible n-grams for each file (where the presence or absence of an n-gram is represented as a 1 or 0 respectively, thus making the m-dimensional feature vector a sparse vector), and applies a randomized projection technique based on Linial-London-Rabinovich (LLR) algorithm to perform a dimension reduction on the original m-dimensional feature vector. The LLR randomized projection algorithm randomly selects k subsets of the original dataset based on a distance calculation between each of the original m-dimensional vectors, where the random selection of Durand p.168 col.1 Section 3.1 Similarity Software 1st paragraph – p.168 col.2 3rd paragraph: “The software created for this experiment provides functionality to ingest Windows formatted, binary executables and creates an m-dimensional data space containing vectors that represent those applications. In these experiments, m is the number of total possible n-grams that can be extracted from the ingested applications, one dimension for each possible n-gram. … The information stored in each of the dimensions can take on one of several possible values: the binary values of ‘1’ if the application contains the particular n-gram or ‘0’ if it does not. 
Once the m-dimensional vectors have been created, we can then apply the randomized project technique via matrix multiplication … via the extended Linial-London-Rabinovich (LLR) algorithm … LLR randomized projection or, as we refer to it, random set projection, is based on the LLR algorithm, which is an extension of the Johnson-Lindenstrauss [28] and Bourgain [33] algorithms. It is described as follows: For each cardinality k < n which is a power of 2, randomly pick O(log n) sets A ⊂ V(G) of cardinality k. Map every vertex x to the vector (d(x, A)) (where d(x, A) = min{d(x, y)|y ∈ A}) with one coordinate for each A selected [11]. In short, the algorithm randomly selects k = O(log n) subsets of the original data set, and uses the minimum distances from each vector to each subset as coordinates to create a k-dimensional vector projection.”).), and 
wherein the reduced dimension vector comprises a randomly projected vector (Examiner’s note: The similarity software program generates an m-dimensional feature vector consisting of m possible n-grams for each file (where the presence or absence of an n-gram is represented as a 1 or 0 respectively, thus making the m-dimensional feature vector a sparse vector), and applies a randomized projection technique based on Linial-London-Rabinovich (LLR) algorithm to perform a dimension reduction on the original m-dimensional feature vector. The LLR randomized projection algorithm randomly selects k subsets of the original dataset based on a distance calculation between each of the original m-dimensional vectors. Since the Durand p.168 col.1 Section 3.1 Similarity Software 1st paragraph – p.168 col.2 3rd paragraph: “The software created for this experiment provides functionality to ingest Windows formatted, binary executables and creates an m-dimensional data space containing vectors that represent those applications. In these experiments, m is the number of total possible n-grams that can be extracted from the ingested applications, one dimension for each possible n-gram. … The information stored in each of the dimensions can take on one of several possible values: the binary values of ‘1’ if the application contains the particular n-gram or ‘0’ if it does not. Once the m-dimensional vectors have been created, we can then apply the randomized project technique via matrix multiplication … via the extended Linial-London-Rabinovich (LLR) algorithm … LLR randomized projection or, as we refer to it, random set projection, is based on the LLR algorithm, which is an extension of the Johnson-Lindenstrauss [28] and Bourgain [33] algorithms. It is described as follows: For each cardinality k < n which is a power of 2, randomly pick O(log n) sets A ⊂ V(G) of cardinality k. 
Map every vertex x to the vector (d(x, A)) (where d(x, A) = min{d(x, y)|y ∈ A}) with one coordinate for each A selected [11]. In short, the algorithm randomly selects k = O(log n) subsets of the original data set, and uses the minimum distances from each vector to each subset as coordinates to create a k-dimensional vector projection.”).).
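For illustration only, one reading of the random set projection quoted from Durand (pick random subsets A of the data for each power-of-two cardinality, then use the minimum distance from each vector to each subset as a coordinate) can be sketched as follows. This is a minimal sketch under stated assumptions, not Durand's software: the function names are hypothetical, Hamming distance is assumed for d(x, y), and ceil(log2 n) subsets per cardinality is an assumed concrete choice for the quoted O(log n).

```python
import math
import random

def hamming(x, y):
    """Number of positions at which two equal-length vectors differ."""
    return sum(a != b for a, b in zip(x, y))

def random_set_projection(vectors, seed=0):
    """Map each vector x to the coordinates d(x, A) = min over y in A of d(x, y),
    with one coordinate per randomly chosen subset A of the data set."""
    n = len(vectors)
    if n < 2:
        raise ValueError("need at least two vectors")
    rng = random.Random(seed)
    sets_per_size = max(1, math.ceil(math.log2(n)))  # assumed stand-in for O(log n)
    subsets = []
    k = 1
    while k < n:  # cardinalities 1, 2, 4, ... strictly below n
        for _ in range(sets_per_size):
            subsets.append(rng.sample(range(n), k))
        k *= 2
    return [[min(hamming(x, vectors[j]) for j in subset) for subset in subsets]
            for x in vectors]

# Hypothetical binary n-gram presence vectors for four files.
vectors = [
    [1, 0, 0, 1, 0, 0, 0, 1],
    [1, 0, 0, 1, 0, 0, 1, 1],
    [0, 1, 1, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0, 0, 0],
]
embedded = random_set_projection(vectors, seed=0)
```

With four input vectors this sketch draws two subsets for each of the cardinalities 1 and 2, so each 8-dimensional input is mapped to 4 coordinates.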
Regarding original Claim 6, Avasarala in view of Durand, in further view of Schreter teaches 
(Original) The method according to claim 1, wherein each value in the plurality of values in the randomly projected vector corresponds to at least one of the following: a positive value, a negative value, and a zero value (Examiner’s note: The similarity software program generates an m-dimensional vector consisting of m possible n-grams for each file (where the presence or absence of an n-gram is represented as a 1 or 0 respectively, thus making the m-dimensional vector a sparse vector), and applies a different randomized projection technique based on matrix multiplication with a randomized matrix, where the random matrix has values of 0, +1, and -1. Multiplying the original sparse vector by this random matrix produces a final matrix (representing the randomly projected vector) with positive, negative, and zero values, thus corresponding to “wherein each value in the plurality of values in the randomly projected vector corresponds to at least one of … a positive value, a negative value, and a zero value” (Durand p.168 col.1 Section 3.1 Similarity Software 1st paragraph – p.168 col.2 3rd paragraph: “Once the m-dimensional vectors have been created, we can then apply the randomized project technique via matrix multiplication with a randomized matrix … In the method of randomized projection via matrix multiplication, the original d-dimensional data is projected to a k-dimensional (k << d) subspace through the origin, using a random d × k matrix R whose columns have unit lengths [14]. The random matrix used for the dimensionality reduction can be populated … by selecting vectors that take on the values of 0, +1 or -1 following a probability distribution of 2/3, 1/6 and 1/6 respectively [14]. In matrix notation, where $A_{N \times d}$ is the original set of N d-dimensional observations, $A_{N \times k} = A_{N \times d} R_{d \times k}$ is the projection of the data onto a lower k-dimensional subspace [14]. The result is a low-dimensional embedding of the original high-dimensional features.”).).
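For illustration only, the matrix-multiplication form of random projection quoted from Durand (a d × k random matrix with entries 0, +1, -1 drawn with probabilities 2/3, 1/6, 1/6, multiplied against the N × d data matrix) can be sketched as follows. This is a minimal sketch under stated assumptions, not Durand's software: the function names and the sample data are hypothetical, and the unit-length column normalization mentioned in the quotation is omitted so that the outputs stay integer-valued.

```python
import random

def sparse_random_matrix(d, k, seed=0):
    """Return a d x k matrix (list of rows) whose entries are 0, +1, -1,
    drawn with probabilities 2/3, 1/6, 1/6 respectively."""
    rng = random.Random(seed)
    return [rng.choices([0, 1, -1], weights=[4, 1, 1], k=k) for _ in range(d)]

def project(A, R):
    """Compute the matrix product of the N x d matrix A and the d x k matrix R."""
    k = len(R[0])
    return [[sum(a * r_row[j] for a, r_row in zip(row, R)) for j in range(k)]
            for row in A]

# Hypothetical data: two 6-dimensional binary vectors, reduced to k = 2.
A = [
    [1, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 0, 0],
]
R = sparse_random_matrix(d=6, k=2, seed=1)
A_reduced = project(A, R)
```

Each output coordinate is a signed sum of the input entries selected by a column of R, so it can be positive, negative, or zero depending on the random draw.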
Regarding previously presented Claim 8, Avasarala in view of Durand, in further view of Schreter teaches 
(Previously Presented) The method according to claim 1, wherein the adding further comprises: 
comparing the binary representation to the plurality of existing binary representations in the dictionary structure (Examiner’s note: Under its broadest reasonable interpretation, a “binary representation” is referencing the type of value in an existing element in the dictionary structure, where it is the value that is compared to other existing values of elements already present in the dictionary structure. As identified earlier, under its broadest reasonable interpretation, a binary representation vector is a representation that consists of 0s and 1s, which can be represented as digits or even as a string of characters (where each character is expressible as a set of 0s and 1s stored in memory). In Schreter, a dictionary structure is interpreted to contain at least two fields for each element: the value of the new element (the dictionary vector) and a corresponding index value (stored in a separate “dictionary index”, described as based on an actual value, represented as either a hash map or tree-based map). A filter condition value check on the dictionary index is performed on a new element before it is added to the dictionary, where the filter condition value check looks for uniqueness in the dictionary index value before the element is added to the dictionary. A person having ordinary skill in the art would understand that the comparison and determination involve elements associated with a value, where the value is not restricted to a particular type, and thus performing a filter condition value check on a dictionary index does not restrict the actual value itself to a particular type of value, since filter condition checks can be performed on all types of values (in the case of Schreter, it is performed on a hashed value of the actual value, which, under its broadest reasonable interpretation, can also consist of the text characters 0 and 1, thus producing a binary representation). 
This filter condition value check on the dictionary index ensures that no duplicate elements are added into the dictionary (with the filter condition value check corresponding to “comparing the binary representation to the plurality of existing binary representations in the dictionary structure”) (Schreter paragraphs [0002]-[0003]: “The dictionary can be represented as a vector or radix tree of values…Before adding a new value to the dictionary, it must be ensured that the new value is not already present in the dictionary…the dictionary index is checked for the existence of the value and if found, its value ID is returned. If the value is not found in the dictionary index, the value is inserted into the dictionary vector...”).);
determining, based on the comparing, another binary representation in the plurality of binary representations being a duplicate of the binary representation (Examiner’s note: The modifier “another” in the term “another binary representation” is used to merely distinguish an existing element (with a value type of a binary representation) in the dictionary structure that is a duplicate of the current element that was compared and was a candidate to be added (also with a value type of a binary representation). This filter condition value check on the dictionary index ensures that no duplicate elements are added into the dictionary (with the filter condition value check producing two exclusive results, i.e., one result in which a duplicate element is found (and where the dictionary index is returned, without adding the element), and another result in which a duplicate element is not found (and the new element is inserted into the dictionary), with the former result of finding a duplicate element corresponding to “determining, based on the comparing, another binary representation in the plurality of binary representations being a duplicate of the binary representation”) (Schreter paragraphs [0002]-[0003]: “The dictionary can be represented as a vector or radix tree of values…Before adding a new value to the dictionary, it must be ensured that the new value is not already present in the dictionary…the dictionary index is checked for the existence of the value and if found, its value ID is returned. If the value is not found in the dictionary index, the value is inserted into the dictionary vector...”).); and
selecting the determined other binary representation for creating the training set (Examiner’s note: The modifier “other” in the term “other binary representation” is used to merely distinguish an existing element (with a value type of a binary representation) in the dictionary structure that is a duplicate of the current element that was compared and was a candidate to be added (also with a value type of a binary representation). This filter condition Schreter paragraphs [0002]-[0003]: “The dictionary can be represented as a vector or radix tree of values…Before adding a new value to the dictionary, it must be ensured that the new value is not already present in the dictionary…the dictionary index is checked for the existence of the value and if found, its value ID is returned. If the value is not found in the dictionary index, the value is inserted into the dictionary vector...”).).  
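For illustration only, the duplicate check described in the Schreter passage quoted above can be sketched as follows. This is a minimal sketch and not Schreter's implementation: the class name `Dictionary`, the method name `get_or_add`, the use of a Python dict as the dictionary index, and the example bit strings are all assumptions introduced here.

```python
class Dictionary:
    """Hypothetical dictionary: a vector of unique values plus an index (value -> value ID)."""

    def __init__(self):
        self.vector = []  # dictionary vector; a value's position is its value ID
        self.index = {}   # dictionary index, modeled here as a hash map (assumption)

    def get_or_add(self, value):
        """Return (value_id, was_duplicate) for a candidate value."""
        value_id = self.index.get(value)
        if value_id is not None:
            # Duplicate found: return the existing value ID without inserting.
            return value_id, True
        # Not found: insert the value and record its new value ID.
        value_id = len(self.vector)
        self.vector.append(value)
        self.index[value] = value_id
        return value_id, False


d = Dictionary()
# Binary representations modeled as strings of the characters 0 and 1 (assumption).
results = [d.get_or_add(bits) for bits in ("0110", "1001", "0110")]
```

In this sketch the third candidate is reported as a duplicate of the first, and the dictionary vector holds only the two unique values.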
Regarding original Claim 11, Avasarala in view of Durand, in further view of Schreter teaches 
The method according to claim 1, 
wherein at least one of the vectorizing, the generating, the creating the binary representation, the adding, and the creating the training set is performed by at least one processor of at least one computing system (Avasarala Figure 9, element 936; paragraphs [0113]-[0114]: “ … a block diagram of exemplary hardware that may be used to implement embodiments of an improved system and method for automated machine-learning, zero-day malware detection … hardware shown in FIG. 9 may store and execute programs, applications and routines and perform methods described herein. … System 900 may include a one or more servers 930 connected with a network 920 such as the Internet. … Server 930 typically includes a memory 932, a secondary storage 934, one or more processors 936, an input device 938, and a network connection 940. … Processor(s) 936 executes the application(s), which are stored in memory or secondary storage, or received from the Internet or other network, and the processing may be implemented in software, such as Software modules, for execution by computers or other machines.”).), and 
wherein the computing system comprises at least one of the following: 
a software component, 
a hardware component, and 
any combination thereof (Avasarala Figure 9, element 936; paragraphs [0113]-[0114]: “ … a block diagram of exemplary hardware that may be used to implement embodiments of an improved system and method for automated machine-learning, zero-day malware detection … hardware shown in FIG. 9 may store and execute programs, applications and routines and perform methods described herein. … System 900 may include a one or more servers 930 connected with a network 920 such as the Internet. … Server 930 typically includes a memory 932, a secondary storage 934, one or more processors 936, an input device 938, and a network connection 940. … Processor(s) 936 executes the application(s), which are stored in memory or secondary storage, or received from the Internet or other network, and the processing may be implemented in software, such as Software modules, for execution by computers or other machines.”).).
Regarding amended Claim 12, Avasarala teaches
(Currently Amended) A system comprising computer hardware configured to perform operations for identifying a presence of malicious code in one or more data samples (This claim limitation is similar in scope with a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.) comprising: 
receiving a plurality of samples at a processing node that each comprise one or more files, the plurality of samples being selected to train a machine learning model to identify a presence of malicious code in one or more data samples, at least a portion of the plurality of samples comprising malicious code (This claim limitation is similar in scope with a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.); 
extracting features from each of the samples to result in a corresponding feature set, at least a portion of the extracted features being associated with malicious code (This claim limitation is similar in scope with a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.); 
vectorizing, for each of the[[a]] plurality of samples, the[[a]] corresponding feature set extracted from the sample, the vectorizing resulting in a sparse vector (This claim limitation is similar in scope with a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.); …
… adding each of the binary representation vectors as a new element in a dictionary structure (This claim limitation is similar in scope with a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.) …
creating a training set for use in training a machine learning model, the training set comprising a plurality of vectors that each have a binary representation (This claim limitation is similar in scope with a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.) … ,
the machine learning model, when trained using the training set, being configured to identify the presence of malicious code in one or more data samples (This claim limitation is similar in scope with a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.).
While Avasarala teaches selecting relevant features from a training file according to high information-gain (which will indirectly reduce the dimensionality of the examined feature space for each training file, Avasarala paragraph [0058]), Avasarala does not explicitly teach
… generating, for each sparse vector, a reduced dimension vector representing the sparse vector; 
creating, for each reduced dimension vector, a binary representation vector of the reduced dimension vector, the creating comprising converting each value of a plurality of values in the reduced dimension vector to a binary representation; …
Durand teaches
… generating, for each sparse vector, a reduced dimension vector representing the sparse vector (This claim limitation is similar in scope with a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.); 
creating, for each reduced dimension vector, a binary representation vector of the reduced dimension vector, the creating comprising converting each value of a plurality of values in the reduced dimension vector to a binary representation (This claim limitation is similar in scope with a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.); …
Both Avasarala and Durand are analogous art since they both teach malicious feature detection via feature extraction (using n-gram analysis) from executable files to produce a set of m-dimensional feature vectors representing those files.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the extended feature vector generator taught in Avasarala and enhance it by performing dimensional reduction techniques using random projections as taught in Durand as a way to reduce the dimensionality of the extended feature vectors being analyzed for malicious code detection. The motivation to combine is taught in Durand: dimension reduction mitigates the computational impact of high-dimensional data, and, in general, incorporating dimension reduction and random projection techniques provides significant computational savings without sacrificing accuracy. These techniques can also be implemented on non-specialized computer hardware, making systems that use them more computationally efficient (Durand p.166 col.2 1st paragraph (Section 1 Introduction); Durand p.167 col.2 Section 2.2 Randomized Projections, 2nd paragraph: “Researchers have used randomized projection in several different applications [12, 14, 15] to reduce the dimensionality of high-dimensional data. … The purpose of their work was to show that compared to other more traditional dimensionality reduction techniques, such as principal component analysis or singular value decomposition, randomized projections offered a greater detail of accuracy. The authors were also able to show a significant computation saving by using randomized projections over other feature extraction techniques, such as principal component analysis.”; and Durand p.168 col.1 Section 3 1st paragraph: “… It is very significant that these experiments could be completed on commodity hardware. It shows that large specialized machines are not needed to perform malicious application detection and that this work can be broadly applied across almost any level of architecture that researchers/developers may have and still gain the significantly positive results that were obtained and discussed below. In addition, this software and the methods that it supports can easily take advantage of commodity cluster hardware for substantial gains in performance”).
Avasarala in view of Durand does not explicitly teach
adding … a new element in a dictionary structure when the corresponding binary representation is not equal to an existing element in the dictionary structure; …
… a plurality of vectors that each have a binary representation that corresponds to a different one  plurality of elements in the dictionary structure, …
Schreter teaches
adding … a new element in a dictionary structure when the corresponding binary representation is not equal to an existing element in the dictionary structure (This claim limitation is similar in scope with a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.); …
… a plurality of vectors that each have a binary representation that corresponds to a different one  plurality of elements in the dictionary structure (This claim limitation is similar in scope with a corresponding claim limitation in Claim 1, and hence is rejected under similar rationale.) …
Avasarala in view of Durand and Schreter are analogous art as both are in the field of computer science and both utilize a dictionary data structure to store data representation elements (with Avasarala in view of Durand using a dictionary data structure to store elements containing values of 0s and 1s to produce a binary representation, and with Schreter using a dictionary data structure to store elements with text string values, which can contain a string of characters 0s and 1s to produce a binary representation). 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the dictionary structure adding algorithm involving binary representation elements taught in Avasarala in view of Durand and enhance it with the condition check for identifying duplicate elements in a dictionary structure taught in Schreter as a way to determine and store unique data representation elements (either in binary representation or text representation). The motivation to combine is taught in Schreter, as using dictionary structures for storage optimizes database storage (and hence reduces computer memory storage), and maintains sort order of a dictionary, which can improve data search performance (Schreter paragraph [0001]: “Database tables include several values for each database record. Storage of these values typically consumes large amounts of memory ( e.g., disk-based and/or Random Access Memory). The memory required to store the values may be reduced by storing smaller value IDs instead of the values themselves. In order to facilitate such storage, a dictionary is used which maps values into value IDs. Each unique value in the dictionary is associated with one unique value ID. Therefore, when a particular value is to be stored in a database record, the value ID for the value is determined from the dictionary and the value ID is stored in the record instead.” and Schreter paragraph [0043]: “The insertion position in the dictionary index depends upon the value...The value is to be inserted at a position which maintains the sort order of the dictionary index.”).
Regarding previously presented Claim 13, Avasarala in view of Durand, in further view of Schreter teaches 
(Previously Presented) The system according to claim 12, wherein the sample is derived from a file that has a portable executable format, a document format, a file format, an executable format, a script format, an image format, a video format, an audio format, or any combination thereof (This claim limitation is similar in scope with a corresponding claim limitation in Claim 2, and hence is rejected under similar rationale.).
Regarding original Claim 14, Avasarala in view of Durand, in further view of Schreter teaches 
(Original) The system according to claim 12, wherein the generating comprises: randomly projecting the sparse vector into a key space (This claim limitation is similar in scope with a corresponding claim limitation in Claim 3, and hence is rejected under similar rationale.), and 
wherein the reduced dimension vector comprises a randomly projected vector (This claim limitation is similar in scope with a corresponding claim limitation in Claim 3, and hence is rejected under similar rationale.).
Regarding original Claim 17, Avasarala in view of Durand, in further view of Schreter teaches 
(Original) The system according to claim 12, wherein each value in the plurality of values in the randomly projected vector corresponds to at least one of the following: a positive value, a negative value, and a zero value (This claim limitation is similar in scope with a corresponding claim limitation in Claim 6, and hence is rejected under similar rationale.). 
Regarding previously presented Claim 19, Avasarala in view of Durand, in further view of Schreter teaches
(Previously Presented) The system according to claim 12, wherein the adding further comprises: 
comparing the binary representation to the plurality of existing binary representations in the dictionary structure (This claim limitation is similar in scope with a corresponding claim limitation in Claim 8, and hence is rejected under similar rationale.); 
determining, based on the comparing, another binary representation in the plurality of binary representations being a duplicate of the binary representation (This claim limitation is similar in scope with a corresponding claim limitation in Claim 8, and hence is rejected under similar rationale.); and
selecting, based on the determining, the determined other binary representation for creating the training set (This claim limitation is similar in scope with a corresponding claim limitation in Claim 8, and hence is rejected under similar rationale.).
Regarding new Claim 21, Avasarala in view of Durand, in further view of Schreter teaches 
(New) The method of claim 1 further comprising:
training the machine learning model using the training set (Avasarala Figure 4A, elements 410, 412, 402, 404, 414, 418, 426: examiner’s note: The training component receives the training files 412 (containing malign and benign files) and for each training file, uses the extended feature vector generator EFVG to generate identifier-value based data structures based on a feature set description file. This data structure stores the feature vectors (concatenated into an extended feature vector EFV) generated by individual feature vector generators FVG. The created extended feature vectors EFV for each file (using an n-gram FVG, thus corresponding to “a plurality of vectors that each have a binary representation”) serve as input into a trainer that builds a classifier (corresponding to “training the machine learning model using the training set”) (Avasarala paragraph [0066]: “… training component 410 receives training files 412 and supplementary feature set description file 402 and uses EFVG 404 to build training EFVs 414. The EFVG 404 includes attribute class-type FVG 416, including a naïve n-gram-specific FVG, … EFVG 404 concatenates the type-specific feature vectors 413 into the training EFVs 414. Trainer 418 builds classifier 426 from training EFVs 414.”).); and
deploying the trained machine learning model to identify the presence of malicious code in one or more data samples (Avasarala Figure 4B, element 420, 422, 404, 413, 426, 427: examiner’s note: Using the EFVG and the trained classifier from the training phase shown in Figure 4A in an execution component to produce a prediction output label indicating whether a target file provided to the execution component is benign or malicious (corresponding to “deploying the trained machine learning model to identify the presence of malicious code in one or more data samples”) (Avasarala paragraph [0066]: “… execution component 420 utilizes EFVG 404 to analyze target file 422. If multiple feature vectors are generated for a target file … EFVG 404 concatenates the type-specific feature vectors 413 and classifier 426 analyzes this concatenated feature and outputs a benign or malicious label 427 for the target file based on this comparison. As described above with reference to FIG. 1, this output may include a calculated percentage likelihood or confidence level that target file is malicious (or benign). The EFVG 404 may be re-used during testing and prediction (classifying).”).).
Regarding new Claim 22, Avasarala in view of Durand, in further view of Schreter teaches 
(New) The system of claim 12, wherein the operations further comprise:
training the machine learning model using the training set (This claim limitation is similar in scope with a corresponding claim limitation in Claim 21, and hence is rejected under similar rationale.); and
deploying the trained machine learning model to identify the presence of malicious code in one or more data samples (This claim limitation is similar in scope with a corresponding claim limitation in Claim 21, and hence is rejected under similar rationale.).
Claims 4-5 and 15-16 are rejected under 35 U.S.C. 103 as being unpatentable over Avasarala et al., U.S. PGPUB 2014/0090061, published 3/27/2014 [hereafter referred as Avasarala] in view of Durand et al., Using Randomized Projection Techniques to Aid in Detecting High-Dimensional Malicious Applications, 49th ACM Southeast Conference, March 24-26, 2011 ACM 978-1-4503-0686-07/11/03, pp.166-172 [hereafter referred as Durand], in further view of Schreter, Ivan, U.S. PGPUB 2015/0324480 (filed 5/8/2014) [hereafter referred as Schreter] as applied to Claims 3 and 14; in even further view of Achlioptas, Dimitris, Database-friendly random projections: Johnson-Lindenstrauss with binary coins, Journal of Computer and System Sciences 66 (2003), Elsevier Science USA 2003, pp.671-687 [hereafter referred as Achlioptas].
Regarding amended Claim 4, Avasarala in view of Durand, in further view of Schreter as applied to Claim 3 teaches 
 (Currently Amended) The method according to claim 3, 
wherein the randomly projected vector is generated by applying a random projection to the sparse vector (Examiner’s note: The similarity software program generates an m-dimensional feature vector consisting of m possible n-grams for each file (where the presence or absence of an n-gram is represented as a 1 or 0 respectively, thus making the m-dimensional feature vector a sparse vector), and applies a randomized projection technique based on the Linial-London-Rabinovich (LLR) algorithm to perform a dimension reduction on the original m-dimensional feature vector, where the LLR randomized projection algorithm randomly selects k subsets of the original dataset based on a distance calculation between each of the original m-dimensional vectors, where this distance calculation within the LLR randomized projection algorithm corresponds to “wherein the randomly projected vector is generated by applying a random projection to the sparse vector” (Durand p.168 col.1 Section 3.1 Similarity Software 1st paragraph – p.168 col.2 3rd paragraph: “The software created for this experiment provides functionality to ingest Windows formatted, binary executables and creates an m-dimensional data space containing vectors that represent those applications. In these experiments, m is the number of total possible n-grams that can be extracted from the ingested applications, one dimension for each possible n-gram. … The information stored in each of the dimensions can take on one of several possible values: the binary values of ‘1’ if the application contains the particular n-gram or ‘0’ if it does not. Once the m-dimensional vectors have been created, we can then apply the randomized project technique via matrix multiplication … via the extended Linial-London-Rabinovich (LLR) algorithm … LLR randomized projection or, as we refer to it, random set projection, is based on the LLR algorithm, which is an extension of the Johnson-Lindenstrauss [28] and Bourgain [33] algorithms. It is described as follows: For each cardinality k < n which is a power of 2, randomly pick O(log n) sets A ⊂ V(G) of cardinality k. Map every vertex x to the vector (d(x, A)) (where d(x, A) = min{d(x, y)|y ∈ A}) with one coordinate for each A selected [11]. In short, the algorithm randomly selects k = O(log n) subsets of the original data set, and uses the minimum distances from each vector to each subset as coordinates to create a k-dimensional vector projection.”).) …
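For illustration only, the random set projection summarized in the Durand passage quoted above (randomly select k subsets, then use the minimum distance from each vector to each subset as a coordinate) can be sketched as follows. This is a simplified sketch and not Durand's software: the function names, the use of Hamming distance for d(x, y), the single fixed subset size, and the seed parameter are assumptions introduced here, and the sketch does not reproduce the power-of-2 cardinality schedule of the full LLR description.

```python
import random


def hamming(u, v):
    """Number of positions at which two equal-length 0/1 vectors differ (assumed metric)."""
    return sum(a != b for a, b in zip(u, v))


def random_set_projection(vectors, k, subset_size, seed=0):
    """Map each vector x to (d(x, A_1), ..., d(x, A_k)), where d(x, A) = min over y in A of d(x, y)."""
    rng = random.Random(seed)
    # Randomly pick k subsets of the original data set.
    subsets = [rng.sample(vectors, subset_size) for _ in range(k)]
    # One coordinate per subset: the minimum distance from x to any member of that subset.
    return [[min(hamming(x, y) for y in subset) for subset in subsets] for x in vectors]


# Hypothetical 4-dimensional n-gram presence vectors for three files.
vectors = [(1, 0, 1, 0), (0, 0, 1, 1), (1, 1, 1, 1)]
projected = random_set_projection(vectors, k=2, subset_size=1)
```

Each input vector is mapped to a k-dimensional vector, so k plays the role of the reduced dimension.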
While Avasarala in view of Durand, in further view of Schreter teaches that the randomized projection is based on an LLR randomized projection algorithm using a distance calculation, and that the LLR randomized projection algorithm is an extension of the Johnson-Lindenstrauss algorithm, Avasarala in view of Durand, in further view of Schreter does not explicitly teach
… wherein the random projection preserves all pairwise distances between features in the sparse vector
Achlioptas teaches
… wherein the random projection preserves all pairwise distances between features in the sparse vector (Achlioptas p.671 Abstract: “A classic result of Johnson and Lindenstrauss asserts that any set of n points in d-dimensional Euclidean space can be embedded into k-dimensional Euclidean space—where k is logarithmic in n and independent of d—so that all pairwise distances are maintained within an arbitrarily small factor.” and Achlioptas p.672 5th paragraph (Section 1 Introduction): “In a seminal paper, Linial et al. [12] were the first to consider algorithmic applications of embeddings that respect local properties. By now, embeddings of this type have become an important tool in algorithmic design. A real gem in this area has been the following result of Johnson and Lindenstrauss [9]. Lemma 1.1 (Johnson and Lindenstrauss [9]). Given ε > 0 and an integer n, let k be a positive integer such that $k \geq k_0$ … For every set P of n points in $\mathbb{R}^d$ there exists $f: \mathbb{R}^d \to \mathbb{R}^k$ such that for all u, v ∈ P, $(1-\varepsilon)\|u-v\|^2 \leq \|f(u)-f(v)\|^2 \leq (1+\varepsilon)\|u-v\|^2$. … By providing a low-dimensional representation of the data, JL-embeddings speed up certain algorithms dramatically, in particular algorithms whose run-time depends exponentially in the dimension of the working space. … At the same time, the provided guarantee regarding pairwise distances often allows one to establish that the solution found by working in the low-dimensional space is a good approximation to the solution in the original space.”).
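For illustration only, the pairwise-distance guarantee stated in the quoted Lemma 1.1 can be checked numerically as sketched below. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the random +1/-1 matrix scaled by 1/sqrt(k) is assumed here as one "binary coin" style construction, since the quoted passage states only the lemma and not a specific projection matrix.

```python
import math
import random


def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def sign_projection(points, k, seed=0):
    """Project d-dimensional points to k dimensions with a random +1/-1 matrix (assumed construction)."""
    rng = random.Random(seed)
    d = len(points[0])
    matrix = [[rng.choice((-1.0, 1.0)) for _ in range(k)] for _ in range(d)]
    scale = 1.0 / math.sqrt(k)
    return [[scale * sum(p[i] * matrix[i][j] for i in range(d)) for j in range(k)]
            for p in points]


def satisfies_jl(points, images, eps):
    """Check (1-eps)*||u-v||^2 <= ||f(u)-f(v)||^2 <= (1+eps)*||u-v||^2 for every pair."""
    n = len(points)
    for a in range(n):
        for b in range(a + 1, n):
            original = sq_dist(points[a], points[b])
            projected = sq_dist(images[a], images[b])
            if not ((1 - eps) * original <= projected <= (1 + eps) * original):
                return False
    return True


points = [[0.0, 0.0, 0.0], [3.0, 4.0, 0.0], [1.0, 1.0, 1.0]]
images = sign_projection(points, k=2)
within_bound = satisfies_jl(points, images, eps=0.5)  # may be True or False for so small a k
```

Whether a particular random draw satisfies the bound depends on k, ε, and the number of points; the lemma concerns values of k at or above the threshold $k_0$.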
Avasarala in view of Durand, in further view of Schreter and Achlioptas are analogous art as both teach random projection algorithms to generate randomly projected vectors. 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the LLR random projection algorithm (which is an extension of the Johnson-Lindenstrauss algorithm) taught in Avasarala in view of Durand, in further view of Schreter and enhance it with the understanding of the Johnson-Lindenstrauss algorithm taught in Achlioptas to produce a randomly projected vector that preserves all pairwise distances between features in the sparse vector. The motivation to combine is taught in Achlioptas (Achlioptas p.672, 5th-6th paragraphs (Section 1 Introduction): “By providing a low-dimensional representation of the data, JL-embeddings speed up certain algorithms dramatically, in particular algorithms whose run-time depends exponentially in the dimension of the working space. (For a number of practical problems the best-known algorithms indeed have such behavior.) At the same time, the provided guarantee regarding pairwise distances often allows one to establish that the solution found by working in the low-dimensional space is a good approximation to the solution in the original space. … Papadimitriou et al. [13], proved that embedding the points of A in a low-dimensional space can significantly speed up the computation of a low-rank approximation to A, without significantly affecting its quality.”).
Regarding original Claim 5, Avasarala in view of Durand, in further view of Schreter, in even further view of Achlioptas teaches 
The method according to claim 4, 
wherein the random projection has a predetermined size (Examiner’s note: The similarity software program generates a k-dimensional vector using the Linial-London-Rabinovich (LLR) algorithm to perform a dimension reduction on the original m-dimensional vector. The LLR randomized projection algorithm randomly selects k subsets of the original dataset based on a distance calculation between each of the original m-dimensional vectors, where the number k of subsets is a predetermined value (in this case, identified as being 500, 1000, or 1500 in the context of the experiment cited in the Durand reference, thus corresponding to “wherein the random projection has a predetermined size”) (Durand p.168 col.2 3rd paragraph (Section 3.1 Similarity Software): “… LLR randomized projection or, as we refer to it, random set projection, is based on the LLR algorithm, which is an extension of the Johnson-Lindenstrauss [28] and Bourgain [33] algorithms. It is described as follows: For each cardinality k < n which is a power of 2, randomly pick O(log n) sets A ⊂ V(G) of cardinality k. Map every vertex x to the vector (d(x, A)) (where d(x, A) = min{d(x, y)|y ∈ A}) with one coordinate for each A selected [11]. In short, the algorithm randomly selects k = O(log n) subsets of the original data set, and uses the minimum distances from each vector to each subset as coordinates to create a k-dimensional vector projection.” and Durand p.169 col.1 Section 3.3 Design 1st paragraph: “… For the dimensionality reduction portion, the … LLR random set randomized projection methods described above in section 3.1 were applied to the original high-dimensional data set to produce three separate new low-dimensional embeddings each, with contained 500, 1000 and 1500 features.”).).  
Regarding amended Claim 15, Avasarala in view of Durand, in further view of Schreter as applied to Claim 14 teaches
(Currently Amended) The system according to claim 14, wherein the randomly projected vector is generated by applying a random projection to the sparse vector (This claim limitation is similar in scope with a corresponding claim limitation in Claim 4, and hence is rejected under similar rationale.); …
While Avasarala in view of Durand, in further view of Schreter teaches that the randomized projection is based on an LLR randomized projection algorithm using a distance calculation, and that the LLR randomized projection algorithm is an extension of the Johnson-Lindenstrauss algorithm, Avasarala in view of Durand, in further view of Schreter does not explicitly teach
… wherein the random projection preserves all pairwise distances between features in the sparse vector.
Achlioptas teaches
… wherein the random projection preserves all pairwise distances between features in the sparse vector (This claim limitation is similar in scope with a corresponding claim limitation in Claim 4, and hence is rejected under similar rationale.).
Avasarala in view of Durand, in further view of Schreter and Achlioptas are analogous art as both teach random projection algorithms to generate randomly projected vectors. 
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the LLR random projection algorithm (which is an extension of the Johnson-Lindenstrauss algorithm) taught by Avasarala in view of Durand, in further view of Schreter and enhance it with the understanding of the Johnson-Lindenstrauss algorithm taught in Achlioptas to produce a randomly projected vector that preserves all pairwise distances between features in the sparse vector. The motivation to combine is taught in Achlioptas, as this type of random projection speeds up computation by working in a lower-dimensional space while still providing a good approximation to the solution in the original space, without significantly affecting the quality of the solution, thus reducing the overall computational complexity and improving performance of the system that uses this algorithm (Achlioptas p.672, 5th-6th paragraphs (Section 1 Introduction): “At the same time, the provided guarantee regarding pairwise distances often allows one to establish that the solution found by working in the low-dimensional space is a good approximation to the solution in the original space. …Papadimitriou et al. [13], proved that embedding the points of A in a low-dimensional space can significantly speed up the computation of a low-rank approximation to A, without significantly affecting its quality.”).
Regarding original Claim 16, Avasarala in view of Durand, in further view of Schreter, in even further view of Achlioptas teaches
(Original) The system according to claim 15, wherein the random projection has a predetermined size (This claim limitation is similar in scope with a corresponding claim limitation in Claim 5, and hence is rejected under similar rationale.).
Claims 7 and 18 are rejected under 35 U.S.C. 103 as being unpatentable over Avasarala et al., U.S. PGPUB 2014/0090061, published 3/27/2014 [hereafter referred as Avasarala] in view of Durand et al., Using Randomized Projection Techniques to Aid in Detecting High-Dimensional Malicious Applications, 49th ACM Southeast Conference, March 24-26, 2011 ACM 978-1-4503-0686-07/11/03, pp.166-172 [hereafter referred as Durand], in further view of Schreter, Ivan, U.S. PGPUB 2015/0324480 (filed 5/8/2014) [hereafter referred as Schreter] as applied to Claims 6 and 17; in even further view of Burge et al., US PGPUB 2018/0101742 (filed 10/4/2017; provisional filed 10/7/2016) [henceforth referred as Burge].
Regarding original Claim 7, Avasarala in view of Durand, in further view of Schreter as applied to Claim 6 teaches 
(Original) The method according to claim 6, 
wherein each binary representation is generated by mapping the predetermined value to at least one of the following: 1 and 0 (Examiner’s note: The extracted features from a training file are n-grams, where each n-gram can be represented as a feature vector of 0s and 1s indicating the presence or absence of the n-gram (where a presence or absence of an n-gram represents a binary condition corresponding to the “predetermined value”) within the training file (thus corresponding to “wherein each binary representation is generated by mapping the predetermined value to at least one of the following: 1 and 0”) (Avasarala paragraph [0050]: “… the features are n-grams, ordered sequence of entities (grams) of length n and a gram is a byte of binary data. The feature vector is an ordered list of ones and zeros indicating either the presence, or absence, of an n-gram within the file’s binary representation.”).) …
However, Avasarala in view of Durand, in further view of Schreter does not teach
… wherein the positive value is mapped to 1; the negative value is mapped to 0; and zero value is mapped to 0.  
Burge teaches 
… wherein the positive value is mapped to 1; the negative value is mapped to 0; and zero value is mapped to 0 (Examiner’s note: Performing a mapping of a sparse feature vector to a binary vector containing 0 and 1, with positive values set to 1 and non-positive values (including negative values and zero) set to 0 (Burge paragraph [0091]: “… binarizing the sparse feature vector includes converting all values in the sparse feature vector above a predefined threshold value to one, and converting all values in the sparse feature vector not above the predefined threshold to zero. Any suitable threshold magnitude value may be used. In some embodiments, the threshold magnitude value may be zero, such that all positive values in the sparse feature vector are set to one in the corresponding binary vector, and all non-positive values (zeros and negative values) in the sparse feature vector may be set to zero in the corresponding binary vector.”).).
Avasarala in view of Durand, in further view of Schreter and Burge are analogous art as both are in the field of computer science and machine learning, and both utilize feature extraction techniques and embedding and encoding techniques for sparse feature vectors.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the positive, negative, and zero results from the vector resulting from the random projection technique based on matrix multiplication taught in Avasarala in view of Durand, in further view of Schreter and enhance it with the encoding algorithm taught in Burge to produce binary representation vectors containing 0 and 1 to replace the positive, negative, and zero values resulting from the random projection. The motivation to combine is indicated in Burge, as vectors encoded in binary representation are more memory efficient, and comparisons between binary vectors are inherently more run-time efficient at a computer-instruction level, resulting in more efficient and faster searches when attempting to find closest matches (Burge paragraph [0089]: “Additional benefits to converting sparse feature vectors to binary vectors includes at least the following: (1) a compact binary representation may only require a small amount of space in RAM, increasing the number of templates that may be efficiently searched; (2) binary data may be inherently more efficient for a computer to process meaning comparison speed for binary vectors may be much faster per template then an implementation that relies on floating point; and (3) properties of the binary vector (including that each location can only have one of two possible values) enable finding the closest matches without doing a brute force linear search”).
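For illustration only, the thresholding described in Burge paragraph [0091] can be sketched as below, assuming a threshold of zero and a plain Python list standing in for the projected vector. The function name `binarize` is hypothetical and does not appear in Burge or in the claims.

```python
def binarize(values, threshold=0.0):
    """Set values above `threshold` to 1 and all other values to 0.

    With threshold=0.0, positive values map to 1 while zero and
    negative values both map to 0.
    """
    return [1 if v > threshold else 0 for v in values]


# Hypothetical projected vector containing positive, negative, and zero entries.
projected = [2.5, -1.0, 0.0, 0.3]
bits = binarize(projected)
```

With a zero threshold, this mapping matches the three cases recited in Claim 7: positive to 1, negative to 0, and zero to 0.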
Regarding original Claim 18, Avasarala in view of Durand, in further view of Schreter as applied to Claim 17 teaches
(Original) The system according to claim 17, wherein each binary representation is generated by mapping the predetermined value to at least one of the following: 1 and 0 (This claim limitation is similar in scope with a corresponding claim limitation in Claim 7, and hence is rejected under similar rationale.); …
However, Avasarala in view of Durand, in further view of Schreter does not teach
… wherein the positive value is mapped to 1; the negative value is mapped to 0; and zero value is mapped to 0.  
Burge teaches 
… wherein the positive value is mapped to 1; the negative value is mapped to 0; and zero value is mapped to 0 (This claim limitation is similar in scope with a corresponding claim limitation in Claim 7, and hence is rejected under similar rationale.).
Avasarala in view of Durand, in further view of Schreter and Burge are analogous art as both are in the field of computer science and machine learning, and both utilize feature extraction techniques and embedding and encoding techniques for sparse feature vectors.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the positive, negative, and zero results from the vector resulting from the random projection technique based on matrix multiplication taught in Avasarala in view of Durand, in further view of Schreter and enhance it with the encoding algorithm taught in Burge to produce binary representation vectors containing 0 and 1 to replace the positive, negative, and zero values resulting from the random projection. The motivation to combine is indicated in Burge, as vectors encoded in binary representation are more memory efficient, and comparisons between binary vectors are inherently more run-time efficient at a computer-instruction level, resulting in more efficient and faster searches when attempting to find closest matches (Burge paragraph [0089]: “Additional benefits to converting sparse feature vectors to binary vectors includes at least the following: (1) a compact binary representation may only require a small amount of space in RAM, increasing the number of templates that may be efficiently searched; (2) binary data may be inherently more efficient for a computer to process meaning comparison speed for binary vectors may be much faster per template then an implementation that relies on floating point; and (3) properties of the binary vector (including that each location can only have one of two possible values) enable finding the closest matches without doing a brute force linear search”).
Claim 10 is rejected under 35 U.S.C. 103 as being unpatentable over Avasarala et al., U.S. PGPUB 2014/0090061, published 3/27/2014 [hereafter referred as Avasarala] in view of Durand et al., Using Randomized Projection Techniques to Aid in Detecting High-Dimensional Malicious Applications, 49th ACM Southeast Conference, March 24-26, 2011 ACM 978-1-4503-0686-07/11/03, pp.166-172 [hereafter referred as Durand], in further view of Schreter, Ivan, U.S. PGPUB 2015/0324480 (filed 5/8/2014) [hereafter referred as Schreter] as applied to Claim 8; in further view of Aharon et al., K-SVD: An Algorithm for Designing Overcomplete Dictionaries for Sparse Representation, IEEE Transactions on Signal Processing, Vol. 54, No. 11, November 2006 [henceforth referred as Aharon].
Regarding previously presented Claim 10, Avasarala in view of Durand, in further view of Schreter as applied to Claim 8 teaches 
(Previously Presented) The method according to claim 8, further comprising: 
… upon determination that the determined other binary representation is a duplicate of the binary representation (Examiner’s note: This filter condition value check on the dictionary index ensures that no duplicate elements are added into the dictionary (with the filter condition value check producing two exclusive results, i.e., one result in which a duplicate is found (and where the dictionary index is returned, without adding the element), and another result in which a duplicate is not found (and the element is inserted into the dictionary), with the former result of finding a duplicate corresponding to “… upon determination that the determined other binary representation is a duplicate of the binary representation …”) (Schreter paragraphs [0002]-[0003]: “The dictionary can be represented as a vector or radix tree of values…Before adding a new value to the dictionary, it must be ensured that the new value is not already present in the dictionary…the dictionary index is checked for the existence of the value and if found, its value ID is returned. If the value is not found in the dictionary index, the value is inserted into the dictionary vector...”).) …
However, Avasarala in view of Durand, in further view of Schreter does not teach
performing … at least one of replacing the determined other binary representation with the binary representation, and discarding one of the binary representation and the determined other binary representation.  
Aharon teaches
performing … at least one of replacing the determined other binary representation with the binary representation, and discarding one of the binary representation and the determined other binary representation (Examiner’s note: Under its broadest reasonable interpretation, the term “binary representation” is referencing the type of value in an existing element in the dictionary structure, where it is the value that is compared to other existing values of elements already present in the dictionary structure. As identified earlier, under its broadest reasonable interpretation, a binary representation vector is a representation that consists of 0s and 1s, which can be represented as digits or even as a string of characters (where each character is Aharon p.4318, col.2, Section E. K-SVD-Implementation Details, 1st paragraph: “When a dictionary element is not being used “enough” (relative to the number of dictionary elements and to the number of samples), it could be replaced with the least represented signal element, after being normalized (the representation is measured without the dictionary element that is going to be replaced). Since the number of data elements is much larger than the number of dictionary elements, and since our model assumption suggests that the dictionary atoms are of equal importance, such replacement is very effective in avoiding local minima and overfitting. Similar to the idea of removal of unpopular elements from the dictionary, we found that it is very effective to prune the dictionary from having too-close elements. If indeed such a pair of atoms is found (based on their absolute inner product exceeding some threshold), one of those elements should be removed and replaced with the least represented signal element.”).).  
Avasarala in view of Durand, in further view of Schreter and Aharon are analogous art as both are in the field of computer science and both utilize algorithms for comparing elements (whether it be generic elements or vectors in binary representation) to remove or replace elements. 
Avasarala in view of Durand, in further view of Schreter and replace it with the performing algorithm as taught by Aharon to discard and replace elements (or binary representation vectors) stored in computer memory or in a dictionary structure. The motivation to combine is indicated in Aharon, as a way to identify which dictionary elements occur less often and thus can be pruned, to avoid localization of data and avoid overfitting the training data set, resulting in a more generalized training set for making better predictions and improving training performance (Aharon p.4318, column 2: “Since the number of data elements is much larger than the number of dictionary elements, and since our model assumption suggests that the dictionary atoms are of equal importance, such replacement is very effective in avoiding local minima and overfitting. Similar to the idea of removal of unpopular elements from the dictionary, we found that it is very effective to prune the dictionary from having too-close elements.”).
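For illustration only, the duplicate check quoted from Schreter paragraphs [0002]-[0003] in the Claim 10 mapping above can be sketched as follows. This is a minimal sketch, assuming a Python list as the dictionary vector and a Python dict as the dictionary index; the class name `ValueDictionary` and the method name `get_or_insert` are hypothetical and do not appear in Schreter.

```python
class ValueDictionary:
    """Dictionary of unique values with a lookup index (illustrative only)."""

    def __init__(self):
        self.values = []  # dictionary vector: value ID -> value
        self.index = {}   # dictionary index: value -> value ID

    def get_or_insert(self, value):
        """Return the existing value ID if found; otherwise insert the value."""
        if value in self.index:
            # Duplicate found: return its ID without adding the value again.
            return self.index[value]
        # Not found: append to the dictionary vector and record it in the index.
        value_id = len(self.values)
        self.values.append(value)
        self.index[value] = value_id
        return value_id


d = ValueDictionary()
first = d.get_or_insert("1011")   # new binary representation, inserted
second = d.get_or_insert("0001")  # new binary representation, inserted
repeat = d.get_or_insert("1011")  # duplicate, existing ID returned
```

The two exclusive outcomes of `get_or_insert` correspond to the "duplicate found" and "duplicate not found" results discussed in the examiner's note.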

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM WAI YIN KWAN whose telephone number is 303-297-4332.  The examiner can normally be reached on Monday-Friday 8:00am - 4:30pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B Zhen can be reached on 571-272-3768.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.


/WILLIAM WAI YIN KWAN/Examiner, Art Unit 2121



/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121