DETAILED ACTION
This action is in response to Applicant’s amendment regarding application number 15/815,899, filed November 17, 2017.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Amendments
The amendment filed May 18, 2022 has been entered. Examiner acknowledges receipt of Amendments to Application 15/815,899, which include: Amendments to the Claims, and Remarks containing Applicant’s amendments. 
Regarding Applicant’s Remarks, Examiner acknowledges Claims 1-3, 8-10, and 15-17 have been amended. Claims 1-20 remain pending in the application. 
Regarding Applicant’s Remarks and Amendments to the Claims, Examiner acknowledges Applicant’s amendments have resolved the objections identified in Claims 1, 8, and 15, and therefore the respective claim objections previously set forth in the Non-Final Office Action mailed February 18, 2022 are withdrawn. However, Examiner notes that the amended claims have introduced new claim objections, which are identified in the sections indicated below.

Response to Arguments
Examiner acknowledges receipt of Arguments to Application 15/815,899, which include: Remarks containing Applicant’s arguments. 
Regarding Applicant’s Remarks for Claims 1, 5, 8, 12, 15, and 19 under 35 U.S.C. 103 as being unpatentable over Hodjat et al., U.S. PGPUB 2017/0293849, published 10/12/2017 [hereafter referred to as Hodjat] in view of Sepahvand et al., Generating Graphical Chain by Mutual Matching of Bayesian Network and Extracted Rules of Bayesian Network Using Genetic Algorithm, arXiv:1412.4465v1, December 15, 2014 [hereafter referred to as Sepahvand], in further view of Fidelis et al., Discovering Comprehensible Classification Rules with a Genetic Algorithm, Proceedings of the 2000 Congress on Evolutionary Computation CEC00 (Cat. No. 00TH8512), July 16-19, 2000 [hereafter referred to as Fidelis], in even further view of Chatterjee et al., U.S. Patent 10,824,959, filed 2/16/2016 [hereafter referred to as Chatterjee]; for Claims 2-3, 6, 9-10, 13-14, 16-17, and 20 under 35 U.S.C. 103 as being unpatentable over Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee as applied to Claims 1, 8, and 15, in even further view of Castellanos et al., U.S. PGPUB 2012/0089620, published 4/12/2012 [hereafter referred to as Castellanos]; and for Claims 4, 11, and 18 under 35 U.S.C. 103 as being unpatentable over Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee as applied to Claims 1, 8, and 15, in even further view of Kapila et al., A Genetic Algorithm with Entropy Based Initial Bias for Automated Rule Mining, Int'l Conf. on Computer & Communication Technology (ICCCT '10), IEEE 2010, pp.491-495 [hereafter referred to as Kapila]: Examiner acknowledges Applicant’s arguments, has considered them, and has found them to be not persuasive.
Examiner notes that Applicant’s amendments to the claims necessitate further examination and re-evaluation of the amended and related original claims. The updated claim mappings according to Applicant’s amended claims are provided in the sections indicated below.
Regarding Applicant’s Remarks:
“In the Examiner's interview summary dated May 18, 2022, and in view of Applicant's proposed amendment as discussed during the interview, the Examiner alleges that Hodjat "indicates that the probability aggregator used in the training system [allows] the calculated probability for each rule to be interpreted as a representation of a calculated mutual information." Applicant respectfully disagrees.
First, Hodjat's probability aggregator "determines the probability output of an individual or a ruleset for a given data point in an ordered series such as the production data sequence." (Hodjat [0073], [0077]). In other words, the probability aggregator is obtaining the probability that, in the aggregate, the input data will result in a given classification when applied to an individual rule or a set of rules (Hodjat refers to a rule as an "individual," see [0007], and produces a ruleset that outputs an overall probability of an event happening (see [0013]-[0014]).
This is different from "producing the set of class level rules further comprises calculating a mutual information representing a mutual dependence between the respective class level rule and the predicted class," as now claimed, at least because the probability aggregator of Hodjat is not used to produce any class level (logic-based) rules, as claimed, but rather to evaluate existing rules within a population of production rulesets and produce rules that predict event probabilities (see FIG. 4 and element 122). Applicant notes that on page 11 of the Office Action, the Examiner acknowledges that Hodjat discloses a "rule expressed as an IF-THEN relationship of conditions containing a feature attribute, its corresponding value, a threshold and a rule-level probability (RLP) that corresponds to a probability associated with an output class," i.e., a probabilistic rule, which is different from "class level rule representing a logical conditional statement that predicts that the respective instances are members of the particular class," i.e., not a probabilistic rule, a distinction clearly made by Hodjat in [0013]-[0014]. Applicant further submits, as discussed in further detail below, that it would not have been obvious to substitute mutual information for the fitness score disclosed by Hodjat at least because these are entirely different metrics that are not readily interchangeable (despite their common relation to the general concept of fitness, they are measuring fitness in entirely different ways).”
Examiner points out that the majority of Applicant’s above argument is directed to the newly amended claim limitations, which were not previously entered. Examiner further reminds Applicant that the comments provided in the Applicant’s Interview Summary (conducted May 11, 2022 and mailed May 18, 2022) were intended to demonstrate the breadth of the term “mutual information” and how broadly the term can reasonably be applied, in order to encourage Applicant to add further detail to the claim limitation to distinguish the claimed invention; hence, those comments do not represent an examination of the amended and newly introduced limitations that were not previously entered. The updated claim limitations and Examiner’s analysis of the amended and new limitations with respect to existing and new prior art are provided in the relevant sections indicated below. Given that the newly introduced and amended limitations regarding mutual information were not previously entered, Examiner will not address those arguments here, and will address only the argument recited above that is directed to the Hodjat reference, in which Applicant asserts that Hodjat is not used to produce any class level (logic-based) rules, but rather to evaluate existing rules within a population of production rulesets and produce rules that predict event probabilities. This argument is directed to the following limitation recited in Applicant’s independent Claim 1: “applying, by the processor-based system, the instance level conditions for each of the corresponding instances to a genetic algorithm to produce a set of class level rules, each class level rule representing a logical conditional statement that predicts that the respective instances are members of the particular class”. Examiner has considered Applicant’s argument and has found it to be not persuasive.
Examiner reminds Applicant that MPEP 2111 requires that, during patent examination, the pending claims be given their broadest reasonable interpretation consistent with the specification, and that the Examiner construe claim terms as broadly as is reasonable during prosecution in an effort to establish a clear record of what Applicant intends to claim. Accordingly, the above limitation broadly recites applying a set of instance level conditions to a genetic/evolutionary algorithm to produce a set of class level rules, where these class level rules contain logical conditional statements that predict that the respective instances are members of a particular class. As indicated in the Non-Final Office Action mailed February 18, 2022, Hodjat teaches a training system that uses an evolutionary algorithm to evolve rulesets, where these rulesets (represented as individuals in the evolutionary algorithm) are defined as containing rules representing IF-THEN conditions, each rule including a probability of membership in a predetermined class if the conditions of the rule are met (Hodjat [0013]: “… each rule in the ruleset includes a probability indicating the probability of membership in a predetermined class if the conditions of the rules are met … there may be more than one rule in the ruleset, each with its own conditions and corresponding probability.”; [0050]: “The rulesets used in the production system may be created in any known way of creating rulesets … The rulesets may be automatically generated using machine learning. … the rulesets may be evolved using an evolutionary algorithm. The training system 110 is a system for evolving rulesets to be used in a production system.”; [0055]: “A ruleset 300 is composed of one or more rules 306. Each rule 306 contains one or more conditions 308 and a rule-level probability (RLP) 310. The rule level probability 310 indicates the probability that membership in the class exists when the conditions of this rule are satisfied. … A rule 306 is a conjuctive list of one or more conditions 308. Each rule in the ruleset may have a different number of conditions. A condition specifies a relationship between a particular feature value in the input data and a value in the condition …”; [0062]-[0067]: “In a healthcare embodiment, a ruleset can be thought of as a set of rules predicting a patient’s future state, given the patient’s current and past state … the set of rules may classify a patient’s current state based on current and past state. The rule-level certainty value of the rule can be an estimated probability of membership in the class … An example rule is as follows: condition 1.1: pulse[t]>=120; condition 1.2: blood pressure[t-1]>=120; condition 1.3: blood pressure[t-6]<90; RLP 1: high blood pressure related event 0.65; If condition 1.1 and condition 1.2 and condition 1.3, then RLP 1.”; and [0087]: “… the candidate pool 116 is initialized by candidate individual pool initialization module 602, which creates an initial set of candidate individuals … Each individual includes one or more rules, each with one or more conditions. Each rule also includes a rule-level probability …”).
Hodjat further teaches that each individual ruleset is evaluated and assigned a fitness estimate/score, where the individuals with the highest fitness are provided to a procreation module that creates new individuals through combination and/or mutation of the parent individuals, and where the individuals/rulesets exhibiting the best fitness are ultimately added to the production ruleset population. This procreation module, which performs the crossover and mutation steps of a genetic algorithm, therefore corresponds to a process that produces a set of class level rules (Hodjat [0088]-[0092]: “… After the tests, candidate testing module 604 updates the local fitness estimate associated with each of the individuals tested … After the candidate individual pool 116 has been updated, a procreation module 608 evolves a random subset of them. Only individuals in the candidate individual pool with high fitness scores are permitted to procreate. Any conventional or future-developed technique can be used for procreation. … conditions, outputs, or rules from parent individuals are combined in various ways to form child individuals, and then, occasionally, they are mutated. The combination process … may include crossover – i.e., exchanging conditions, outputs, or entire rules between parent individuals to form child individuals. … After procreation, candidate testing module 604 operates again on the updated candidate individual pool 116. The process continues repeatedly. … The individuals having the best fitness score at the end of the training session are added to the production ruleset population 122 …”). Hence, given the evidence provided above, Hodjat does teach the recited claim limitation and is within the scope of Applicant’s claimed invention, and as such, Applicant’s argument is not persuasive, and the prior art rejection is maintained.
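The selection, crossover, and mutation cycle quoted from Hodjat [0088]-[0092] can be illustrated with a generic genetic-algorithm step. The Python sketch below is an illustration under stated assumptions, not Hodjat’s implementation: each individual is a ruleset encoded as a list of hypothetical `(feature, threshold)` rules, only the highest-fitness fraction of the pool is permitted to procreate, crossover exchanges entire rules between two parents, mutation occasionally perturbs a threshold, and the parents are carried over unchanged (elitism, an assumption of the sketch). The caller supplies the `fitness` function.

```python
import random
from typing import Callable, List, Tuple

Rule = Tuple[str, float]   # hypothetical encoding: (feature name, threshold)
Ruleset = List[Rule]       # one individual in the candidate pool


def crossover(a: Ruleset, b: Ruleset, rng: random.Random) -> Ruleset:
    """One-point crossover that exchanges entire rules between two parents."""
    child = a[: rng.randint(0, len(a))] + b[rng.randint(0, len(b)):]
    return child or [rng.choice(a + b)]   # never return an empty ruleset


def mutate(rs: Ruleset, rng: random.Random, rate: float, scale: float = 1.0) -> Ruleset:
    """With probability `rate` per rule, nudge its threshold by Gaussian noise."""
    return [(f, th + rng.gauss(0.0, scale)) if rng.random() < rate else (f, th)
            for f, th in rs]


def next_generation(pop: List[Ruleset], fitness: Callable[[Ruleset], float],
                    rng: random.Random, parent_frac: float = 0.5,
                    mutation_rate: float = 0.1) -> List[Ruleset]:
    """Keep the fittest individuals and refill the pool with their offspring."""
    ranked = sorted(pop, key=fitness, reverse=True)
    parents = ranked[: max(2, int(len(pop) * parent_frac))]
    children: List[Ruleset] = []
    while len(parents) + len(children) < len(pop):
        a, b = rng.sample(parents, 2)
        children.append(mutate(crossover(a, b, rng), rng, mutation_rate))
    return parents + children


if __name__ == "__main__":
    rng = random.Random(0)
    pool = [[("pulse", float(rng.randint(60, 140)))] for _ in range(10)]
    target = lambda rs: -abs(sum(th for _, th in rs) - 120.0)  # toy fitness
    for _ in range(20):
        pool = next_generation(pool, target, rng)
    print(max(map(target, pool)))
```

Because the parents survive each generation unchanged, the best fitness in the pool can never decrease from one generation to the next; a real system would also re-test individuals on training data, as Hodjat describes for candidate testing module 604.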
Applicant’s assertion that Hodjat’s teachings in Hodjat [0013], [0055], and [0062]-[0067] (a ruleset with a corresponding probability of membership in a predetermined class if the conditions of the rules are met) are not the same as a “class level rule representing a logical conditional statement that predicts that the respective instances are members of the particular class” is also not persuasive. As indicated above, MPEP 2111 requires that the pending claims be given their broadest reasonable interpretation consistent with the specification. In light of Applicant’s specification paragraph [0018], a logical conditional statement is referred to as a rule, and has the example format “if condition A is true and condition B is true then the model predicts that the input combination of condition A and condition B is classified in class C”. Examiner points out that Hodjat [0062]-[0067], recited above, provides an example in the context of a healthcare embodiment, using a ruleset containing conditional statements (condition 1.1, condition 1.2, condition 1.3) and further defining a rule-level probability “RLP 1” indicating a high blood pressure related event with an associated probability of 0.65, where this rule-level probability is used in an IF-THEN conditional statement to provide an estimated probability of membership (0.65) in the high blood pressure class, provided that the conditions of the stated ruleset are met (Hodjat [0066]-[0067]: “… RLP 1: high blood pressure related event 0.65; If condition 1.1 and condition 1.2 and condition 1.3, then RLP 1.”).
Hence, given the evidence provided above, Hodjat does teach the recited claim limitation and is within the scope of Applicant’s claimed invention, and as such, Applicant’s argument is not persuasive, and the prior art rejection is maintained.
Regarding Applicant’s Remarks:
“Second, although the Examiner alleges that mutual information is not a novel inventive concept, Applicant reminds the Examiner that a prima facie case of obviousness requires evaluating the claim as a whole and not the elements of the claim individually. The Examiner cites Huang for allegedly disclosing the use of mutual information in a genetic algorithm. However, Huang suffers from the same deficiency as Hodjat in that Huang is using mutual information between predictive labels of a trained classifier (i.e., the classified outputs of the model) and the "true classes," which again is different from "producing the set of class level rules further comprises calculating a mutual information representing a mutual dependence between the respective class level rule and the predicted class," as discussed in the preceding paragraph. Thus, even if Hodjat is modified by substituting the fitness score for the mutual information of Huang, the alleged modification of references still does not arrive at the claimed subject matter.”
Examiner has considered the above arguments and has found them to be not persuasive. Examiner reminds Applicant that Applicant’s own specification paragraph [0042] indicates that mutual information is intricately linked to entropy, a fundamental notion in information theory ([0042]: “… The concept of mutual information is intricately linked to that of entropy of a random variable, a fundamental notion in information theory, that defines the amount of information held in a random variable …”), which establishes that the concept of mutual information is well known in the prior art. Examiner pointed out this paragraph in the Interview Summary to indicate that reciting the application of “mutual information” in a genetic algorithm is not, by itself, sufficient to constitute a novel inventive concept. With regard to Applicant’s argument that the Huang reference proposed in the Applicant’s Interview Summary (conducted May 11, 2022 and mailed May 18, 2022) does not provide motivation or rationale for a prima facie case of obviousness, Examiner reminds Applicant that this reference was included in the Interview Summary to encourage Applicant to further refine the claim language, and hence its inclusion does not represent an examination, with respect to existing or new prior art, of the amended and newly introduced limitations that were not previously entered. The updated claim limitations and Examiner’s analysis of the amended and new limitations with respect to existing and new prior art are provided in the relevant sections indicated below.
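For reference, the link between mutual information and entropy noted in specification paragraph [0042] can be written with the standard textbook identities below (stated from general information theory, not quoted from the specification or from Huang), where $H(X) = -\sum_{x \in X} p(x)\log p(x)$ is the entropy of $X$:

$$
I(X;Y) = H(X) - H(X \mid Y) = H(Y) - H(Y \mid X) = H(X) + H(Y) - H(X,Y)
$$

Because $H(C)$ is fixed for a given set of output classes, the identity $I(C;S) = H(C) - H(C \mid S)$ also shows why minimizing $H(C \mid S)$ over candidate feature subsets $S$ is equivalent to maximizing $I(C;S)$.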
Regarding Applicant’s above statement about establishing a prima facie case of obviousness, Examiner further points to MPEP 2145 (III), which provides guidance on the test for obviousness: “The test for obviousness is not whether the features of a secondary reference may be bodily incorporated into the structure of the primary reference …. Rather, the test is what the combined teachings of those references would have suggested to those of ordinary skill in the art.” That section further indicates that “[I]t is not necessary that the inventions of the references be physically combinable to render obvious the invention under review” and that “Combining the teachings of references does not involve an ability to combine their specific structures.” MPEP 2141(II)(C) further states that “A person of ordinary skill in the art is also a person of ordinary creativity, not an automaton,” and that “[I]n many cases a person of ordinary skill will be able to fit the teachings of multiple patents together like pieces of a puzzle,” such that Office personnel may also take into account “the inferences and creative steps that a person of ordinary skill in the art would employ.” Given that Applicant’s above argument is directed to establishing a prima facie case of obviousness with the new Huang reference, Examiner will briefly summarize the new analysis of the Huang reference here, and refers Applicant to the relevant section identified below for more details. Huang teaches a genetic algorithm method using mutual information values, where the mutual information values are used to determine and rank features such that the features with the maximum mutual information are selected, and where the mutual information is calculated within a hybrid genetic algorithm during multiple iterations of generating child chromosomes from earlier parent chromosomes (Huang p.1825 Abstract; p.1829 Section 3. Feature ranking by conditional mutual information; and pp.1834-1835 Section 4.3 Implementation of the hybrid GA wrapper approach). Huang further teaches the general equation for mutual information, expressed as the information shared between two random variables X and Y, $I(X;Y) = \sum_{y \in Y} \sum_{x \in X} P(x,y) \log\left(\frac{P(x,y)}{P(x) \cdot P(y)}\right)$, where a large mutual information value indicates that the two variables are closely related, and a small or zero value indicates that the two variables are unrelated or independent of each other (Huang pp.1827-1828 Section 2.1 Entropy and mutual information, and equation (4)). Examiner points out that Huang p.1828 equation (4) resembles the equation provided in Applicant’s specification paragraph [0042], which is now recited as a new limitation in amended independent Claim 15 ([0042]: “… The MI of two discrete random variables X and Y can be defined as: $I(X;Y) = \sum_{y \in Y} \sum_{x \in X} P(x,y) \log\left(\frac{p(x,y)}{p(x)\,p(y)}\right)$ …”).
Applicant’s assertion that “Huang is using mutual information between predictive labels of a trained classifier (i.e., the classified outputs of the model) and the "true classes," which again is different from "producing the set of class level rules further comprises calculating a mutual information representing a mutual dependence between the respective class level rule and the predicted class,"” is also not persuasive, since Huang teaches two different mutual information calculations, and Applicant addresses only the calculation involving the labels of a trained classifier and the true classes, $I(Y; Y_f)$. Huang also teaches a mutual information calculation $I(C;S)$ that involves determining, given a set A with n features and a set C of all output classes, a subset S that maximizes the mutual information, where the subset S with k features represents a class level rule with different features (corresponding to the conditions in the class level rule). This mutual information $I(C;S)$ between a subset of identified features and the predicted output classes from a classifier forms the basis for the conditional mutual information value $I(C; f_i \mid S)$ used in the hybrid genetic algorithm for searching for a globally optimal subset of features (Huang pp.1829-1831 Section 3. Feature ranking by conditional mutual information 1st paragraph: “… given an initial set A with n features and C set of all output classes, find out the subset $S \subseteq A$ with k features that minimizes $H(C \mid S)$, i.e., that maximizes the mutual information $I(C;S)$ … The mutual information $I(C;S)$ measures the amount of information that the feature subset S contains about the output classes C …”; pp.1833-1834 Section 4.2 Local search for feature selection 2nd-5th paragraphs: “… we propose a novel hybrid GA for feature selection problem, in which the feature’s conditional mutual information $I(C; f_i \mid S)$ is used as a measure to rank the candidate features, and local search operations are performed in a filter manner … As discussed in Section 3, the conditional mutual information $I(C; f_i \mid S)$ measures the new information to the output class C contributed by feature $f_i$ given the subset S of features selected … In a generation of the hybrid GA, each chromosome of the population corresponds to a scheme of feature selection. The first operation (a) aims to find features in the selected subset S that are less informative to classification and remove them from S …”; and pp.1834-1835 Section 4.3 Implementation of the hybrid GA wrapper approach, and Procedure HGA for feature selection, inside the “While (t ≤ T) do” loop: “Local improvement of each chromosome of P(t) by $I(C; f_i \mid S)$ …”).
Hence, based on a prima facie analysis of the Huang reference, it would have been obvious to a person having ordinary skill in the art to apply the mutual information teachings of Huang, which use mutual information in a genetic algorithm to rank features and identify a set of features, in combination with the teachings already established in the Hodjat, Sepahvand, Fidelis, and Chatterjee references for the independent claims. The motivation to combine is taught in Huang, since mutual information based on a set of features S and output classes C provides a criterion in a genetic algorithm for ranking the candidate features and removing those that are less informative or less relevant to classification, thereby improving the search process and the efficiency of the genetic algorithm (Huang pp.1833-1834 Section 4.2 Local search for feature selection 2nd-5th paragraphs). Additionally, the mutual information calculations shown in equations (23) and (32) in Huang pp.1829-1831 Section 3 provide a more scalable and memory-efficient way to determine the mutual information as the number of features in the set increases, thus allowing for a more computationally efficient algorithm (Huang pp.1829-1831 Section 3 Feature ranking by conditional mutual information, and equations (23) and (32), in particular p.1831 col.1 2nd paragraph). Hence, given the evidence provided above, the Huang reference does teach the recited claim limitation and is within the scope of Applicant’s claimed invention, and as such, Applicant’s argument is not persuasive.
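The mutual information formula quoted above, and the conditional mutual information $I(C; f_i \mid S)$ that Huang uses to rank candidate features, can both be estimated from discrete samples by simple counting. The Python sketch below is a plug-in (frequency-count) estimate under the assumption that all variables are discrete; the function names are hypothetical, and the `rank_candidates` helper only illustrates ranking unselected features by $I(C; f_i \mid S)$, not Huang’s full hybrid GA or the memory-efficient forms of Huang’s equations (23) and (32). Logarithms are base 2, so results are in bits.

```python
from collections import Counter
from math import log2
from typing import Dict, Hashable, List, Sequence


def entropy(xs: Sequence[Hashable]) -> float:
    """Plug-in Shannon entropy H(X) in bits."""
    n = len(xs)
    return -sum((c / n) * log2(c / n) for c in Counter(xs).values())


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """I(X;Y) = sum over (x, y) of p(x,y) * log2(p(x,y) / (p(x) * p(y)))."""
    n = len(xs)
    px, py = Counter(xs), Counter(ys)
    return sum((c / n) * log2((c / n) / ((px[x] / n) * (py[y] / n)))
               for (x, y), c in Counter(zip(xs, ys)).items())


def conditional_mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable],
                                   zs: Sequence[Hashable]) -> float:
    """I(X;Y|Z) = H(X,Z) + H(Y,Z) - H(X,Y,Z) - H(Z)."""
    return (entropy(list(zip(xs, zs))) + entropy(list(zip(ys, zs)))
            - entropy(list(zip(xs, ys, zs))) - entropy(list(zs)))


def rank_candidates(features: Dict[str, Sequence[Hashable]],
                    classes: Sequence[Hashable],
                    selected: List[str]) -> List[str]:
    """Rank unselected features f_i by I(C; f_i | S), highest first."""
    # Joint value of the selected subset S for each sample (empty tuple if S is empty,
    # in which case I(C; f_i | S) reduces to I(C; f_i)).
    s = [tuple(features[name][k] for name in selected) for k in range(len(classes))]
    scores = {name: conditional_mutual_information(classes, vals, s)
              for name, vals in features.items() if name not in selected}
    return sorted(scores, key=scores.get, reverse=True)
```

As a usage example, `mutual_information([0, 1, 0, 1], [0, 1, 0, 1])` is 1 bit (the variables determine each other), while `mutual_information([0, 0, 1, 1], [0, 1, 0, 1])` is 0 (independent). The two inputs could be, for example, a 0/1 indicator of whether a rule’s conditions are met for each instance and the corresponding class labels; that pairing is an illustrative assumption only.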
Regarding Applicant’s Remarks:
“With respect to a fitness score, as recited in claim 2, the Examiner alleges on pages 24-26 of the Office Action that Hodjat discloses "a system for evolving rulesets using an evolutionary algorithm, where the training portion of the system interacts with a database containing a pool of candidate individuals, which are initialized with initial fitness estimates, and run through a battery of trials to test the training data, and updating the corresponding fitness estimates for each individual and ranking individuals based on their fitness score." The Examiner further alleges that Fidelis discloses ( emphasis added) "a fitness score calculation comprising of sensitivity and specificity indicators, with the sensitivity indicator representing recall (corresponding to the coverage of the rule, which is expressed as a ratio of true positives over the sum of true positives and false negatives), and the specificity indicator representing precision (corresponding to the precision of the rule, which is expressed as a ratio of true positives over the sum of true positives and false positives)."
In view of the foregoing, the Examiner further alleges that Hodjat discloses (emphasis added) "wherein at least one generation of the genetic algorithm is configured to produce the set of class level rules using ... the fitness score," as recited in claim 2, and that (emphasis added) "[i]t would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to substitute the fitness score equation taught in Hodjat ... in further view of Fidelis ... for validating the discovered rules produced by a genetic algorithm. Since Hodjat ... in further view of Fidelis ... already teaches using a fitness score (comprising of precision and recall indicators) to evaluate the accuracy of the extracted rules and rank them, a person having ordinary skill in the art would also consider using a variation of the fitness score calculation (with the same precision and recall indicators) as taught in Castellanos for performing validation and ranking in order to produce the same predictable results." Thus, Castellanos appears to be cited for allegedly disclosing "a variation of the fitness score calculation (with the same precision and recall indicators) as taught in Castellanos."
However, the rationale used by the Examiner in regard to the fitness score does not apply to mutual information, as recited in amended claim 1. Applicant respectfully submits that, even under the broadest reasonable interpretation, "mutual information" is patentably distinct from a "fitness score."
As disclosed in paragraph [0042] of the present specification (emphasis added), "it is possible to derive rule fitness based on mutual information (MI) instead of using the fitness score described above (i.e., the harmonic mean of precision and coverage). The MI of two random variables is a measure of the mutual dependence between the two variables. More specifically, it quantifies the amount of information (in units such as shannons, more commonly called bits) obtained about one random variable, through the other random variable. The concept of mutual information is intricately linked to that of entropy of a random variable, a fundamental notion in information theory, that defines the amount of information held in a random variable."
Although the Office gives the claims their broadest reasonable interpretation, this interpretation must be consistent with the one that those skilled in the art would reach. See In re Morris, 127 F.3d 1048, 1054, 44 USPQ2d 1023, 1027 (Fed. Cir. 1997). Thus, under the broadest reasonable interpretation in light of the specification, mutual information is a measure of fitness that is derived from the mutual dependence between two variables. By contrast, as described in paragraph [0039] of the present specification, "the fitness score of each rule is evaluated based on the precision and coverage of the rule," which is entirely different and patentably distinct from the mutual dependence between two variables at least because the fitness score depends on factors such as precision and coverage of a rule that has no bearing on "the amount of information (in units such as shannons, more commonly called bits) obtained about one random variable, through the other random variable." Likewise, mutual information has no bearing on precision and coverage of a rule, as would be understood by one of skill in the art.”
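To make the two metrics discussed in the quoted remarks concrete, the following minimal Python sketch computes, for one hypothetical rule, (i) a fitness score taken as the harmonic mean of precision, TP/(TP+FP), and coverage, TP/(TP+FN), as characterized above, and (ii) the empirical mutual information, in bits, between the binary event that the rule fires and the binary event that the instance belongs to the predicted class. The counts and function names are hypothetical; this is an illustration under these assumptions, not the Applicant's or any reference's implementation.

```python
from math import log2


def fitness_score(tp, fp, fn):
    """Harmonic mean of precision TP/(TP+FP) and coverage TP/(TP+FN)."""
    precision = tp / (tp + fp)
    coverage = tp / (tp + fn)
    return 2 * precision * coverage / (precision + coverage)


def rule_class_mutual_information(tp, fp, fn, tn):
    """I(R; C) in bits, where R = 'rule fires' and C = 'instance is in the class'."""
    n = tp + fp + fn + tn
    # Joint counts indexed by (rule_fires, in_class).
    joint = {(1, 1): tp, (1, 0): fp, (0, 1): fn, (0, 0): tn}
    p_r = {1: (tp + fp) / n, 0: (fn + tn) / n}
    p_c = {1: (tp + fn) / n, 0: (fp + tn) / n}
    mi = 0.0
    for (r, c), count in joint.items():
        if count:  # the term 0 * log(0) is taken as 0
            p_rc = count / n
            mi += p_rc * log2(p_rc / (p_r[r] * p_c[c]))
    return mi


# Hypothetical counts for one rule evaluated on 100 instances.
tp, fp, fn, tn = 40, 10, 10, 40
print(round(fitness_score(tp, fp, fn), 3))                     # 0.8
print(round(rule_class_mutual_information(tp, fp, fn, tn), 3))  # 0.278
```

As written, fitness_score does not use the true-negative count tn, whereas rule_class_mutual_information uses all four cells of the rule-versus-class contingency table.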
Examiner has considered the above arguments and has found them not persuasive. Examiner notes that the above arguments apply the amended and newly introduced limitations, which had not been entered at the time of the previous Office Action, to the existing prior art references. Examiner points out that the Hodjat, Sepahvand, Fidelis, Chatterjee, Castellanos, and Kapila references teach the limitations recited in the independent and dependent claims as set forth in the Non-Final Office Action mailed February 18, 2022; hence these arguments are not persuasive, and the existing prior art rejection is maintained. The prior art analysis for the newly introduced limitations is discussed in the relevant sections identified below. Additionally, Examiner addresses the following statement made by the Applicant: ("the fitness score of each rule is evaluated based on the precision and coverage of the rule," which is entirely different and patentably distinct from the mutual dependence between two variables at least because the fitness score depends on factors such as precision and coverage of a rule that has no bearing on "the amount of information (in units such as shannons, more commonly called bits) obtained about one random variable, through the other random variable."). MPEP 2145 (VI) indicates that “Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims.” Examiner also cites the guidelines in MPEP 2111.01(II), which caution against importing limitations from the written description into a claim whose language is broader than the described embodiment: "Though understanding the claim language may be aided by explanations contained in the written description, it is important not to import into a claim limitations that are not part of the claim. For example, a particular embodiment appearing in the written description may not be read into a claim when the claim language is broader than the embodiment." Examiner points out that the underlined information is not part of any of the Applicant’s amended limitations, and hence will not be read into the claims during the prior art examination of the newly amended claims.
Regarding applicant’s Remarks:
“Hodjat, Sepahvand, Fidelis, Chatterjee, Castellanos, and Kapila, when considered alone or in any combination, fail to disclose or reasonably suggest (emphasis added) "wherein producing the set of class level rules further comprises calculating, by the processor-based system, a mutual information representing a mutual dependence between the respective class level rule and the predicted class, wherein at least one generation of the genetic algorithm is configured to produce the set of class level rules using the mutual information," as recited in amended claim 1. Therefore, it would not have been obvious in view of any of the cited references to substitute mutual information for the fitness score disclosed by Hodjat at least because there is no finding in any of the cited references that (i) the substituted components and their functions were known in the art, since none of the cited references discloses mutual information, or (ii) one of ordinary skill in the art could have substituted one known element for another, and the results of the substitution would have been predictable, since mutual information is an entirely different metric from a fitness score, as discussed above. See MPEP 2143, I (B). Here, the inventors have discovered a novel way of using mutual information to produce a set of class level rules with a genetic algorithm that is neither disclosed nor reasonably suggested by the prior art.
Furthermore, because there is no disclosure or suggestion in the prior art that a fitness score and mutual information are readily interchangeable or predictable substitutions for one another, and because there are potentially unlimited ways to evaluate fitness, it would not have been obvious for one of skill in the art to even try to substitute mutual information for a fitness score with a reasonable expectation of success. See MPEP 2143, I (E).
For at least the foregoing reasons, Hodjat, Sepahvand, Fidelis, Chatterjee, Castellanos, and Kapila, when considered alone or in any combination, fail to render any of the claims unpatentable.”
Examiner has considered the above arguments and has found them not persuasive. Examiner notes that the above arguments apply the amended and newly introduced limitations, which had not been entered at the time of the previous Office Action, to the existing prior art references. Examiner points out that the Hodjat, Sepahvand, Fidelis, Chatterjee, Castellanos, and Kapila references teach the limitations recited in the independent and dependent claims as set forth in the Non-Final Office Action mailed February 18, 2022; hence these arguments are not persuasive, and the existing prior art rejection is maintained.
As noted above, Applicant’s remaining arguments are directed to amended and newly introduced claim limitations, which necessitate further examination and re-evaluation of the amended and related original claims. The updated claim mappings according to the Applicant’s amended claims are provided in the relevant sections indicated below.

Claim Objections
Claims 1, 8, and 15 are objected to because of the following informality: 
The term “the predicted class” in the claim limitation “wherein producing the set of class level rules further comprises calculating, by the processor-based system, a mutual information representing a mutual dependence between the respective class level rule and the predicted class …” should be corrected to “the predicted output class”, so that the claims consistently use the same term to identify this predicted class as the output class being predicted, as recited in the earlier claim limitations (“classifying, by the processor-based system, each instance into one of the output classes …”; “applying, by the processor-based system, the instance level conditions for each of the corresponding instances to a genetic algorithm to produce a set of class level rules, each class level rule representing a logical conditional statement that predicts that the respective instances are member of the particular class”). Appropriate correction is required.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
This application currently names joint inventors. In considering patentability of the claims the examiner presumes that the subject matter of the various claims was commonly owned as of the effective filing date of the claimed invention(s) absent any evidence to the contrary.  Applicant is advised of the obligation under 37 CFR 1.56 to point out the inventor and effective filing dates of each claim that was not commonly owned as of the effective filing date of the later invention in order for the examiner to consider the applicability of 35 U.S.C. 102(b)(2)(C) for any potential 35 U.S.C. 102(a)(2) prior art against the later invention.
Claims 1, 3, 5, 8, 10, 12, 15, 17, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Hodjat et al., U.S. PGPUB 2017/0293849, published 10/12/2017 [hereafter referred to as Hodjat] in view of Sepahvand et al., Generating Graphical Chain by Mutual Matching of Bayesian Network and Extracted Rules of Bayesian Network Using Genetic Algorithm, arXiv:1412.4465v1, December 15 2014 [hereafter referred to as Sepahvand], in further view of Fidelis et al., Discovering Comprehensible Classification Rules with a Genetic Algorithm, Proceedings of the 2000 Congress on Evolutionary Computation CEC00 (Cat. No. 00TH8512), IEEE, July 16-19 2000 [hereafter referred to as Fidelis], in even further view of Chatterjee et al., U.S. Patent 10,824,959, filed 2/16/2016 [hereafter referred to as Chatterjee], in even further view of Huang et al., A hybrid genetic algorithm for feature selection wrapper based on mutual information, Pattern Recognition Letters 28 (2007), 2007 Elsevier B.V. [hereafter referred to as Huang].
 
Regarding amended Claim 1, 
Hodjat teaches
(Currently amended) A computer-implemented method of interpreting a machine learning model, the method comprising: 
receiving, by a processor-based system, a set of training data and a set of output classes for classifying a plurality of instances of the set of training data, each instance representing at least one feature of the set of training data (Examiner’s note: Hodjat teaches a data mining system for evolving rulesets using an evolutionary algorithm, where the training data for the data mining system is collected from an environment generating a large amount of data over a period of time for the purposes of extracting useful knowledge and patterns, where the training portion of the system interacts with a database containing a pool of candidate individuals, and where an individual is represented as a plurality of rules, each rule entry contains a plurality of conditions, and each condition is expressed as a relationship between a feature attribute and its corresponding value (Hodjat [0005]-[0007]; Figure 8; [0051] and [0086]-[0087]; see also Figure 3 and [0055]-[0057]). Hodjat further teaches that this system is implemented on a computer system containing a processor subsystem containing one or more processors (Hodjat Figure 10, element 1014; and [0116]).); 
applying, by the processor-based system, each instance and at least one perturbation of the respective instance to the machine learning model having a function that takes each instance and each perturbation of the respective instance to obtain, from an output of the machine learning model, a set of probabilities (Examiner’s note: Under its broadest reasonable interpretation, the term “at least one perturbation of the respective instance” is interpreted as any variation of the respective instance, such as a change within a feature value of an instance, or the presence of similar instances with different output results. As indicated earlier, Hodjat Figure 8; [0051] and [0086]-[0087] teaches a pool of candidate individuals for training, where an individual is represented as a plurality of rules, each rule entry contains a plurality of conditions, and each condition is expressed as a relationship between a feature attribute and its corresponding value. Hodjat further teaches that each rule entry has a corresponding rule-level probability (RLP) that represents the probability of membership in a class, where this rule-level probability can represent an aggregated value (average, minimum, or maximum) of all condition level certainty values, where the conditions under aggregation represent different variations of the same condition (“perturbations of the respective instance”), and where this aggregation is performed by a probability aggregator present in the training/production portions of the system (Hodjat [0055]: “… Each rule 306 contains one or more conditions 308 and a rule-level probability (RLP) 310. The rule level probability 310 indicates the probability that membership in the class exists when the conditions of this rule are satisfied. … A rule 306 is a conjuctive list of one or more conditions 308. Each rule in the ruleset may have a different number of conditions. A condition specifies a relationship between a particular feature value in the input data and a value in the condition …”; and [0061]-[0067]; in particular [0061]: “The condition-level certainty values for input data applied to a rule 306 are aggregated to determine the rule-level certainty value. In one embodiment, the certainty aggregation function can be an average of the all the condition-level certainty values. For example, if the condition-level certainty values for three conditions are 0.2, 0.4, and 0.6 respectively, the rule-level certainty value may be 0.4. In one embodiment, the rule-level certainty value will be the minimum value … In another embodiment, the rule-level certainty value will be the maximum value …” and [0076]-[0078]: “FIG. 5 is a method of operation of a probability aggregator 406 in either the training system [o]r the production system … a probability aggregator 406 determines the probability output of an individual … for a given data point … An individual or a ruleset is also received in block 402 providing one or more rules, each of the rules having one or more conditions and an indication of a rule-level probability of membership in a predetermined class.”).) …
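As a simple illustration of the aggregation described in the quoted Hodjat [0061] passage, the following Python sketch combines condition-level certainty values into a rule-level value using the average, minimum, or maximum. The function name and the mode labels are hypothetical; only the three aggregation options and the example values 0.2, 0.4, and 0.6 come from the quoted passage.

```python
from statistics import mean


def rule_level_certainty(condition_certainties, mode="average"):
    """Aggregate condition-level certainty values into one rule-level value."""
    aggregators = {"average": mean, "minimum": min, "maximum": max}
    return aggregators[mode](condition_certainties)


values = [0.2, 0.4, 0.6]
print(rule_level_certainty(values))             # 0.4
print(rule_level_certainty(values, "minimum"))  # 0.2
print(rule_level_certainty(values, "maximum"))  # 0.6
```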
classifying, by the processor-based system, each instance into one of the output classes (Examiner’s note: Hodjat teaches a set of rules that classify a patient’s current state based on the patient’s current and past states and a set of conditions, where the current state and past state represent different output classes; in the example provided, the current state and past state for a rule entry based on blood pressure and pulse measurement conditions can represent a high blood pressure related event and a normal blood pressure related event, respectively (Hodjat [0062]-[0068]).) …
producing, for each respective instance, a set of instance level conditions, each representing … each feature of the respective instance in the output class where the instance is classified (Examiner’s note: As indicated earlier, Hodjat teaches a training portion of the system interacting with a database containing a pool of candidate individuals, where an individual is represented as a plurality of rules, each rule entry contains a plurality of conditions, and each condition is expressed as a relationship between a feature attribute and its corresponding value (i.e., feature/value pairs) (Hodjat Figure 8; [0097]-[0098]; see also Figure 3 and [0055]-[0057]).); 
… applying, by the processor-based system, the instance level conditions for each of the corresponding instances to a genetic algorithm to produce a set of class level rules, each class level rule representing a logical conditional statement that predicts that the respective instances are members of the particular class (Examiner’s note: As indicated earlier, Hodjat teaches a system for evolving rulesets using an evolutionary algorithm, where the training portion of the system interacts with a database containing a pool of candidate individuals, where an individual is represented as a plurality of rules, each rule entry contains a plurality of conditions, with each rule expressed as an IF-THEN relationship of conditions containing a feature attribute, its corresponding value, a threshold, and a rule-level probability (RLP) that corresponds to a probability associated with an output class (Hodjat [0013]: “… each rule in the ruleset includes a probability indicating the probability of membership in a predetermined class if the conditions of the rules are met …”; Figure 3, [0050]: “… the rulesets may be evolved using an evolutionary algorithm. The training system 110 is a system for evolving rulesets to be used in a production system.” and [0055]; [0062]-[0067]: “In a healthcare embodiment, a ruleset can be thought of as a set of rules predicting a patient’s future state, given the patient’s current and past state … the set of rules may classify a patient’s current state based on current and past state. The rule-level certainty value of the rule can be an estimated probability of membership in the class … An example rule is as follows: … If condition 1.1 and condition 1.2 and condition 1.3, then RLP 1.”; and [0087]: “… the candidate pool 116 is initialized by candidate individual pool initialization module 602, which creates an initial set of candidate individuals … Each individual includes one or more rules, each with one or more conditions. Each rule also includes a rule-level probability …”). Hodjat further teaches that each individual ruleset is evaluated and assigned a fitness estimate/score, where the individuals with the highest fitness are further applied to a procreation module to create new individuals through combination and/or mutation of the parent individuals, with the end result that the individuals/rulesets exhibiting the best fitness are added to the production ruleset population, such that this procreation module, which performs the crossover and mutation steps of a genetic algorithm on a set of individual rules containing conditional statements, corresponds to a process that produces a set of class level rules (Hodjat Figure 6, [0088]-[0092]: “… candidate testing module 604 updates the local fitness estimate associated with each of the individuals tested … a procreation module 608 evolves a random subset of them. Only individuals in the candidate individual pool with high fitness scores are permitted to procreate … conditions, outputs, or rules from parent individuals are combined in various ways to form child individuals, and then, occasionally, they are mutated. The combination process … may include crossover – i.e., exchanging conditions, outputs, or entire rules between parent individuals to form child individuals. …  After procreation, candidate testing module 604 operates again on the updated candidate individual pool 116. The process continues repeatedly. … The individuals having the best fitness score at the end of the training session are added to the production ruleset population 122 …”); and 
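The procreation step described in the quoted Hodjat passage (crossover that exchanges conditions, outputs, or entire rules between parents, followed by occasional mutation) can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not Hodjat's implementation: a ruleset is modeled as a list of (conditions, rule-level probability) pairs, crossover swaps entire rules after a single cut point, and mutation perturbs one rule-level probability. The function names and the pulse and blood-pressure thresholds are hypothetical.

```python
import random
from typing import List, Tuple

# A rule is (conditions, rule_level_probability); each condition is a
# hypothetical (feature, operator, threshold) triple.
Rule = Tuple[Tuple[Tuple[str, str, float], ...], float]


def crossover(parent_a: List[Rule], parent_b: List[Rule], cut: int):
    """Single-point crossover that exchanges entire rules after position `cut`."""
    child_1 = parent_a[:cut] + parent_b[cut:]
    child_2 = parent_b[:cut] + parent_a[cut:]
    return child_1, child_2


def mutate(ruleset: List[Rule], rng: random.Random, scale: float = 0.1) -> List[Rule]:
    """Perturb one randomly chosen rule-level probability, clipped to [0, 1]."""
    mutated = list(ruleset)
    i = rng.randrange(len(mutated))
    conditions, rlp = mutated[i]
    new_rlp = min(1.0, max(0.0, rlp + rng.uniform(-scale, scale)))
    mutated[i] = (conditions, new_rlp)
    return mutated


parent_a = [((("pulse", ">", 100.0),), 0.7), ((("bp", ">", 140.0),), 0.8)]
parent_b = [((("pulse", "<", 60.0),), 0.3), ((("bp", "<", 90.0),), 0.4)]
child_1, child_2 = crossover(parent_a, parent_b, cut=1)
print(child_1)  # first rule from parent_a, second rule from parent_b
mutant = mutate(child_1, random.Random(0))  # seeded for reproducibility
print(mutant)
```

In a full loop, children like these would be returned to the candidate pool and re-scored, with only high-fitness individuals allowed to procreate again.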
using at least a portion of the set of class level rules (Examiner’s note: In light of applicant’s specification paragraph [0018], this claim limitation is interpreted as occurring after producing the set of class level rules, in the use case where these rules are provided to a user as an explanation for further processing. As indicated earlier, Hodjat teaches a training portion of the system involving the procreation module being invoked for multiple iterations, where for each iteration, new individuals created by combination and/or mutation are placed in the pool of candidate individuals to be chosen as new parents for successive combinations and/or mutations, and undergo further fitness evaluations through the competition module (Hodjat Figure 6, elements 606, 116, 608 and [0090]-[0092]). Once the best individuals are identified, they are provided into a production portion of the system where they are used for determining a recommendation through a decision/action system, which outputs a recommendation for a human to perform an action (Hodjat [0070]).) …  
While Hodjat teaches probabilities associated with an output class, Hodjat does not explicitly teach
… a set of probabilities that each feature of the respective instance belongs to each of the output classes;
classifying … for which, based on the set of probabilities, the probability that each feature of the respective instance belongs to the respective output class is highest; …
Sepahvand teaches
… a set of probabilities that each feature of the respective instance belongs to each of the output classes (Examiner’s note: Sepahvand teaches a Bayesian network modeling the conditional probabilities of variables in a rule, where each variable contains a plurality of classes and probability values associated with an output class, and where the Bayesian network is used to identify the feature-associated chains in the network that have higher probabilities, which are useful for classification in order to understand existing events and predict future events (Sepahvand p.2 col.1 6th paragraph-col.2 2nd paragraph (Section III. Background, Section IV. Proposed Method) and p.2 col.2 Figure 1).);
classifying … for which, based on the set of probabilities, the probability that each feature of the respective instance belongs to the respective output class is highest (Examiner’s note: As indicated earlier, Sepahvand teaches that a Bayesian network is used to identify the feature-associated chains in the network that have higher probabilities and that are useful for classification in order to understand existing events and predict future events (Sepahvand p.2 col.1 6th paragraph-col.2 2nd paragraph (Section III. Background, Section IV. Proposed Method) and p.2 col.2 Figure 1).); …
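For illustration only, the classification step recited in the limitation above (assigning an instance to the output class with the highest probability) reduces to an argmax over per-class probabilities. The Python sketch below assumes those probabilities have already been obtained, for example from a model or a Bayesian network; the function name, class names, and values are hypothetical.

```python
from typing import Dict


def classify(class_probabilities: Dict[str, float]) -> str:
    """Return the output class whose probability is highest (first one on ties)."""
    return max(class_probabilities, key=class_probabilities.get)


# Hypothetical per-class probabilities for two instances.
instances = {
    "instance_1": {"normal": 0.25, "high_bp_event": 0.65, "low_bp_event": 0.10},
    "instance_2": {"normal": 0.70, "high_bp_event": 0.20, "low_bp_event": 0.10},
}
for name, probs in instances.items():
    print(name, classify(probs))
# instance_1 high_bp_event
# instance_2 normal
```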
Both Hodjat and Sepahvand are analogous art since both teach generating and identifying relevant rules using genetic algorithms.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the rule-level probability taught in Hodjat and extend it through the use of a Bayesian network to determine probabilities for respective features within the rule conditions, as taught in Sepahvand, as a way to determine rule conditions containing the most probable features (i.e., those with the highest probability). The motivation to combine is taught in Sepahvand, since using a Bayesian network to identify probabilities for each feature in a rule condition and identifying the most probable features (and hence the most probable paths and rules) allows the identification of the rule conditions that are most useful for classification and determination of future events. Sepahvand further teaches that focusing on the rule conditions that contain the most probable features and applying them to a genetic algorithm leads to a more computationally efficient way to determine the optimum rules, thus making the system more efficient (Sepahvand p.1 col.2 2nd paragraph (Section I. Introduction); p.2 col.2 Section IV. Proposed Method 1st-3rd paragraphs; p.5 col.2 Section VI. Evaluation 3rd paragraph).
While Hodjat in view of Sepahvand teaches the conditions in each rule entry (i.e., a set of instance level conditions) represented as feature/value pairs for the procreation module, Hodjat in view of Sepahvand does not explicitly teach 
… a set of instance level conditions each representing a presence or absence of each feature …
Fidelis teaches
… a set of instance level conditions each representing a presence or absence of each feature (Examiner’s note: Fidelis teaches encoding chromosome structures representing rule conditions for use in a genetic algorithm, where each gene represents a condition on an attribute and includes a weight field taking values in the range [0..1] that indicates whether or not the corresponding attribute is present in the rule according to a threshold Limit: the greater the value of Limit, the smaller the probability that the corresponding condition will be present, and a condition whose weight is smaller than or equal to the threshold is effectively removed from the rule (corresponding to an absence) (Fidelis p.806 col.1 Figure 1 and Section 3.1 Individual Encoding 3rd paragraph: “… The field weight (Wi) is a real-valued variable taking values in the range [0..1]. This variable indicated whether or not the corresponding attribute is present in the rule. … the greater the value of the threshold Limit, the smaller the probability that the corresponding condition will be present in the rule … so that conditions with a weight smaller than or equal to 0.3 were effectively removed from the rule.”).) …
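The following minimal Python sketch illustrates the kind of fixed-length encoding described in the Fidelis passage quoted above, assuming Limit = 0.3 as in that passage. Each gene carries a weight, an operator, and a value; a condition is treated as present only if its weight exceeds Limit, so the chromosome length stays constant while the decoded rule may contain fewer conditions. The function names, attribute names, and values are hypothetical, and storing the attribute name in each gene is an assumption made for readability.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Gene:
    """One gene = one candidate condition of the rule."""
    attribute: str  # stored explicitly for readability (assumption)
    weight: float   # W_i, a real value in [0, 1]
    operator: str   # e.g. "=", "<", ">="
    value: object   # value the attribute is compared against


def presence_mask(chromosome: List[Gene], limit: float = 0.3) -> List[bool]:
    """True where a condition is present (weight > limit), False where absent."""
    return [gene.weight > limit for gene in chromosome]


def decode(chromosome: List[Gene], limit: float = 0.3) -> List[Tuple[str, str, object]]:
    """Build the rule antecedent from the genes whose condition is present."""
    mask = presence_mask(chromosome, limit)
    return [(g.attribute, g.operator, g.value) for g, present in zip(chromosome, mask) if present]


# Hypothetical fixed-length chromosome with one gene per attribute.
chromosome = [
    Gene("age", 0.85, ">=", 40),
    Gene("smoker", 0.30, "=", "yes"),  # weight <= limit, so the condition is absent
    Gene("bmi", 0.55, ">", 30),
]
print(presence_mask(chromosome))  # [True, False, True]
print(decode(chromosome))         # [('age', '>=', 40), ('bmi', '>', 30)]
```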
Both Hodjat in view of Sepahvand and Fidelis are analogous art since both teach generating and identifying relevant rules using genetic algorithms.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the feature/value encoding representation for the evolutionary algorithm taught in Hodjat in view of Sepahvand and enhance it with the encoding mechanism taught in Fidelis as a way to encode the set of instance level conditions for an evolutionary algorithm. The motivation to combine is taught in Fidelis, since this encoding method is flexible enough to support encoding of a plurality of conditions in a chromosome without having to change the length of the chromosome, which allows the system to process and perform crossover and mutation consistently using equal-length chromosomes, thus improving the computational efficiency of the genetic algorithm (Fidelis p.806 col.2 3rd paragraph (Section 3.1 Individual Encoding)).
While Hodjat in view of Sepahvand, in further view of Fidelis teaches using at least a portion of the set of class level rules, Hodjat in view of Sepahvand, in further view of Fidelis does not explicitly teach
… using at least a portion of the set of class level rules to update the set of training data and retrain the machine learning model using the updated set of training data.
Chatterjee teaches
… using at least a portion of the set of class level rules to update the set of training data and retrain the machine learning model using the updated set of training data (Examiner’s note: In light of applicant’s specification paragraph [0018], this claim limitation is interpreted as occurring after producing the set of class level rules, in the use case where these rules are provided to a user as an explanation for further processing. Chatterjee teaches that the explainer producing an explanatory rule set may provide the information to a client device, where a user at the client device can trigger re-generation of additional rules if the explanatory rule set is considered unsatisfactory according to a threshold (i.e., responses to observations which generated a “no explanation is available” message). Chatterjee further teaches that this re-generation of additional rules may involve using a larger input set to re-train the exemplary machine learning model, where this larger input set includes at least some observation records (“rules”) for which no explanations were available (Chatterjee Figure 9, elements 901, 925; and col.18 lines 26-38).).
Both Hodjat in view of Sepahvand, in further view of Fidelis and Chatterjee are analogous art since they both teach extracting predictive rulesets based on machine learning techniques.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the decision/action system recommending an action taught in Hodjat in view of Sepahvand, in further view of Fidelis and enhance it to include a re-training trigger taught in Chatterjee as a way to re-generate additional rules if the provided ruleset is considered an unsatisfactory explanation or is insufficient to generate a recommendation. The motivation to combine is taught in Chatterjee, where this re-training trigger allows the user to demand additional explanations beyond a general first-level explanation, causing the machine learning model to adjust its internal weights and internal representations to make further elaborations which identify relationships between input attributes and internal rule representations, which improves the accuracy and utility of the system using the machine learning model in terms of providing more informative explanations (Chatterjee col.15 lines 9-27 and col.15 line 43-col.16 line 23).
While Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee teaches applying instance level conditions to a genetic algorithm to produce a set of class level rules, Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee does not explicitly teach 
… wherein producing the set of class level rules further comprises calculating, by the processor-based system, a mutual information representing a mutual dependence between the respective class level rule and the predicted class, wherein at least one generation of the genetic algorithm is configured to produce the set of class level rules using the mutual information.
Huang teaches
… wherein producing the set of class level rules further comprises calculating, by the processor-based system, a mutual information representing a mutual dependence between the respective class level rule and the predicted class, wherein at least one generation of the genetic algorithm is configured to produce the set of class level rules using the mutual information (Examiner’s note: Under its broadest reasonable interpretation in the context of the earlier recited limitations in this claim, the term “predicted class” broadly recites the particular output class that is being predicted by the respective instance conditions present in each class level rule. Huang teaches determining a subset S that maximizes the mutual information I(C; S) given a set A with n features and a set C of all output classes, where the set S with k features represents a class level rule with different features (corresponding to conditions in the class level rule). This mutual information I(C; S) calculation represents the mutual information between a subset of identified features and the predicted output classes from a classifier, which forms the basis for the conditional mutual information I(C; f_i | S) value used in the hybrid genetic algorithm for performing feature ranking to aid in searching for a global optimal subset of features (Huang pp.1829-1831 Section 3. Feature ranking by conditional mutual information 1st paragraph: “… given an initial set A with n features and C set of all output classes, find out the subset S⊆A with k features that minimizes H(C|S), i.e., that maximizes the mutual information I(C;S) … The mutual information I(C; S) measures the amount of information that the feature subset S contains about the output classes C …”; pp.1833-1834 Section 4.2 Local search for feature selection 2nd-5th paragraphs: “… we propose a novel hybrid GA for feature selection problem, in which the feature’s conditional mutual information I(C; f_i | S) is used as a measure to rank the candidate features … As discussed in Section 3, the conditional mutual information I(C; f_i | S) measures the new information to the output class C contributed by feature f_i given the subset S of features selected … In a generation of the hybrid GA, each chromosome of the population corresponds to a scheme of feature selection. The first operation (a) aims to find features in the selected subset S that are less informative to classification and remove them from S … These strategies can help to search global optimal subset of features and improve the efficiency of the algorithm …”; and pp.1834-1835 Section 4.3 Implementation of the hybrid GA wrapper approach: “… The implementation of the hybrid genetic algorithm for feature selection mainly includes the encoding schemes of chromosomes, evaluating fitness function, local searching operations, designing for the selection, crossover and mutation genetic operations, and stopping criterion. … The hybrid GA stops when the number of generations reaches the preset maximum generation T. … Procedure HGA for feature selection; inside the “While (t≤T) do” loop, “Local improvement of each chromosome of P(t) by I(C; f_i | S) …”).).
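For illustration only, the two quantities relied on in Huang can be sketched with plug-in (histogram) estimates over discrete feature values. This is a minimal sketch under stated assumptions, not Huang's Eq. (23) or (32): it computes I(C; S) = H(C) + H(S) − H(C, S) directly and obtains I(C; f_i | S) from the chain rule I(C; f_i | S) = I(C; S ∪ {f_i}) − I(C; S). The function names and the XOR toy data are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(symbols):
    """Plug-in Shannon entropy (bits) of a list of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())

def mutual_information(classes, subset_values):
    """I(C; S) = H(C) + H(S) - H(C, S); each element of subset_values is a tuple."""
    joint = list(zip(classes, subset_values))
    return entropy(classes) + entropy(subset_values) - entropy(joint)

def conditional_mi(classes, rows, subset, i):
    """I(C; f_i | S) = I(C; S plus f_i) - I(C; S), by the chain rule."""
    s = [tuple(row[j] for j in subset) for row in rows]
    s_plus = [vals + (row[i],) for vals, row in zip(s, rows)]
    return mutual_information(classes, s_plus) - mutual_information(classes, s)

# XOR toy data: neither feature alone says anything about C; together they decide it.
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
classes = [0, 1, 1, 0]
print(conditional_mi(classes, rows, [], 0))   # I(C; f_0)           -> 0.0
print(conditional_mi(classes, rows, [0], 1))  # I(C; f_1 | {f_0})   -> 1.0
```

The example shows why a conditional measure matters for ranking: f_1 contributes nothing on its own but full information once f_0 is already in S. Computing it directly requires joint histograms over all of S, which is the cost the quoted Huang passage contrasts with Eq. (23) and (32).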
Both Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee and Huang are analogous art since they both teach applying a genetic algorithm to produce a set of features, using a metric to help search for and identify the set of features to evolve during each genetic algorithm generation.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the fitness score calculation from the genetic algorithm taught in Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee and replace it with the mutual information calculation taught in Huang as a way to improve the search efficiency and performance of the genetic algorithm. The motivation to combine is taught in Huang, since this mutual information based on a set of features S and output classes C provides a criterion in a genetic algorithm to rank the candidate features and remove those that are less informative or less relevant to classification, thereby improving the search process and the efficiency of the genetic algorithm (Huang pp.1833-1834 Section 4.2 Local search for feature selection 2nd-5th paragraphs). Additionally, the mutual information calculations shown in equations (23) and (32) in Huang pp.1829-1831 Section 3 provide a more scalable and memory-efficient way to determine the mutual information as the number of features in the set increases, thus allowing for a more computationally efficient algorithm (Huang pp.1829-1831 Section 3 Feature ranking by conditional mutual information, and equations (23) and (32), in particular p.1831 col.1 2nd paragraph: “… to compute the conditional mutual information I(C; f_i | S) with Eq. (23) or (32), we only need to compute at most $\binom{n}{2} + n$ histograms of two variables. Therefore, the computational effort increase in the order of $n^2$ as the number of features increase for given number of examples and partitions. This implies that Eq. (23) or (32) can be applied to relatively large problems without excessive computational efforts compare with computing I(C; f_i | S) directly in feature selection.”).
Regarding amended Claim 3, 
Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in even further view of Huang teaches
(Currently amended) The method of claim 1, further comprising sorting, by the processor-based system, the set of class level rules according to the mutual information corresponding to each of the class level rules (Examiner’s note: As indicated earlier, Huang teaches performing feature ranking with the conditional mutual information value I(C; f_i | S), where this feature ranking involves identifying features that are less informative to classification and removing them, and hence this ranking process that identifies and removes less informative features is a form of sorting, where this sorting process uses the calculated mutual information values for a set S of k features (representing a class level rule) to determine the most informative features in the final set S (Huang pp.1829-1831 Section 3. Feature ranking by conditional mutual information 1st paragraph; pp.1833-1834 Section 4.2 Local search for feature selection 2nd-5th paragraphs).).  
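As a hedged illustration of sorting class level rules by mutual information, the sketch below treats each rule as a hypothetical Python predicate, scores it by I(C; R), where R is the true/false indicator that the rule's conditions hold for an instance, and sorts the rules in descending order of that score. The rule names, blood pressure readings, and class labels are invented for illustration and are not drawn from Huang or the application.

```python
from collections import Counter
from math import log2

def entropy(symbols):
    """Plug-in Shannon entropy (bits) of a list of hashable symbols."""
    n = len(symbols)
    return -sum((c / n) * log2(c / n) for c in Counter(symbols).values())

def rule_mi(condition, instances, labels):
    """I(C; R), where R is True when the rule's conditions hold for an instance."""
    fires = [condition(x) for x in instances]
    return entropy(labels) + entropy(fires) - entropy(list(zip(labels, fires)))

def sort_rules_by_mi(rules, instances, labels):
    """Return (rule name, I(C; R)) pairs from most to least informative."""
    scored = [(name, rule_mi(cond, instances, labels)) for name, cond in rules.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

instances = [150, 160, 110, 120]  # hypothetical systolic readings
labels = ["high", "high", "normal", "normal"]
rules = {"bp > 155": lambda bp: bp > 155, "bp > 140": lambda bp: bp > 140}
for name, score in sort_rules_by_mi(rules, instances, labels):
    print(f"{name}: {score:.3f}")
# bp > 140: 1.000
# bp > 155: 0.311
```

The rule "bp > 140" separates the two classes exactly and so carries the full 1 bit of class information, while "bp > 155" covers only one of the two "high" instances and ranks lower.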
Regarding original Claim 5, 
Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in even further view of Huang teaches
(Original) The method of claim 1, further comprising selecting, by the processor-based system, a subset of the set of class level rules that predict that at least a threshold percentage of the respective instances are members of a particular class (Examiner’s note: Hodjat teaches taking the individuals having the best fitness score at the end of the training session, and adding them to a production ruleset population for a production phase to process live production sequences, where the output of the production phase is a probability associated with the determined ruleset being compared against a predetermined threshold value to determine an outcome representing a future event and an associated action to take to remediate the event, where this determination of an outcome representing a future event is a prediction associated with a subset of conditions in the ruleset that identified the outcome (Hodjat [0092]: “… the individuals having the best fitness score at the end of the training session are added to the production ruleset population 122 where they are used to process live production data sequences 130….”; and [0068]-[0070]: “…The production system 112 operates according to one or more rulesets 300 from the production ruleset population 122. … the ruleset as a whole outputs a probability 126 for an event that can occur in the near future. In the case of the blood pressure monitoring application, the probability output 126 will indicate the possibility of a high blood pressure related event occurring in the near future. … The decision/action system 128 is a system that uses the probability output 126 from the rulesets together with predetermined threshold values to decide what if any action to take. .. if the ruleset predicts a probability higher than a predetermined threshold value, for example 50%, that a patient’s blood pressure will exceed the normal range in the near future, the decision/action system 128 may alert a nurse or doctor …”).).  
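The threshold-based selection recited in Claim 5 can be sketched as follows, assuming each class level rule is a hypothetical (condition, predicted class) pair: among the instances a rule covers, compute the share that belongs to the rule's predicted class, and keep the rule if that share meets the threshold. The 50% default mirrors the example threshold quoted from Hodjat [0070]; the data and rule names are assumptions.

```python
def select_rules(rules, instances, labels, threshold=0.5):
    """Keep rules whose covered instances are in the predicted class at >= threshold."""
    selected = []
    for name, (condition, predicted_class) in rules.items():
        covered = [y for x, y in zip(instances, labels) if condition(x)]
        if not covered:
            continue  # a rule that covers no instance makes no prediction to check
        share = sum(y == predicted_class for y in covered) / len(covered)
        if share >= threshold:
            selected.append((name, share))
    return selected

# Hypothetical systolic blood pressure readings and their observed classes.
instances = [150, 160, 135, 110]
labels = ["high", "high", "normal", "normal"]
rules = {
    "IF bp > 140 THEN high": (lambda bp: bp > 140, "high"),
    "IF bp > 130 THEN high": (lambda bp: bp > 130, "high"),
}
print(select_rules(rules, instances, labels, threshold=0.75))
# [('IF bp > 140 THEN high', 1.0)]
```

At a 75% threshold only the first rule survives, because two of the three instances covered by "bp > 130" are "high" (about 67%).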
Regarding amended Claim 8, 
Hodjat teaches
(Currently amended) A computer program product including one or more non-transitory computer readable mediums having instructions encoded thereon that when executed by one or more computer processors cause the one or more computer processors (Examiner’s note: Hodjat teaches a computer system containing a storage subsystem, where the storage subsystem includes a memory subsystem and a file subsystem, where the memory subsystem contains computer instructions, when executed by the processor subsystem, cause the computer system to operate or perform the system functions (Hodjat Figure 10, elements 1014, 1024; and [0116] and [0121]).) to perform a process for interpreting a machine learning model, the process including 
receiving a set of training data and a set of output classes for classifying a plurality of instances of the set of training data, each instance representing at least one feature of the set of training data (Examiner’s note: As indicated earlier, Hodjat teaches a system for evolving rulesets using an evolutionary algorithm, where the training data for the data mining system is collected from an environment generating a large amount of data over a period of time for the purposes of extracting useful knowledge and patterns, where the training portion of the system interacts with a database containing a pool of candidate individuals, and where an individual is represented as a plurality of rules, each rule entry contains a plurality of conditions, and each condition is expressed as a relationship between a feature attribute and its corresponding value (Hodjat [0005]-[0007]; Figure 8; [0051] and [0086]-[0087]; see also Figure 3 and [0055]-[0057]). Hodjat further teaches that this system is implemented on a computer system containing a processor (Hodjat Figure 10, element 1014; and [0116]).); 
applying each instance and at least one perturbation of the respective instance to the machine learning model having a function that takes each instance and each perturbation of the respective instance to obtain, from an output of the machine learning model, a probability (Examiner’s note: Under its broadest reasonable interpretation, the term “at least one perturbation of the respective instance” is interpreted as any variation of the respective instance, such as a change within a feature value of an instance, or the presence of similar instances with different output results. As indicated earlier, Hodjat Figure 8; [0051] and [0086]-[0087] teaches a pool of candidate individuals for training, where an individual is represented as a plurality of rules, each rule entry contains a plurality of conditions, and each condition expressed as a relationship between a feature attribute and its corresponding value. Hodjat further teaches each rule entry has a corresponding rule-level probability (RLP) that represents the probability of membership in a class, where this rule-level probability can represent an aggregated value (average, minimum, or maximum) of all condition level certainty values, where the conditions under aggregation represent different variations of the same condition (“perturbations of the respective instance”), and where this aggregation is performed by a probability aggregator present in the training/production parts of the system (Hodjat [0055]; and [0061]-[0067]; in particular [0061], and [0076]-[0078]).) …
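Hodjat's rule-level probability, as characterized above, is an aggregate (average, minimum, or maximum) of condition-level certainty values. A minimal sketch, assuming the certainty values are already available as numbers in [0, 1]; how Hodjat computes each certainty is not reproduced here, and the function and variable names are hypothetical.

```python
from statistics import mean

# Aggregation modes named in the characterization of Hodjat above.
AGGREGATORS = {"average": mean, "minimum": min, "maximum": max}

def rule_level_probability(condition_certainties, mode="average"):
    """Aggregate condition-level certainty values (each in [0, 1]) for one rule."""
    if not condition_certainties:
        raise ValueError("a rule needs at least one condition")
    if not all(0.0 <= v <= 1.0 for v in condition_certainties):
        raise ValueError("certainty values must lie in [0, 1]")
    return AGGREGATORS[mode](condition_certainties)

certainties = [0.9, 0.6, 0.75]  # hypothetical values for three variants of one condition
print(rule_level_probability(certainties, "minimum"))  # 0.6
print(rule_level_probability(certainties, "maximum"))  # 0.9
```

The minimum is the most conservative choice (a rule is only as certain as its weakest condition), while the maximum lets any single strongly matched variant dominate.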
classifying each instance into one of the output classes (Examiner’s note: As indicated earlier, Hodjat teaches a set of rules classifying a patient’s current state based on current and past state based on a set of conditions, where the current state and past state represent different output classes, where in the example provided, a current state and past state for a rule entry based on measuring blood pressure and pulse conditions can represent a high blood pressure related event and a normal blood pressure related event, respectively (Hodjat [0062]-[0068]).) …
producing, for each respective instance, a set of instance level conditions, each representing … each feature of the respective instance in the output class where the instance is classified (Examiner’s note: As indicated earlier, Hodjat teaches a training portion of the system interacting with a database containing a pool of candidate individuals, where an individual is represented as a plurality of rules, each rule entry contains a plurality of conditions, and each condition is expressed as a relationship between a feature attribute and its corresponding value (i.e., feature/value pairs) (Hodjat Figure 8; [0097]-[0098]; see also Figure 3 and [0055]-[0057]).); 
applying the instance level conditions for each of the corresponding instances to a genetic algorithm to produce a set of class level rules, each class level rule representing a logical conditional statement that predicts that the respective instances are members of the particular class (Examiner’s note: As indicated earlier, Hodjat teaches a system for evolving rulesets using an evolutionary algorithm, where the training portion of the system interacts with a database containing a pool of candidate individuals, where an individual is represented as a plurality of rules, each rule entry contains a plurality of conditions, with each rule expressed as an IF-THEN relationship of conditions containing a feature attribute, its corresponding value, a threshold, and a rule-level probability (RLP) that corresponds to a probability associated with an output class (Hodjat [0013]; Figure 3, [0050] and [0055]; [0062]-[0067]; and [0087]). Hodjat further teaches that each individual ruleset is evaluated and assigned a fitness estimate/score, where the individuals with the highest fitness are further applied to a procreation module to create new individuals through combination and/or mutation based on the parent individuals, with the end result that the individuals/rulesets exhibiting the best fitness are added to the production ruleset population, such that this procreation module performing the crossover and mutation steps of a genetic algorithm on a set of individual rules containing conditional statements corresponds to a process that produces a set of class level rules (Hodjat Figure 6, [0088]-[0092]).); and 
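The fitness-evaluation and procreation cycle attributed to Hodjat above can be pictured with a generic elitist genetic algorithm. This is a minimal sketch, not Hodjat's competition or procreation modules: each individual is a hypothetical ruleset whose genes are thresholds t of rules of the form IF bp > t THEN high, fitness is accuracy on a toy data set, and new individuals are created by one-point crossover followed by a random perturbation (mutation). All names and data are assumptions.

```python
import random

def evolve(population, fitness, crossover, mutate, generations=30, elite=2, seed=0):
    """Elitist GA: rank by fitness, keep the elite, refill by crossover + mutation."""
    rng = random.Random(seed)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:max(2, len(ranked) // 2)]  # the fitter half procreates
        children = []
        while len(children) < len(population) - elite:
            a, b = rng.sample(parents, 2)
            children.append(mutate(crossover(a, b, rng), rng))
        population = ranked[:elite] + children
    return max(population, key=fitness)

# Toy data: (systolic reading, observed class).
DATA = [(150, "high"), (160, "high"), (145, "high"),
        (135, "normal"), (110, "normal"), (120, "normal")]

def accuracy(ruleset):
    # The ruleset predicts "high" when any of its rules fires, i.e. bp > min threshold.
    hits = sum((bp > min(ruleset)) == (label == "high") for bp, label in DATA)
    return hits / len(DATA)

def one_point_crossover(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def perturb(ruleset, rng):
    child = list(ruleset)
    i = rng.randrange(len(child))
    child[i] += rng.uniform(-10, 10)
    return child

init_rng = random.Random(1)
initial = [[init_rng.uniform(100, 170) for _ in range(2)] for _ in range(8)]
best = evolve(initial, accuracy, one_point_crossover, perturb)
print(best, accuracy(best))
```

Carrying the elite individuals over unchanged means the best fitness in the pool never decreases from one generation to the next.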
using at least a portion of the set of class level rules (Examiner’s note: In light of applicant’s specification paragraph [0018], this claim limitation is interpreted as occurring after producing the set of class level rules, in the use case where these rules are provided to a user as an explanation for further processing. As indicated earlier, Hodjat teaches a training portion of a system involving the procreation module being invoked for multiple iterations, where for each iteration, new individuals created by combination and/or mutation are placed in the pool of candidate individuals to be chosen as new parents for successive combinations and/or mutations, and undergo further fitness evaluations through the competition module (Hodjat Figure 6, elements 606, 116, 608 and [0090]-[0092]). Once the best individuals are identified, they are provided into a production portion of the system where they are used for determining a recommendation through a decision/action system, which outputs a recommendation for a human to perform an action (Hodjat [0070]).) …  
While Hodjat teaches probabilities associated with an output class, Hodjat does not explicitly teach
… a probability that each feature of the respective instance belongs to each of the output classes;
classifying … for which the probability that each feature of the respective instance belongs to the respective output class is highest; …
Sepahvand teaches
… a probability that each feature of the respective instance belongs to each of the output classes (Examiner’s note: As indicated earlier, Sepahvand teaches a Bayesian network modeling the conditional probabilities of variables in a rule, where each of the variables contains a plurality of classes and probability values associated with an output class, where the Bayesian network is used to identify the feature-associated chains in the network that have higher probabilities and are useful for classification in order to understand existing events and predict future events (Sepahvand p.2 col.1 6th paragraph-col.2 2nd paragraph (Section III. Background, Section IV. Proposed Method) and p.2 col.2 Figure 1).);
classifying … for which the probability that each feature of the respective instance belongs to the respective output class is highest (Examiner’s note: As indicated earlier, Sepahvand teaches that a Bayesian network is used to identify the feature-associated chains in the network that have higher probabilities and are useful for classification in order to understand existing events and predict future events (Sepahvand p.2 col.1 6th paragraph-col.2 2nd paragraph (Section III. Background, Section IV. Proposed Method) and p.2 col.2 Figure 1).); …
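The per-feature class probabilities and highest-probability classification discussed above can be approximated, for illustration, with a naive Bayes model. Naive Bayes is a simpler stand-in chosen here and is not Sepahvand's Bayesian network: the sketch estimates P(feature value | class) from counts with Laplace smoothing (alpha is an assumed parameter), combines them into a posterior over the output classes, and assigns the instance to the class with the highest probability. The feature values and labels are hypothetical.

```python
from collections import Counter, defaultdict

def fit(rows, labels):
    """Count class priors and (feature index, value) -> class co-occurrences."""
    prior = Counter(labels)
    cond = defaultdict(Counter)
    for row, label in zip(rows, labels):
        for j, value in enumerate(row):
            cond[(j, value)][label] += 1
    n_values = Counter(j for (j, _value) in cond)  # distinct values seen per feature
    return prior, dict(cond), n_values

def posterior(row, prior, cond, n_values, alpha=1.0):
    """Naive Bayes P(class | row) with Laplace smoothing, normalized over classes."""
    total = sum(prior.values())
    scores = {}
    for c, n_c in prior.items():
        p = n_c / total
        for j, value in enumerate(row):
            count = cond.get((j, value), Counter())[c]
            p *= (count + alpha) / (n_c + alpha * n_values[j])
        scores[c] = p
    z = sum(scores.values())
    return {c: s / z for c, s in scores.items()}

def classify(row, model):
    """Assign the instance to the output class with the highest probability."""
    probs = posterior(row, *model)
    return max(probs, key=probs.get)

rows = [("high_bp", "fast_pulse"), ("high_bp", "normal_pulse"),
        ("normal_bp", "normal_pulse"), ("normal_bp", "normal_pulse")]
labels = ["event", "event", "no_event", "no_event"]
model = fit(rows, labels)
print(classify(("high_bp", "fast_pulse"), model))  # event
```

Laplace smoothing keeps a feature value never seen with a class from driving that class's probability to exactly zero.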
Both Hodjat and Sepahvand are analogous art since both teach generating and identifying relevant rules using genetic algorithms.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the rule-level probability taught in Hodjat and extend it, through use of a Bayesian network, to determine probabilities for respective features within the rule conditions as taught in Sepahvand, as a way to determine rule conditions containing the most probable features (i.e., those with the highest probability). The motivation to combine is taught in Sepahvand, as provided in the prior art claim mapping of Claim 1 recited above.
While Hodjat in view of Sepahvand teaches the conditions in each rule entry (i.e., a set of instance level conditions) represented as feature/value pairs for the procreation module, Hodjat in view of Sepahvand does not explicitly teach 
… a set of instance level conditions each representing a presence or absence of each feature …
Fidelis teaches
… a set of instance level conditions each representing a presence or absence of each feature (Examiner’s note: As indicated earlier, Fidelis teaches encoding chromosome structures representing rule conditions for use in a genetic algorithm, where each gene represents a condition on an attribute and includes a weight field taking values in the range [0..1] that indicates, relative to a threshold Limit, whether or not the corresponding attribute is present in the rule, where the greater the value of Limit, the smaller the probability that the condition will be present, and a condition whose weight does not exceed the threshold is effectively removed from the rule (corresponding to an absence) (Fidelis p.806 col.1 Figure 1 and Section 3.1 Individual Encoding 3rd paragraph: “… The field weight (Wi) is a real-valued variable taking values in the range [0..1]. This variable indicated whether or not the corresponding attribute is present in the rule. … the greater the value of the threshold Limit, the smaller the probability that the corresponding condition will be present in the rule … so that conditions with a weight smaller than or equal to 0.3 were effectively removed from the rule.”).) …
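Fidelis's weight-based encoding can be sketched as a decoder that turns a chromosome into a rule. Each gene is assumed here to be a (weight, operator, value) triple, and a condition is present only when its weight exceeds the threshold Limit; the default of 0.3 follows the Fidelis passage quoted above, while the attribute names, operators, and values are hypothetical.

```python
LIMIT = 0.3  # per the quoted Fidelis passage: weights <= 0.3 mean the condition is removed

def presence(chromosome, limit=LIMIT):
    """True where a gene's condition is present in the rule, False where it is absent."""
    return [weight > limit for weight, _op, _value in chromosome]

def decode(chromosome, attributes, limit=LIMIT):
    """Build an IF-clause from the genes whose weight exceeds the limit."""
    conditions = [f"{name} {op} {value}"
                  for name, (weight, op, value) in zip(attributes, chromosome)
                  if weight > limit]
    return "IF " + " AND ".join(conditions) if conditions else None

chromosome = [(0.8, ">", 140), (0.2, ">", 90), (0.31, "==", "yes")]
attributes = ["bp", "pulse", "smoker"]
print(presence(chromosome))              # [True, False, True]
print(decode(chromosome, attributes))    # IF bp > 140 AND smoker == yes
```

The presence vector is the per-feature presence/absence view of the same chromosome that the decoded rule expresses as text.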
Both Hodjat in view of Sepahvand and Fidelis are analogous art since both teach predicting classification rules using machine learning algorithms.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the feature/value encoding representation for the evolutionary algorithm taught in Hodjat in view of Sepahvand and enhance it with the encoding mechanism taught in Fidelis as a way to encode the set of instance level conditions for an evolutionary algorithm. The motivation to combine is taught in Fidelis, as provided in the prior art claim mapping in Claim 1 recited above.
While Hodjat in view of Sepahvand, in further view of Fidelis teaches using at least a portion of the set of class level rules to update the set of training data, Hodjat in view of Sepahvand, in further view of Fidelis does not explicitly teach
… using at least a portion of the set of class level rules to adjust hyper-parameters of the machine learning model and retrain the machine learning model using the adjusted hyper-parameters.
Chatterjee teaches
… using at least a portion of the set of class level rules to adjust hyper-parameters of the machine learning model and retrain the machine learning model using the adjusted hyper-parameters (Examiner’s note: In light of applicant’s specification paragraph [0018], this claim limitation is interpreted as occurring after producing the set of class level rules, in the use case context where these rules are provided to a user as an explanation for further processing. As indicated earlier, Chatterjee teaches that the explainer producing an explanatory rule set may provide the information to a client device, where a user at the client device can trigger re-generation of additional rules if the explanatory rule set is considered unsatisfactory according to a threshold (i.e., responses to observations which generated a “no explanation is available” message). Chatterjee further teaches that this re-training trigger from the explainer corresponds to adding more internal representations to the input data in an exemplary machine learning model. In the context of the exemplary machine learning model being a neural network classifier (Chatterjee col.14 line 49-col.15 line 25; col.15 lines 57-63; col.16 lines 8-12), this re-training trigger to add more internal representations to the input data is interpreted as adding more hidden layers in the exemplary neural network classifier to support the request to provide more explanation or ruleset conditions, and adjusting the neural network weights between layers to process the additional information, such that this addition of more internal representations represents changing the original dimensions of the neural network (i.e., corresponding to changing a neural network’s hyper-parameters) in order to re-train the exemplary machine learning model using the changed dimensions (Chatterjee Figure 9, elements 901, 925; col.18 lines 26-38; col.17 lines 32-50; and col.8 lines 43-65).).
Both Hodjat in view of Sepahvand, in further view of Fidelis and Chatterjee are analogous art since they both teach extracting predictive rulesets based on machine learning techniques.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the decision/action system recommending an action taught in Hodjat in view of Sepahvand, in further view of Fidelis and enhance it to include a re-training trigger taught in Chatterjee as a way to re-generate additional rules if the provided ruleset is considered an unsatisfactory explanation or is insufficient to generate a recommendation. The motivation to combine is taught in Chatterjee, as provided in the prior art claim mapping in Claim 1 recited above.
While Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee teaches applying instance level conditions to a genetic algorithm to produce a set of class level rules, Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee does not explicitly teach 
… wherein producing the set of class level rules further comprises calculating, by the processor-based system, a mutual information representing a mutual dependence between the respective class level rule and the predicted class, wherein at least one generation of the genetic algorithm is configured to produce the set of class level rules using the mutual information.
Huang teaches
… wherein producing the set of class level rules further comprises calculating, by the processor-based system, a mutual information representing a mutual dependence between the respective class level rule and the predicted class, wherein at least one generation of the genetic algorithm is configured to produce the set of class level rules using the mutual information (Examiner’s note: Under its broadest reasonable interpretation in the context of the earlier recited limitations in this claim, the term “predicted class” broadly recites the particular output class that is being predicted by the respective instance conditions present in each class level rule. As indicated earlier, Huang teaches determining a subset S that maximizes the mutual information I(C; S) given a set A with n features and a set C of all output classes, where the set S with k features represents a class level rule with different features (corresponding to conditions in the class level rule). This mutual information I(C; S) calculation represents the mutual information between a subset of identified features and the predicted output classes from a classifier, which forms the basis for the conditional mutual information I(C; f_i | S) value used in the hybrid genetic algorithm for searching for a global optimal subset of features (Huang pp.1829-1831 Section 3. Feature ranking by conditional mutual information 1st paragraph; pp.1833-1834 Section 4.2 Local search for feature selection 2nd-5th paragraphs; and pp.1834-1835 Section 4.3 Implementation of the hybrid GA wrapper approach).).
Both Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee and Huang are analogous art since they both teach applying a genetic algorithm to produce a set of features, using a metric to help search for and identify the set of features to evolve during each genetic algorithm generation.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the fitness score calculation from the genetic algorithm taught in Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee and replace it with the mutual information calculation taught in Huang as a way to improve the search efficiency and performance of the genetic algorithm. The motivation to combine is taught in Huang, as provided in the prior art claim mapping of Claim 1 recited above.
Regarding amended Claim 10, 
Claim 10 recites the computer program product of claim 8, where the computer program product further comprises instructions that when executed by one or more computer processors cause the one or more processors to perform a process that includes claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 3, and hence is rejected under similar rationale provided by Hodjat, Sepahvand, Fidelis, Chatterjee, and Huang as indicated in Claim 3, in view of the rejections of amended Claim 8.  
Regarding original Claim 12,
Claim 12 recites the computer program product of claim 8, where the computer program product further comprises instructions that when executed by one or more computer processors cause the one or more processors to perform a process that includes claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 5, and hence is rejected under similar rationale provided by Hodjat, Sepahvand, Fidelis, Chatterjee, and Huang as indicated in Claim 5, in view of the rejections of amended Claim 8.  
Regarding amended Claim 15,
Claim 15 recites a system, where the system comprises claim limitations that are similar in scope to the corresponding claim limitations recited in amended Claims 1 and 8, and hence is rejected under similar rationale and motivations provided by Hodjat, Sepahvand, Fidelis, Chatterjee, and Huang as indicated in amended Claims 1 and 8. In addition, as indicated earlier, Hodjat teaches a computer system for implementing the system, where the computer system contains a processor subsystem containing one or more processors, and a storage subsystem that includes a memory subsystem and a file store subsystem (corresponding to one or more storages, Hodjat Figure 10, elements 1024, 1026, 1028; and [0116]), and a processor subsystem connected to the same internal bus subsystem as the storage subsystem (Hodjat Figure 10, elements 1012, 1014, 1024; and [0116]), where the memory subsystem contains computer instructions to operate or perform the training and production system functions (Hodjat Figure 10, elements 1014, 1024; and [0121]). In addition, Huang teaches the equation for determining mutual information between two variables X and Y, where these variables are later applied as an output class C and a set S of k features, resulting in the teaching of the following new limitation (“… wherein the mutual information is defined as                         
$I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log\left(\frac{p(x,y)}{p(x)\,p(y)}\right),$
                     where I is the mutual information, X is a set including the respective class rule, Y is a set including the predicted class, p(x,y) is a joint probability function of X and Y, and p(x) and p(y) are marginal probability distribution functions of X and Y, respectively.”) (Huang pp.1827 Section 2.1 Entropy and mutual information: “… entropy and mutual information are introduced in Shannon’s information theory to measure the information of random variables … a discrete random variable X … with its probability density function denoted as p(x) … p(x,y) denotes the joint probability density function of X and Y … The common information of two random variables X and Y is defined as the mutual information between them,                         
$I(X;Y) = \sum_{y \in Y} \sum_{x \in X} p(x,y) \log\left(\frac{p(x,y)}{p(x) \cdot p(y)}\right)$
                     (equation 4).”; and p.1829 Section 3. Feature ranking by conditional mutual information 1st paragraph.).
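For illustration only, the mutual information formula quoted above can be evaluated from paired observations by using empirical frequencies in place of p(x,y), p(x), and p(y). The sketch below is not drawn from Huang or from the application; it assumes, hypothetically, that X is encoded as whether a class rule fires on each training record and Y as the class label of that record, and it uses the natural logarithm (so the result is in nats), since the log base is not stated in the excerpt quoted here. The function name is hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Estimate I(X;Y) in nats from paired samples using empirical frequencies.

    Implements I(X;Y) = sum_y sum_x p(x,y) * log(p(x,y) / (p(x) * p(y))).
    Only observed (x, y) pairs contribute, because terms with p(x,y) = 0 vanish.
    """
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts used for the joint p(x, y)
    px = Counter(xs)              # counts used for the marginal p(x)
    py = Counter(ys)              # counts used for the marginal p(y)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi
```

Under these assumptions, a rule that fires exactly on the records of one class in an evenly split two-class sample yields log 2 (about 0.693 nats), while a rule that fires independently of the class yields 0.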
Regarding amended Claim 17, 
Claim 17 recites the system of claim 15, where the system further comprises claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 3, and hence is rejected under similar rationale provided by Hodjat, Sepahvand, Fidelis, Chatterjee, and Huang as indicated in Claim 3, in view of the rejections of amended Claim 15.  
Regarding original Claim 19, 
Claim 19 recites the system of claim 15, where the system further comprises claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 5, and hence is rejected under similar rationale provided by Hodjat, Sepahvand, Fidelis, Chatterjee, and Huang as indicated in Claim 5, in view of the rejections of amended Claim 15.  
Claims 2, 6, 9, 13, 16, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over 
Hodjat et al., U.S. PGPUB 2017/0293849, published 10/12/2017 [hereafter referred as Hodjat] in view of Sepahvand et al., Generating Graphical Chain by Mutual Matching of Bayesian Network and Extracted Rules of Bayesian Network Using Genetic Algorithm, arXiv:1412.4465v1, December 15 2014 [hereafter referred as Sepahvand], in further view of Fidelis et al., Discovering Comprehensible Classification Rules with a Genetic Algorithm, Proceedings of the 2000 Congress on Evolutionary Computation CEC00 (Cata.No.00th 8512), IEEE, July 16-19 2000 [hereafter referred as Fidelis], in even further view of Chatterjee et al., U.S. Patent 10,824,959, filed 2/16/2016 [hereafter referred as Chatterjee], in even further view of Huang et al., A hybrid genetic algorithm for feature selection wrapper based on mutual information, Pattern Recognition Letters 28 (2007), 2007 Elsevier B.V. [hereafter referred as Huang] as applied to Claims 1, 8, and 15; in even further view of Castellanos et al., U.S. PGPUB 2012/0089620, published 4/12/2012 [henceforth referred as Castellanos].
Regarding amended Claim 2, 
Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang as applied to Claim 1 teaches
(Currently amended) The method of claim 1, wherein producing the set of class level rules further comprises calculating, by the processor-based system, a fitness score for each class level rule (Examiner’s note: As indicated earlier, Hodjat teaches a system for evolving rulesets using an evolutionary algorithm, where the training portion of the system interacts with a database containing a pool of candidate individuals, which are initialized with initial fitness estimates, and run through a battery of trials to test the training data, and updating the corresponding fitness estimates for each individual and ranking individuals based on their fitness score (Hodjat Figure 8; [0051] and Figure 6, elements 606, 116, 608; [0086]-[0089]).) …
… based on … precision of the respective class level rule and a coverage of the respective class level rule (Examiner’s note: Fidelis teaches a fitness score calculation comprising sensitivity and specificity indicators, with the sensitivity indicator representing recall (corresponding to the coverage of the rule, which is expressed as a ratio of true positives over the sum of true positives and false negatives), and the specificity indicator representing precision (corresponding to the precision of the rule, which is expressed as a ratio of true positives over the sum of true positives and false positives) (Fidelis p.807 Section 3. Fitness Function: “The fitness function evaluates the quality of each rule (individual). … Our fitness function combines two indicators commonly used in medical domains, namely the sensitivity (Se) and the specificity (Sp), defined as follows: Se = tp/(tp + fn) … Sp = tn/(tn + fp). Finally, the fitness function used by our system is defined as the product of these two indicators, ie.,: fitness =Se*Sp.”).) … 
… wherein at least one generation of the genetic algorithm is configured to produce the set of class level rules using the fitness score (Examiner’s note: As indicated earlier, Hodjat teaches that this pool of candidate individuals is provided as input into a procreation module, where the procreation involves identifying parent individuals, performing mutation and crossover operations to create child individuals, and choosing the best ones based on a fitness estimate over multiple iterations to finally determine a set of individuals with the best fitness score at the end of the training session, where this set of individuals represents a set of class level rules. Hodjat further teaches the procreation module being invoked for multiple iterations, where for each iteration, new individuals created by combination and/or mutation are placed in the pool of candidate individuals to be chosen as new parents for successive combinations and/or mutations, and undergo further fitness evaluations through the competition module (Hodjat Figure 8; [0051] and Figure 6, elements 606, 116, 608; [0086]-[0092]).).  
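As a hedged illustration of the Fidelis fitness function quoted above, the following sketch counts tp, fp, tn, and fn for a single rule treated as a two-class problem (the target class versus all other classes merged, as Fidelis describes) and returns Se * Sp. The function names, the boolean encoding of "rule fires" and "record belongs to the target class", and the choice to return 0 when a denominator is zero are assumptions of this sketch, not details taken from Fidelis.

```python
from typing import Sequence, Tuple


def confusion_counts(rule_fires: Sequence[bool],
                     in_target_class: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for one rule, merging all non-target classes into one."""
    tp = fp = tn = fn = 0
    for fired, actual in zip(rule_fires, in_target_class):
        if fired and actual:
            tp += 1
        elif fired:          # rule fired on a record outside the target class
            fp += 1
        elif not actual:     # rule did not fire, record outside the target class
            tn += 1
        else:                # rule missed a record of the target class
            fn += 1
    return tp, fp, tn, fn


def fidelis_fitness(tp: int, fp: int, tn: int, fn: int) -> float:
    """Fitness = Se * Sp, with Se = tp / (tp + fn) and Sp = tn / (tn + fp)."""
    se = tp / (tp + fn) if tp + fn else 0.0  # sensitivity; 0.0 on empty denominator is assumed
    sp = tn / (tn + fp) if tn + fp else 0.0  # specificity; same assumption
    return se * sp
```

For example, a rule that fires on records 1, 2, and 5 of five records, where records 1, 4, and 5 belong to the target class, gives tp = 2, fp = 1, tn = 1, fn = 1, so Se = 2/3, Sp = 1/2, and the fitness is 1/3.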
	While Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang teaches a fitness score based on a precision of the respective class level rule and a coverage of the respective class level rule, Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang does not explicitly teach
… a fitness score … based on a harmonic mean …
Castellanos teaches
… a fitness score … based on a harmonic mean (Examiner’s note: Castellanos teaches using an F-measure calculation (“fitness score”) to validate extracted rules produced by a genetic algorithm, where the F-measure calculation is based on a harmonic mean of precision and recall (Castellanos [0028]; and [0043]-[0044]: “Rules learned during the training phase can be validated during a testing phase … The accuracy of each rule can be measured in terms of its “precision”, which can be defined as the number of correct extractions from all the extractions that it did. … validation may be performed using a metric termed "recall." "Recall" can be defined as the number of correct extractions done over the total number of extractions that may be performed in a validation test set. For example, if a validation test set was known to have ten expiration dates, but only five were extracted, the recall would be 5/20 or 0.5. Accordingly, an "accuracy" metric may be generated as a harmonic mean of precision and recall, herein termed an F measure. The F-measure may be calculated as: $F = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$.”).) … 
Both Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang and Castellanos are analogous art since both teach validating rules from a genetic algorithm using fitness score metrics based on precision and recall indicators.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to substitute the fitness score equation taught in Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang with the fitness score equation containing a harmonic mean taught in Castellanos for validating the discovered rules produced by a genetic algorithm. Since Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang already teaches using a fitness score (comprising precision and recall indicators) to evaluate the accuracy of the extracted rules and rank them, a person having ordinary skill in the art would also consider using a variation of the fitness score calculation (with the same precision and recall indicators) as taught in Castellanos for performing validation and ranking in order to produce the same predictable results.
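The harmonic-mean F-measure quoted from Castellanos can be sketched in a few lines for comparison with the product-based fitness of Fidelis. The helper names below are hypothetical, and returning 0 for empty denominators is an assumption of this sketch. Using the Castellanos example of ten items of which five were extracted, and additionally assuming for illustration that all five extractions were correct, precision is 1.0, recall is 0.5, and F = 2(1.0)(0.5)/(1.0 + 0.5), or about 0.667.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Precision = tp / (tp + fp); recall = tp / (tp + fn); 0.0 on empty denominators (assumed)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall: F = 2 * P * R / (P + R)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Five correct extractions, no incorrect extractions, five items missed.
p, r = precision_recall(tp=5, fp=0, fn=5)
score = f_measure(p, r)
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a rule cannot obtain a high F-measure by maximizing only one of precision or recall.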
Regarding previously presented Claim 6, 
Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang as applied to Claim 1 teaches
(Previously presented) The method of claim 1, further comprising 
… selecting, by the processor-based system, a subset of the set of class level rules by calculating at least one of a fitness score for each of a pair of the class level rules (Examiner’s note: As indicated earlier, Hodjat teaches a system for evolving rulesets using an evolutionary algorithm, where the training portion of the system interacts with a database containing a pool of candidate individuals, which are initialized with initial fitness estimates, and run through a battery of trials to test the training data, and updating the corresponding fitness estimates for each individual and ranking individuals based on their fitness score (Hodjat Figure 8; [0051] and Figure 6, elements 606, 116, 608; [0086]-[0089]). Sepahvand teaches fitness scores are calculated for a pair of chromosomes (Sepahvand p.3 col.1 Algorithm 1 step (d) and p.3 col.1 Fitness Function 1st paragraph). Fidelis teaches each individual is represented by chromosomes (Fidelis p.807 col.1-col.1 Section 3.3 Fitness Function: “… The fitness function evaluates the quality of each rule (individual). … Each run of our GA solves a two-class classification problem … Therefore, the GA is run at least once for each class (value of the goal attribute). … When the GA is searching for rules predicting a given class, all other classes are effectively merged into a large class … Hence, the above formulas for Se and Sp can be applied to problems with any number of classes.”).) … and a mutual information between the respective class rule and the predicted class …
… a fitness score … based on … precision of the respective class level rule and a coverage of the respective class level rule (Examiner’s note: Fidelis teaches a fitness score calculation comprising sensitivity and specificity indicators, with the sensitivity indicator representing recall (corresponding to the coverage of the rule, which is expressed as a ratio of true positives over the sum of true positives and false negatives), and the specificity indicator representing precision (corresponding to the precision of the rule, which is expressed as a ratio of true positives over the sum of true positives and false positives) (Fidelis p.807 Section 3. Fitness Function: “The fitness function evaluates the quality of each rule (individual). … Our fitness function combines two indicators commonly used in medical domains, namely the sensitivity (Se) and the specificity (Sp), defined as follows: Se = tp/(tp + fn) … Sp = tn/(tn + fp). Finally, the fitness function used by our system is defined as the product of these two indicators, ie.,: fitness =Se*Sp.”).) …
… selecting the class level rule having a greatest fitness score from the pair of class level rules using at least one of the fitness score (Examiner’s note: Fidelis teaches performing a series of runs for each class and using the fitness score to select the best rule, where the best rule is selected as the rule predicting that class (Fidelis p.808 col.1 Table 1 and p.808 col.1 Section 5.1 Results for the Dermatology Data Set 1st paragraph: “Table 1 presents the final 6 rules discovered by the GA – one rule for each class. For each class, the GA was run three times … The best rule of the three runs, according to its fitness values measured on the training set, was selected as the rule predicting that class (this is the rule shown in Table 1).”).) and the mutual information.  
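A minimal sketch of the pairwise selection step recited in Claim 6 follows, assuming (hypothetically) that each class level rule already carries a fitness score and a mutual information value. Ranking by fitness first and breaking ties with mutual information is only one way to use "at least one of the fitness score and the mutual information", and pairing adjacent rules to form the subset is likewise an assumption of this sketch. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScoredRule:
    """Hypothetical record for one class level rule and its scores."""
    name: str
    fitness: float
    mutual_info: float


def select_from_pair(a: ScoredRule, b: ScoredRule) -> ScoredRule:
    """Keep the rule with the greater fitness; use mutual information to break ties.

    If both scores tie, max() returns the first argument.
    """
    return max((a, b), key=lambda r: (r.fitness, r.mutual_info))


def select_subset(rules: List[ScoredRule]) -> List[ScoredRule]:
    """Pair adjacent rules and keep each pair's winner; an unpaired last rule is kept."""
    winners = [select_from_pair(rules[i], rules[i + 1])
               for i in range(0, len(rules) - 1, 2)]
    if len(rules) % 2 == 1:
        winners.append(rules[-1])
    return winners
```

For instance, between two rules with equal fitness 0.6 and mutual information 0.2 and 0.4, the sketch keeps the second; a rule with fitness 0.7 is kept over either regardless of its mutual information.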
	While Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang teaches a fitness score based on a precision of the respective class level rule and a coverage of the respective class level rule, Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang does not explicitly teach
… a fitness score … based on a harmonic mean …
Castellanos teaches
… a fitness score … based on a harmonic mean (Examiner’s note: Castellanos teaches using an F-measure calculation (“fitness score”) to validate extracted rules produced by a genetic algorithm, where the F-measure calculation is based on a harmonic mean of precision and recall (Castellanos [0028]; and [0043]-[0044]: “Rules learned during the training phase can be validated during a testing phase … The accuracy of each rule can be measured in terms of its “precision”, which can be defined as the number of correct extractions from all the extractions that it did. … validation may be performed using a metric termed "recall." "Recall" can be defined as the number of correct extractions done over the total number of extractions that may be performed in a validation test set. For example, if a validation test set was known to have ten expiration dates, but only five were extracted, the recall would be 5/20 or 0.5. Accordingly, an "accuracy" metric may be generated as a harmonic mean of precision and recall, herein termed an F measure. The F-measure may be calculated as: $F = 2 \cdot \frac{\text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$.”).) … 
Both Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang and Castellanos are analogous art since both teach validating rules from a genetic algorithm using fitness score metrics based on precision and recall indicators.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to substitute the fitness score equation taught in Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang with the fitness score equation containing a harmonic mean taught in Castellanos for validating the discovered rules produced by a genetic algorithm. Since Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang already teaches using a fitness score (comprising precision and recall indicators) to evaluate the accuracy of the extracted rules and rank them, a person having ordinary skill in the art would also consider using a variation of the fitness score calculation (with the same precision and recall indicators) as taught in Castellanos for performing validation and ranking in order to produce the same predictable results.
Regarding amended Claim 9, 
Claim 9 recites the computer program product of claim 8, where the computer program product further comprises instructions that when executed by one or more computer processors cause the one or more processors to perform a process that includes claim limitations that are similar in scope to the corresponding claim limitations recited in amended Claim 2, and hence is rejected under similar rationale and motivations provided by Hodjat, Sepahvand, Fidelis, Chatterjee, Huang, and Castellanos as indicated in amended Claim 2, in view of the rejections of amended Claim 8.  
Regarding previously presented Claim 13, 
Claim 13 recites the computer program product of claim 8, where the computer program product further comprises instructions that when executed by one or more computer processors cause the one or more processors to perform a process that includes claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 6, and hence is rejected under similar rationale and motivations provided by Hodjat, Sepahvand, Fidelis, Chatterjee, Huang, and Castellanos as indicated in Claim 6, in view of the rejections of amended Claim 8.  
Regarding amended Claim 16, 
Claim 16 recites the system of claim 15, where the system further comprises claim limitations that are similar in scope to the corresponding claim limitations recited in amended Claim 2, and hence is rejected under similar rationale and motivations provided by Hodjat, Sepahvand, Fidelis, Chatterjee, Huang, and Castellanos as indicated in amended Claim 2, in view of the rejections of amended Claim 15.  
Regarding previously presented Claim 20, 
Claim 20 recites the system of claim 15, where the system further comprises claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 6, and hence is rejected under similar rationale and motivations provided by Hodjat, Sepahvand, Fidelis, Chatterjee, Huang, and Castellanos as indicated in Claim 6, in view of the rejections of amended Claim 15.  
Claims 4, 11, and 18 are rejected under 35 U.S.C. 103 as being unpatentable over 
Hodjat et al., U.S. PGPUB 2017/0293849, published 10/12/2017 [hereafter referred as Hodjat] in view of Sepahvand et al., Generating Graphical Chain by Mutual Matching of Bayesian Network and Extracted Rules of Bayesian Network Using Genetic Algorithm, arXiv:1412.4465v1, December 15 2014 [hereafter referred as Sepahvand], in further view of Fidelis et al., Discovering Comprehensible Classification Rules with a Genetic Algorithm, Proceedings of the 2000 Congress on Evolutionary Computation CEC00 (Cata.No.00th 8512), IEEE, July 16-19 2000 [hereafter referred as Fidelis], in even further view of Chatterjee et al., U.S. Patent 10,824,959, filed 2/16/2016 [hereafter referred as Chatterjee], in even further view of Huang et al., A hybrid genetic algorithm for feature selection wrapper based on mutual information, Pattern Recognition Letters 28 (2007), 2007 Elsevier B.V. [hereafter referred as Huang] as applied to Claims 1, 8, and 15; in even further view of Kapila et al., A Genetic Algorithm with Entropy Based Initial Bias for Automated Rule Mining, Int'l Conf. on Computer & Communication Technology (ICCCT '10), IEEE 2010 [hereafter referred as Kapila].  
Regarding original Claim 4, 
 
Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang as applied to Claim 1 teaches
(Original) The method of claim 1, 
… wherein at least one of the features is a numerical feature (Examiner’s note: Hodjat teaches an example rule containing a plurality of conditions, where the attributes of the condition (e.g., pulse value at time t, blood pressure value at time t-1, blood pressure value at time t-6) correspond to numerical values (Hodjat [0062]-[0067]).) …
However, Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang does not explicitly teach
… wherein the method further comprises pre-processing, by the processor-based system, the set of training data to convert the numerical feature into a categorical feature using entropy based binning.  
	Kapila teaches
… wherein the method further comprises pre-processing, by the processor-based system, the set of training data to convert the numerical feature into a categorical feature using entropy based binning (Examiner’s note: Kapila teaches an entropy based filter approach to determine the entropy contained in a numerical attribute, using a formula that determines the entropy of an attribute based on the number of classes, a weight factor related to a partition, and the expected information required to classify an example based on partition by an attribute, where the end result is to provide the predicting and goal attribute values as categorical values before encoding their corresponding rule conditions into chromosomes for input into a genetic algorithm (Kapila p.492 col.1 3rd paragraph-col.2 2nd paragraph (II.A. Population Initialization)).).  
Both Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang and Kapila are analogous art since both teach generating and identifying relevant rules using genetic algorithms.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the numerical attribute values taught in Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang and use the entropy based filter approach taught in Kapila as a way to pre-process numerical attributes into categorical attributes. The motivation to combine is taught in Kapila, as a way to bias the initial population of rules so that it contains relevant attributes with greater probability than redundant attributes, such that the training data contains relevant information, which then improves the search performance of the genetic algorithm by reducing the search space to discover rules with more predictive accuracy, effectively producing a more computationally efficient system (Kapila p.491 col.2 2nd-4th paragraphs: “… To enhance the performance of genetic algorithms for automated rule mining, relevant attributes must be selected to reduce the search space for GA. Selection of relevant attributes enhances the efficient as well as efficacy of genetic algorithms and discovers rules with higher predictive accuracy. … To address this problem it is important to bias the initial population so that it can have relevant attributes with greater probability than the redundant attributes. … This paper proposes a genetic algorithm approach for automated rule mining employing entropy based filter approach to bias the initial population towards more relevant or informative attributes so that the GA starts with better fit rules covering relatively more training instances. … the approach is anticipated to evolve better fit rules in lesser time, thereby significantly enhancing the performance of evolutionary rule mining process.”).
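To illustrate the general idea of entropy based binning recited in Claim 4, the sketch below chooses a single cut point for a numerical feature that minimizes the class entropy of the two resulting bins, weighted by bin size, and then maps each value to one of two categories. This is a simplified, single-split form of supervised entropy-based discretization; it is not a reproduction of Kapila's procedure, and the function names, feature values, class labels, and the bin names "low" and "high" are hypothetical.

```python
import math
from collections import Counter
from typing import Hashable, List, Optional, Sequence


def class_entropy(labels: Sequence[Hashable]) -> float:
    """Shannon entropy, in bits, of the class-label distribution in one bin."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def best_entropy_cut(values: Sequence[float], labels: Sequence[Hashable]) -> float:
    """Return the midpoint cut that minimizes the size-weighted class entropy of two bins."""
    pairs = sorted(zip(values, labels), key=lambda p: p[0])  # sort by value only
    n = len(pairs)
    best_cut: Optional[float] = None
    best_score = math.inf
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # cannot place a boundary between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (i / n) * class_entropy(left) + ((n - i) / n) * class_entropy(right)
        if score < best_score:
            best_cut, best_score = (lo + hi) / 2, score
    if best_cut is None:
        raise ValueError("need at least two distinct numerical values")
    return best_cut


def to_categorical(values: Sequence[float], cut: float) -> List[str]:
    """Map each numerical value to a hypothetical 'low'/'high' category."""
    return ["low" if v <= cut else "high" for v in values]
```

For hypothetical pulse readings [60, 65, 70, 100, 110, 120] labeled "normal", "normal", "normal", "alert", "alert", "alert", the only cut with zero weighted entropy lies between 70 and 100, so the sketch returns 85.0 and maps the readings to three "low" and three "high" categories.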
Regarding original Claim 11, 
Claim 11 recites the computer program product of claim 8, where the computer program product further comprises instructions that when executed by one or more computer processors cause the one or more processors to perform a process that includes claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 4, and hence is rejected under similar rationale and motivations provided by Hodjat, Sepahvand, Fidelis, Chatterjee, Huang, and Kapila as indicated in Claim 4, in view of the rejections of amended Claim 8.  
Regarding original Claim 18, 
Claim 18 recites the system of claim 15, where the system further comprises claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 4, and hence is rejected under similar rationale and motivations provided by Hodjat, Sepahvand, Fidelis, Chatterjee, Huang, and Kapila as indicated in Claim 4, in view of the rejections of amended Claim 15.  
Claims 7 and 14 are rejected under 35 U.S.C. 103 as being unpatentable over 
Hodjat et al., U.S. PGPUB 2017/0293849, published 10/12/2017 [hereafter referred as Hodjat] in view of Sepahvand et al., Generating Graphical Chain by Mutual Matching of Bayesian Network and Extracted Rules of Bayesian Network Using Genetic Algorithm, arXiv:1412.4465v1, December 15 2014 [hereafter referred as Sepahvand], in further view of Fidelis et al., Discovering Comprehensible Classification Rules with a Genetic Algorithm, Proceedings of the 2000 Congress on Evolutionary Computation CEC00 (Cata.No.00th 8512), IEEE, July 16-19 2000 [hereafter referred as Fidelis], in even further view of Chatterjee et al., U.S. Patent 10,824,959, filed 2/16/2016 [hereafter referred as Chatterjee], in even further view of Huang et al., A hybrid genetic algorithm for feature selection wrapper based on mutual information, Pattern Recognition Letters 28 (2007), 2007 Elsevier B.V. [hereafter referred as Huang] as applied to Claims 1 and 8; in even further view of Castellanos et al., U.S. PGPUB 2012/0089620, published 4/12/2012 [henceforth referred as Castellanos], in even further view of Rivera, Wilson, Scalable Parallel Genetic Algorithms, Artificial Intelligence Review 16, Kluwer Academic Publishers, 2001 [hereafter referred as Rivera].
Regarding previously presented Claim 7, 
 
Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang as applied to Claim 1 teaches
(Previously presented) The method of claim 1, further comprising 
selecting, by the processor-based system, a subset of the set of class level rules by applying each of a pair of the class level rules to a … genetic algorithm (Examiner’s note: As indicated earlier, Hodjat teaches a procreation module performing an evolutionary (“genetic”) algorithm that involves identifying parent individuals, performing mutation and crossover operations to create child individuals, and choosing the best ones based on a fitness estimate over multiple iterations to finally determine a set of individuals with the best fitness score at the end of the training session, where this set of individuals represents a set of class level rules (Hodjat Figure 8; [0051] and Figure 6, elements 606, 116, 608; [0086]-[0087]; and [0090]-[0092]). Fidelis also teaches executing a genetic algorithm over multiple runs, with each run representing a search for rules representing each class, and determining a fitness function (comprising Se and Sp) for each run to determine a set of class level rules (Fidelis p.807 col.1 5th paragraph-col.2, 1st paragraph (Section 3.3 Fitness Function): “Each run of our GA solves a two-class classification problem, where the goal is to predict whether or not the patient has a given disease. Therefore, the GA is run at least once for each class (value of the goal attribute). … In the first run the GA would search for rules predicting class 1; in the second run it would search for rules predicting class 2, and so on. When the GA is searching for rules predicting a given class, all other classes are effectively merged into a large class, which can be conceptually thought of as meaning that the patient does not have the disease predicted by the rule. Hence, the above formulas for Se and Sp can be applied to problems with any number of classes”).), 
wherein producing the set of class level rules further comprises calculating at least one of a fitness score for each class level rule (Examiner’s note: As indicated earlier, Hodjat teaches a system for evolving rulesets using an evolutionary algorithm, where the training portion of the system interacts with a database containing a pool of candidate individuals, which are initialized with initial fitness estimates and run through a battery of trials on the training data, after which the corresponding fitness estimate for each individual is updated and the individuals are ranked based on their fitness scores (Hodjat Figure 8; [0051] and Figure 6, elements 606, 116, 608; [0086]-[0089]). Sepahvand teaches fitness scores are calculated for a pair of chromosomes (Sepahvand p.3 col.1 Algorithm 1 step (d) and p.3 col.1 Fitness Function 1st paragraph). Fidelis teaches each individual is represented by chromosomes (Fidelis p.807 col.1-col.2 Section 3.3 Fitness Function: “… The fitness function evaluates the quality of each rule (individual). … Each run of our GA solves a two-class classification problem … Therefore, the GA is run at least once for each class (value of the goal attribute). … When the GA is searching for rules predicting a given class, all other classes are effectively merged into a large class … Hence, the above formulas for Se and Sp can be applied to problems with any number of classes.”).) … and a mutual information between the respective class rule and the predicted class …
… a fitness score … based on … precision of the respective class level rule and a coverage of the respective class level rule (Examiner’s note: Fidelis teaches a fitness score calculation comprising sensitivity and specificity indicators, with the sensitivity indicator representing recall (corresponding to the coverage of the rule, which is expressed as a ratio of true positives over the sum of true positives and false negatives), and the specificity indicator representing precision (corresponding to the precision of the rule, which is expressed as a ratio of true positives over the sum of true positives and false positives) (Fidelis p.807 Section 3.3 Fitness Function: “The fitness function evaluates the quality of each rule (individual). … Our fitness function combines two indicators commonly used in medical domains, namely the sensitivity (Se) and the specificity (Sp), defined as follows: Se = tp/(tp + fn) … Sp = tn/(tn + fp). Finally, the fitness function used by our system is defined as the product of these two indicators, i.e.,: fitness = Se*Sp.”).) …
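For illustration only, the Fidelis fitness function quoted above (fitness = Se*Sp, with Se = tp/(tp + fn) and Sp = tn/(tn + fp)) can be computed for a single rule as sketched below. Consistent with the quoted one-class-per-run setup, every class other than the target class is merged into a single negative class. The function names, the representation of a rule as a Python predicate, and the choice to return 0.0 when a denominator is zero are assumptions introduced here, not details taken from Fidelis.

```python
def confusion_counts(rule, records, labels, target):
    """Count (tp, fp, tn, fn) for a rule that predicts `target`.
    All other classes are merged into one "not target" class."""
    tp = fp = tn = fn = 0
    for record, label in zip(records, labels):
        fires = rule(record)           # rule antecedent satisfied?
        actual = label == target       # record truly belongs to the target class?
        if fires and actual:
            tp += 1
        elif fires:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def fidelis_fitness(tp, fp, tn, fn):
    """fitness = Se * Sp, where Se = tp/(tp + fn) and Sp = tn/(tn + fp).
    A zero denominator yields 0.0 (an assumption for degenerate data)."""
    se = tp / (tp + fn) if tp + fn else 0.0
    sp = tn / (tn + fp) if tn + fp else 0.0
    return se * sp


# Hypothetical rule "IF value > 5 THEN class A" evaluated on toy data.
records = [1, 6, 7, 2, 8, 3]
labels = ["A", "A", "B", "B", "A", "B"]
counts = confusion_counts(lambda value: value > 5, records, labels, "A")
score = fidelis_fitness(*counts)
```

On the toy data, the rule yields tp = 2, fp = 1, tn = 2, and fn = 1, so Se = Sp = 2/3 and the product is 4/9.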
While Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang teaches a fitness score based on a precision of the respective class level rule and a coverage of the respective class level rule, Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang does not explicitly teach
… a fitness score … based on a harmonic mean …
Castellanos teaches
… a fitness score … based on a harmonic mean (Examiner’s note: Castellanos teaches using an F-measure calculation (“fitness score”) to validate extracted rules produced by a genetic algorithm, where the F-measure calculation is based on a harmonic mean of precision and recall (Castellanos [0028]; and [0043]-[0044]: “Rules learned during the training phase can be validated during a testing phase … The accuracy of each rule can be measured in terms of its “precision”, which can be defined as the number of correct extractions from all the extractions that it did. … validation may be performed using a metric termed "recall." "Recall" can be defined as the number of correct extractions done over the total number of extractions that may be performed in a validation test set. For example, if a validation test set was known to have ten expiration dates, but only five were extracted, the recall would be 5/10 or 0.5. Accordingly, an "accuracy" metric may be generated as a harmonic mean of precision and recall, herein termed an F measure. The F-measure may be calculated as: F = 2 ∙ (precision ∙ recall)/(precision + recall).”).) … 
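For illustration only, the F-measure quoted from Castellanos, F = 2 ∙ (precision ∙ recall)/(precision + recall), can be computed from extraction counts as sketched below. The function names and the convention of returning 0.0 when both precision and recall are zero are assumptions introduced here. The usage line applies the quoted example of ten expiration dates with five extracted, under the additional hypothetical assumption that all five extractions are correct.

```python
def precision_recall(tp, fp, fn):
    """precision = correct extractions / all extractions made;
    recall = correct extractions / extractions that should have been made."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f_measure(precision, recall):
    """Harmonic mean of precision and recall: 2 * p * r / (p + r)."""
    if precision + recall == 0:
        return 0.0  # assumed convention when the rule makes no correct extraction
    return 2 * precision * recall / (precision + recall)


# Ten dates in the validation set, five extracted, all five assumed correct.
p, r = precision_recall(tp=5, fp=0, fn=5)
f = f_measure(p, r)
```

Here precision is 1.0 and recall is 0.5, giving F = 2/3. Because a harmonic mean is pulled toward the smaller of its two inputs, a rule cannot obtain a high F-measure by maximizing only one of precision or recall.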
Both Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang, and Castellanos are analogous art since both teach validating rules from a genetic algorithm using fitness score metrics based on precision and recall indicators.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to substitute the fitness score equation taught in Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang with the fitness score equation containing a harmonic mean taught in Castellanos for validating the discovered rules produced by a genetic algorithm. Since Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang already teaches using a fitness score (comprising precision and recall indicators) to evaluate the accuracy of the extracted rules and rank them, a person having ordinary skill in the art would also consider using a variation of the fitness score calculation (with the same precision and recall indicators) as taught in Castellanos for performing validation and ranking in order to produce the same predictable results.
While Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang, in even further view of Castellanos teaches selection of class level rules using a genetic algorithm, Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang, in even further view of Castellanos does not explicitly teach
… applying … to a second level genetic algorithm …
… wherein the second level genetic algorithm is configured to select the subset of class level rules using at least one of the fitness score and the predicted class. 
Rivera teaches
… applying … to a second level genetic algorithm (Examiner’s note: Under its broadest reasonable interpretation, the term “second level genetic algorithm” is interpreted as a genetic algorithm divided into multiple levels, for purposes such as parallelization. Rivera teaches a global parallelization scheme where the evaluation of an individual’s fitness values is parallelized by assigning a fraction of the individual population to each processor to be evaluated, where a master processor performs the genetic operators and distributes the individuals among a set of slave processors to evaluate the fitness values, where the fitness values measure the quality of each individual, thus identifying the individuals for successive reproduction by the genetic algorithm (Rivera p.154 Section 2.1 Genetic algorithms 1st-2nd paragraphs; p.155 Section 2.2 Parallelization strategies bullet 1 and p.156 Figure 1). Rivera further teaches that this global parallelization scheme can be combined into a hybrid model and implemented using various libraries such as PGAPack, which allows for multiple levels of control for the genetic algorithm (Rivera p.157 bullet 4).) …
… wherein the second level genetic algorithm is configured to select the subset of class level rules using at least one of the fitness score (Examiner’s note: As indicated earlier, Rivera teaches a global parallelization scheme, where a master processor performs the genetic operators and distributes the individuals among a set of slave processors to evaluate the fitness values, where the fitness values measure the quality of each individual, thus identifying the individuals for successive reproduction by the genetic algorithm (Rivera p.154 Section 2.1 Genetic algorithms 1st-2nd paragraphs; p.155 Section 2.2 Parallelization strategies bullet 1 and p.156 Figure 1).) and the predicted class. 
Both Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang, in even further view of Castellanos, and Rivera are analogous art since they both teach genetic algorithms.
It would have been obvious to a person having ordinary skill in the art before the effective filing date of the invention to take the genetic algorithm taught in Hodjat in view of Sepahvand, in further view of Fidelis, in even further view of Chatterjee, in further view of Huang, in even further view of Castellanos and implement the master/slave type parallelization scheme taught in Rivera as a way to parallelize the genetic algorithm into different levels. The motivation to combine is taught in Rivera, since global parallelization can preserve the behavior of the original genetic algorithm and is effective in performing complicated fitness evaluations, which results in improvements to the computational time for the genetic algorithm (Rivera p.155 Section 2.2 Parallelization strategies, bullet 1).
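For illustration only, the master/slave (global) parallelization scheme attributed to Rivera above can be sketched as follows: the master applies the genetic operators serially, and only the fitness evaluations are distributed to workers. In this sketch, threads from Python's standard concurrent.futures module stand in for slave processors. The bit-string encoding, the OneMax fitness (count of 1 bits), tournament selection, one-point crossover, bit-flip mutation, elitism, and all function names are hypothetical choices for brevity; none of them is drawn from Rivera or PGAPack.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def evaluate_population(population, fitness, workers=4):
    """Slave step: distribute fitness evaluations; executor.map keeps input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fitness, population))


def next_generation(population, scores, rng, mutation_rate=0.05):
    """Master step: tournament selection, one-point crossover, bit-flip mutation, elitism."""
    def tournament():
        a, b = rng.sample(range(len(population)), 2)
        return population[a] if scores[a] >= scores[b] else population[b]

    best = population[max(range(len(population)), key=scores.__getitem__)]
    children = [best[:]]  # elitism: the best individual survives unchanged
    while len(children) < len(population):
        p1, p2 = tournament(), tournament()
        cut = rng.randrange(1, len(p1))
        child = p1[:cut] + p2[cut:]
        child = [1 - g if rng.random() < mutation_rate else g for g in child]
        children.append(child)
    return children


def run_master_slave_ga(fitness, length=20, size=30, generations=25, seed=0):
    """Run the GA and record the best fitness seen in each generation."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(size)]
    history = []
    for _ in range(generations):
        scores = evaluate_population(population, fitness)  # parallel part
        history.append(max(scores))
        population = next_generation(population, scores, rng)  # serial part
    return population, history


population, history = run_master_slave_ga(sum)  # OneMax: fitness = number of 1 bits
```

Because all random choices are made by the master and the workers return scores in input order, this sketch produces the same sequence of populations as a serial run with the same seed, which is consistent with the statement that global parallelization preserves the behavior of the original genetic algorithm. For CPU-bound fitness functions in CPython, a process pool would typically replace the thread pool.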
Regarding previously presented Claim 14, 
Claim 14 recites the computer program product of claim 8, where the computer program product further comprises instructions that when executed by one or more computer processors cause the one or more processors to perform a process that includes claim limitations that are similar in scope to the corresponding claim limitations recited in Claim 7, and hence is rejected under similar rationale and motivations provided by Hodjat, Sepahvand, Fidelis, Chatterjee, Huang, Castellanos, and Rivera as indicated in Claim 7, in view of the rejections of amended Claim 8.  

Conclusion
The prior art made of record and not relied upon is considered pertinent to applicant’s disclosure.
Sivakumar et al., Feature Selection Using Genetic Algorithm with Mutual Information, IJCSIT Vol.5 (3), 2014, pp.2871-2874, where Sivakumar teaches using mutual information to evaluate a set of features F and class C to help perform feature selection in a genetic algorithm (Sivakumar p.2871 Section II. Mutual Information (MI) and p.2872 Section III. Genetic Algorithm (GA) and Figure 1).
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action. Accordingly, THIS ACTION IS MADE FINAL. See MPEP § 706.07(a). Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action. In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to WILLIAM WAI YIN KWAN whose telephone number is 303-297-4332. The examiner can normally be reached Monday-Friday 8:00am - 4:30pm PT.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen, can be reached on 571-272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.

/WILLIAM WAI YIN KWAN/Examiner, Art Unit 2121                                                                                                                                                                                                        

/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121