DETAILED ACTION
This action is in response to the claims filed 08/29/2022 for application 16/566,375. Claims 1, 7, and 13 have been amended and claims 2, 8, and 14 have been canceled. Thus, claims 1, 3, 5, 7, 9, 11, 13, 15, and 17 are currently pending.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Continued Examination Under 37 CFR 1.114
A request for continued examination under 37 CFR 1.114, including the fee set forth in 37 CFR 1.17(e), was filed in this application after final rejection.  Since this application is eligible for continued examination under 37 CFR 1.114, and the fee set forth in 37 CFR 1.17(e) has been timely paid, the finality of the previous Office action has been withdrawn pursuant to 37 CFR 1.114.  Applicant's submission filed on 08/29/2022 has been entered.
Information Disclosure Statement
The information disclosure statement (IDS) submitted on 07/14/2022 is in compliance with the provisions of 37 CFR 1.97.  Accordingly, the information disclosure statement is being considered by the examiner.

Claim Rejections - 35 USC § 101
35 U.S.C. 101 reads as follows:
Whoever invents or discovers any new and useful process, machine, manufacture, or composition of matter, or any new and useful improvement thereof, may obtain a patent therefor, subject to the conditions and requirements of this title.
Claims 1, 3, 5, 7, 9, 11, 13, 15, and 17 are rejected under 35 U.S.C. 101 because the claimed invention is directed to an abstract idea without significantly more.

Regarding claim 1, 
Step 1 Analysis: Claim 1 is directed to a process, which falls within one of the four statutory categories. 
Step 2A Prong 1 Analysis: Claim 1 recites, in part, generating a hypothesis set which includes a combination of literals of the explanatory variables, a combination determined to classify a set of training data into the second piece of data…, generating a prediction result of input data, and outputting the prediction results including a prediction score acquired from the weight and indicating accuracy of the prediction. The limitations of generating a hypothesis set which includes a combination of literals of the explanatory variables, a combination determined to classify a set of training data into the second piece of data…, generating a prediction result of input data, and outputting the prediction results including a prediction score acquired from the weight and indicating accuracy of the prediction, as drafted, are processes that, under broadest reasonable interpretation, cover performance of the limitation in the mind. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind or with pen and paper, then it falls within the “Mental Processes” grouping of abstract ideas.
The limitations of:
generating a hypothesis set which includes a combination of literals of the explanatory variables can be considered to be an evaluation in the human mind,
a combination determined to classify a set of training data into the second piece of data…can be considered to be an evaluation in the human mind,
generating a prediction result of input data can be considered to be an evaluation in the human mind, and
outputting the prediction results including a prediction score acquired from the weight and indicating accuracy of the prediction can be considered to be an evaluation in the human mind.
Accordingly, the claim recites an abstract idea.
Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. In particular, the claim only recites the additional elements – “non-transitory computer-readable recording medium” and “computer”. The “non-transitory computer-readable recording medium” and “computer” in the claim are recited at a high level of generality (i.e. as a generic processor performing a generic computer function of generating an index) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The claim further recites: performing a machine learning process to calculate a weight of each of the hypotheses included in the hypothesis set on a basis of the second piece of data classified by each of the hypotheses using each of the sets of training data. This limitation is an insignificant extra solution activity and thus the judicial exception is not integrated into a practical application. The claim also recites: a number of literals included in each of the hypotheses is equal to or smaller than a predetermined value. This limitation amounts to more specifics of the judicial exception identified in Step 2A Prong 1. The claim as a whole is directed to an abstract idea. 
Step 2B Analysis: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of utilizing a non-transitory computer-readable recording medium and computer amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Additionally, the limitation of performing a machine learning process to calculate a weight of each of the hypotheses included in the hypothesis set on a basis of the second piece of data classified by each of the hypotheses using each of the sets of training data is well-understood, routine, and conventional, as evidenced by Ruckert et al. ("A Statistical Approach to Rule Learning" cited in the IDS filed 03/26/2020, pg. 785, § 1. Introduction, right col, ¶1). This limitation therefore remains insignificant extra-solution activity even upon reconsideration, and does not amount to significantly more. Even when considered in combination, these additional elements are only generally linked to the exception and insignificant extra-solution activity, which cannot provide an inventive concept. The claim is not patent eligible.

Regarding claim 3, the rejection of claim 1 is further incorporated, and further, the claim recites: wherein a number or a ratio of sets of training data classified into second piece of data by each of the hypotheses is equal to or a larger than a predetermined value. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.

Regarding claim 5, the rejection of claim 1 is further incorporated, and further, the claim recites: wherein in the generating when a first hypothesis including a first literal classifies a set of training data into the second piece of data and a second hypothesis including the first literal and a second literal classifies a set of training data into the second piece of data that is same as by the first hypothesis, the second hypothesis is excluded from the hypothesis set. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 1 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.
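For illustration only, the hypothesis-generation limitations discussed above (a bounded number of literals per hypothesis in claim 1, and exclusion of a redundant longer hypothesis in claim 5) can be sketched as follows. This is a minimal reading of the claim language under stated assumptions, not the applicant's implementation: the Boolean variables `A` and `B`, the toy training rows, and the function names `covers` and `generate_hypotheses` are all hypothetical.

```python
from itertools import combinations


def covers(hypothesis, row):
    """True when every literal (variable, value) in the hypothesis holds for the row."""
    return all(row[var] == val for var, val in hypothesis)


def generate_hypotheses(rows, labels, target, max_literals):
    """Enumerate conjunctions of literals that classify training rows into `target`."""
    variables = sorted(rows[0])
    literals = [(v, b) for v in variables for b in (True, False)]
    kept = []  # pairs of (hypothesis, indices of covered training rows)
    for size in range(1, max_literals + 1):  # bound on the number of literals
        for combo in combinations(literals, size):
            # Skip contradictory conjunctions such as "A and not A".
            if len({var for var, _ in combo}) < size:
                continue
            covered = frozenset(i for i, r in enumerate(rows) if covers(combo, r))
            # Keep only conjunctions whose covered rows all carry the target label.
            if not covered or any(labels[i] != target for i in covered):
                continue
            # Claim 5-style pruning: drop a longer hypothesis that contains a kept
            # shorter one and classifies exactly the same training rows.
            if any(set(h) < set(combo) and c == covered for h, c in kept):
                continue
            kept.append((combo, covered))
    return [h for h, _ in kept]


# Hypothetical training data: two explanatory variables and one objective label.
rows = [
    {"A": True, "B": True},
    {"A": True, "B": True},
    {"A": False, "B": True},
    {"A": False, "B": False},
]
labels = [1, 1, 0, 0]
hypotheses = generate_hypotheses(rows, labels, target=1, max_literals=2)
print(hypotheses)
```

In this toy data the conjunction "A and B" covers the same rows as "A" alone, so only the shorter hypothesis survives.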

Regarding claim 7, 
Step 1 Analysis: Claim 7 is directed to a process, which falls within one of the four statutory categories. 
Step 2A Prong 1 Analysis: Claim 7 recites, in part, generating a hypothesis set which includes a combination of literals of the explanatory variables, a combination determined to classify a set of training data into the second piece of data…, generating a prediction result of input data, and outputting the prediction results including a prediction score acquired from the weight and indicating accuracy of the prediction. The limitations of generating a hypothesis set which includes a combination of literals of the explanatory variables, a combination determined to classify a set of training data into the second piece of data…, generating a prediction result of input data, and outputting the prediction results including a prediction score acquired from the weight and indicating accuracy of the prediction, as drafted, are processes that, under broadest reasonable interpretation, cover performance of the limitation in the mind. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind or with pen and paper, then it falls within the “Mental Processes” grouping of abstract ideas.
The limitations of:
generating a hypothesis set which includes a combination of literals of the explanatory variables can be considered to be an evaluation in the human mind,
a combination determined to classify a set of training data into the second piece of data…can be considered to be an evaluation in the human mind,
generating a prediction result of input data can be considered to be an evaluation in the human mind, and
outputting the prediction results including a prediction score acquired from the weight and indicating accuracy of the prediction can be considered to be an evaluation in the human mind.
Accordingly, the claim recites an abstract idea.
Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional element “computer”. The “computer” in the claim is recited at a high level of generality (i.e. as a generic processor performing a generic computer function of generating an index) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, this additional element does not integrate the abstract idea into a practical application because it does not impose any meaningful limits on practicing the abstract idea.
The claim further recites: performing a machine learning process to calculate a weight of each of the hypotheses included in the hypothesis set on a basis of the second piece of data classified by each of the hypotheses using each of the sets of training data. This limitation is an insignificant extra solution activity and thus the judicial exception is not integrated into a practical application. The claim also recites: a number of literals included in each of the hypotheses is equal to or smaller than a predetermined value. This limitation amounts to more specifics of the judicial exception identified in Step 2A Prong 1. The claim as a whole is directed to an abstract idea. 
Step 2B Analysis: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of utilizing a computer amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Additionally, the limitation of performing a machine learning process to calculate a weight of each of the hypotheses included in the hypothesis set on a basis of the second piece of data classified by each of the hypotheses using each of the sets of training data is well-understood, routine, and conventional, as evidenced by Ruckert et al. ("A Statistical Approach to Rule Learning" cited in the IDS filed 03/26/2020, pg. 785, § 1. Introduction, right col, ¶1). This limitation therefore remains insignificant extra-solution activity even upon reconsideration, and does not amount to significantly more. Even when considered in combination, these additional elements are only generally linked to the exception and insignificant extra-solution activity, which cannot provide an inventive concept. The claim is not patent eligible.

Regarding claim 9, the rejection of claim 7 is further incorporated, and further, the claim recites: wherein a number or a ratio of sets of training data classified into second piece of data by each of the hypotheses is equal to or a larger than a predetermined value. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 7 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.

Regarding claim 11, the rejection of claim 7 is further incorporated, and further, the claim recites: wherein in the generating when a first hypothesis including a first literal classifies a set of training data into the second piece of data and a second hypothesis including the first literal and a second literal classifies a set of training data into the second piece of data that is same as by the first hypothesis, the second hypothesis is excluded from the hypothesis set. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 7 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.

Regarding claim 13, 
Step 1 Analysis: Claim 13 is directed to a process, which falls within one of the four statutory categories. 
Step 2A Prong 1 Analysis: Claim 13 recites, in part, generating a hypothesis set which includes a combination of literals of the explanatory variables, a combination determined to classify a set of training data into the second piece of data…, generating a prediction result of input data, and outputting the prediction results including a prediction score acquired from the weight and indicating accuracy of the prediction. The limitations of generating a hypothesis set which includes a combination of literals of the explanatory variables, a combination determined to classify a set of training data into the second piece of data…, generating a prediction result of input data, and outputting the prediction results including a prediction score acquired from the weight and indicating accuracy of the prediction, as drafted, are processes that, under broadest reasonable interpretation, cover performance of the limitation in the mind. If a claim limitation, under its broadest reasonable interpretation, covers performance of the limitation in the mind or with pen and paper, then it falls within the “Mental Processes” grouping of abstract ideas.
The limitations of:
generating a hypothesis set which includes a combination of literals of the explanatory variables can be considered to be an evaluation in the human mind,
a combination determined to classify a set of training data into the second piece of data…can be considered to be an evaluation in the human mind,
generating a prediction result of input data can be considered to be an evaluation in the human mind, and
outputting the prediction results including a prediction score acquired from the weight and indicating accuracy of the prediction can be considered to be an evaluation in the human mind.
Accordingly, the claim recites an abstract idea. 
Step 2A Prong 2 Analysis: This judicial exception is not integrated into a practical application. In particular, the claim recites the additional elements –  “a memory” and “processor”. The “memory” and “processor” in the claim are recited at a high level of generality (i.e. as a generic processor performing a generic computer function of generating an index) such that it amounts to no more than mere instructions to apply the exception using a generic computer component. Accordingly, these additional elements do not integrate the abstract idea into a practical application because they do not impose any meaningful limits on practicing the abstract idea.
The claim further recites: performing a machine learning process to calculate a weight of each of the hypotheses included in the hypothesis set on a basis of the second piece of data classified by each of the hypotheses using each of the sets of training data. This limitation is an insignificant extra solution activity and thus the judicial exception is not integrated into a practical application. The claim also recites: a number of literals included in each of the hypotheses is equal to or smaller than a predetermined value. This limitation amounts to more specifics of the judicial exception identified in Step 2A Prong 1. The claim as a whole is directed to an abstract idea.
Step 2B Analysis: The claims do not include additional elements that are sufficient to amount to significantly more than the judicial exception. As discussed above with respect to integration of the abstract idea into a practical application, the additional elements of utilizing a memory and processor amount to no more than mere instructions to apply the exception using a generic computer component. Mere instructions to apply an exception using a generic computer component cannot provide an inventive concept. Additionally, the limitation of performing a machine learning process to calculate a weight of each of the hypotheses included in the hypothesis set on a basis of the second piece of data classified by each of the hypotheses using each of the sets of training data is well-understood, routine, and conventional, as evidenced by Ruckert et al. ("A Statistical Approach to Rule Learning" cited in the IDS filed 03/26/2020, pg. 785, § 1. Introduction, right col, ¶1). This limitation therefore remains insignificant extra-solution activity even upon reconsideration, and does not amount to significantly more. Even when considered in combination, these additional elements are only generally linked to the exception and insignificant extra-solution activity, which cannot provide an inventive concept. The claim is not patent eligible.
Regarding claim 15, the rejection of claim 13 is further incorporated, and further, the claim recites: wherein a number or a ratio of sets of training data classified into second piece of data by each of the hypotheses is equal to or a larger than a predetermined value. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 13 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.

Regarding claim 17, the rejection of claim 13 is further incorporated, and further, the claim recites: wherein in the generating when a first hypothesis including a first literal classifies a set of training data into the second piece of data and a second hypothesis including the first literal and a second literal classifies a set of training data into the second piece of data that is same as by the first hypothesis, the second hypothesis is excluded from the hypothesis set. This limitation amounts to more specifics of the judicial exception identified in the rejection of claim 13 above.
The claim does not include any additional elements that amount to an integration of the judicial exception into a practical application, nor to significantly more than the judicial exception. The claim is not patent eligible.

Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA 35 U.S.C. 102 and 103 (or as subject to pre-AIA 35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

The factual inquiries for establishing a background for determining obviousness under 35 U.S.C. 103 are summarized as follows:
1. Determining the scope and contents of the prior art.
2. Ascertaining the differences between the prior art and the claims at issue.
3. Resolving the level of ordinary skill in the pertinent art.
4. Considering objective evidence present in the application indicating obviousness or nonobviousness.
Claims 1, 3, 7, 9, 13, and 15 are rejected under 35 U.S.C. 103 as being unpatentable over Ruckert et al. ("A Statistical Approach to Rule Learning" cited in the IDS filed 03/26/2020, hereinafter "Ruckert") in view of Yin et al. ("CPAR: Classification based on Predictive Association Rules", hereinafter "Yin").



Regarding claim 1, Ruckert teaches A non-transitory computer-readable recording medium having stored therein a program that causes a computer to execute a process (“In order to test the performance of the proposed system on real world data, we implemented a version in C++ on Linux.” [pg. 791, § 4. Experiment, ¶1, use of computer and memory is implicit]) comprising: 
generating, using sets of training data each of which includes first pieces of data corresponding to explanatory variables and a second piece of data corresponding to an objective variable (“First of all, we follow the usual convention and assume that the instances are drawn i.i.d. according to a fixed but unknown distribution D. D ranges over X × Y, where X is the set of all possible instances and Y := {−1, 1} contain the target labels.” [pg. 787, § 2.2 Learning Weighted Rule Sets, ¶1; X would include explanatory variables and target label would correspond to an objective variable.]), a hypothesis set including hypotheses (“Furthermore, we assume we already have a (possibly infinite) reservoir of rules R = {r1, r2, . . .}, where a rule rj : X → [−1, 1] assigns a value between -1 and 1 to each instance.” [pg. 787, § 2.2 Learning Weighted Rule Sets, ¶1; A rule would be an equivalent to hypothesis]) and a number of literals included in each of the hypotheses is equal to or smaller than a predetermined value (“An individual rule set p assigns class label sgn(pT xi), so that instance xi is positive, if $\sum_{j=1}^{n} p_j x_i(j) \geq 0$ and negative otherwise.” [pg. 787, § 2.2 Learning Weighted Rule Sets, ¶1; Examiner is interpreting number of literals to be equivalent to instances (xi := (xi(1), xi(2), . . . , xi(n))T).]); 
performing a machine learning process to calculate a weight of each of the hypotheses included in the hypothesis set on a basis of the second piece of data classified by each of the hypotheses using each of the sets of training data (“An outline is given in Algorithm 1. The system starts with the training set given in a set X and an empty set of rules. In the main loop, it adds a new rule to the set of rules and applies the rules to the examples in the training set. It determines the weight vector p(i) optimizing the soft margin loss, the empirical margin or the empirical MMV on this data and calculates an upper bound on the corresponding true values given the empirical quantity and the number of rules. If there are no new rules available, the algorithm terminates the loop and returns the set of rules and the weight vector which achieved the best bound.” [pg. 790, § 3. A Statistical Rule Learning System, top left col; See further “If we consider only the first n rules, we can represent the ith instance by the vector of rule values xi := (xi(1), xi(2), . . . , xi(n))T . Likewise, a weighted rule set can be given by a weight vector p ∈ [−1, 1]n. [pg. 787, § 2.2 Learning Weighted Rule sets, ¶1]]); 
generating a prediction result of input data that is received for prediction, using a hypothesis specified from the hypothesis set matching the input data and the weight of the specified hypothesis (“If we consider only the first n rules, we can represent the ith instance by the vector of rule values xi := (xi(1), xi(2), . . . , xi(n))T . Likewise, a weighted rule set can be given by a weight vector p ∈ [−1, 1]n… Also note that the weight vector defines a hyperplane separating [−1; 1]n into two half-spaces so that rule sets in our setting are related to linear classifiers and perceptrons.” [pg. 787, § 2.2 Learning Weighted Rule sets, ¶1]); and 
outputting the prediction result including a prediction score acquired from the weight and indicating accuracy of the prediction, and the specified hypothesis (“The main goal of rule induction is to find a preferably small set of (weighted) rules with high predictive accuracy. In the context of statistical learning, this setting is usually modeled as follows: The user draws an i.i.d. sample of labeled examples from a fixed, but unknown distribution. The user is interested in identifying the hypothesis in a certain hypothesis class, which most accurately predicts the label from the instance.” [pg. 785, § 1 Introduction, ¶3; See further Table 1: “percentage of correct classifications” corresponds to prediction scores [pg. 791, top left col]]).
However, Ruckert fails to explicitly teach each of which includes a combination of literals of the explanatory variables, wherein from among all of possible combinations of literals of the explanatory variables, a combination determined to classify a set of training data into the second piece of data is included in the hypothesis set as one of the hypotheses.
Yin teaches a hypothesis set including hypotheses each of which includes a combination of literals of the explanatory variables (“[image: media_image1.png]” [pg. 332, Definition 2.2]), wherein from among all of possible combinations of literals of the explanatory variables, a combination determined to classify a set of training data into the second piece of data is included in the hypothesis set as one of the hypotheses (“Two important association rule-based classifiers are CBA and CMAR. CBA first generates all the association rules with certain support and confidence thresholds as candidate rules. It then selects a small set of rules from them to form a classifier. When predicting the class label for an example, the best rule (i.e., with the highest confidence) whose body is satisfied by the example is chosen for prediction.” [pg. 332, left col, para under Definition 2.2.; See further Algorithm 3.1 includes a Training data set D as the input.]).
Ruckert and Yin are both in the same field of endeavor of rule learning and thus are analogous. Ruckert discloses a rule-based learning algorithm for classification. Yin discloses classification based on predictive rule sets. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ruckert’s teachings to include the use of a conjunction of literals to classify a set of training data into a target class, as taught by Yin. One would have been motivated to make this modification to use the best hypotheses to improve the prediction accuracy of the algorithm [Abstract, Yin].
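For illustration only, the prediction step described in the Yin passage quoted above (choosing the highest-confidence rule whose body is satisfied by the example) might be sketched as follows. The rule bodies, labels, confidence values, and the function name `best_rule_predict` are hypothetical and are not taken from Yin.

```python
def best_rule_predict(rules, example):
    """Return (label, confidence) of the highest-confidence rule whose body
    is satisfied by the example, or None when no rule matches."""
    # A rule body is satisfied when all of its attribute/value pairs
    # also appear in the example.
    matching = [r for r in rules if r["body"].items() <= example.items()]
    if not matching:
        return None
    best = max(matching, key=lambda r: r["confidence"])
    return best["label"], best["confidence"]


# Hypothetical candidate rules, each a conjunction of literals with a confidence.
rules = [
    {"body": {"outlook": "sunny"}, "label": "play", "confidence": 0.7},
    {"body": {"outlook": "sunny", "wind": "strong"}, "label": "stay", "confidence": 0.9},
    {"body": {"outlook": "rain"}, "label": "stay", "confidence": 0.8},
]

print(best_rule_predict(rules, {"outlook": "sunny", "wind": "strong", "temp": "cold"}))
print(best_rule_predict(rules, {"outlook": "sunny", "wind": "weak"}))
```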

Regarding claim 3, Ruckert/Yin teaches The non-transitory computer-readable recording medium according to claim 1, where Ruckert teaches wherein a number or a ratio of sets of training data classified into second piece of data by each of the hypotheses is equal to or a larger than a predetermined value (“This rule set would classify the instance (sunny, normal, cold, change) as positive, because it meets the conditions of the first and the second rule, so the sum of weights is 0.05, which is greater than zero.” [pg. 785, § 1. Introduction, right col, ¶2; See further: “An individual rule set p assigns class label sgn(pT xi), so that instance xi is positive, if $\sum_{j=1}^{n} p_j x_i(j) \geq 0$ and negative otherwise.” [pg. 787, § 2.2 Learning Weighted Rule Sets, ¶1]]).
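For illustration only, the decision rule quoted from Ruckert above (an instance is positive when the weighted sum of its rule values is at least zero, and negative otherwise) can be sketched as follows. The weight and rule-value vectors are hypothetical and do not reproduce the weights of the example in Ruckert's Introduction.

```python
def classify(weights, rule_values):
    """Return (label, score), where score = sum_j p_j * x_i(j) and the label
    is +1 when score >= 0 and -1 otherwise."""
    score = sum(p * x for p, x in zip(weights, rule_values))
    label = 1 if score >= 0 else -1
    return label, score


# Hypothetical weight vector p and rule-value vectors x_i (1 = rule fires).
weights = [0.5, -0.25, 0.125]

print(classify(weights, [1, 1, 0]))  # first and second rules fire
print(classify(weights, [0, 1, 1]))  # second and third rules fire
```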

Regarding claim 7, Ruckert teaches A learning method executed by a computer, (“In order to test the performance of the proposed system on real world data, we implemented a version in C++ on Linux.” [pg. 791, § 4. Experiment, ¶1, use of computer and memory is implicit]), the learning method comprising: 
generating, using sets of training data each of which includes first pieces of data corresponding to explanatory variables and a second piece of data corresponding to an objective variable (“First of all, we follow the usual convention and assume that the instances are drawn i.i.d. according to a fixed but unknown distribution D. D ranges over X × Y, where X is the set of all possible instances and Y := {−1, 1} contain the target labels.” [pg. 787, § 2.2 Learning Weighted Rule Sets, ¶1; X would include explanatory variables and target label would correspond to an objective variable.]), a hypothesis set including hypotheses (“Furthermore, we assume we already have a (possibly infinite) reservoir of rules R = {r1, r2, . . .}, where a rule rj : X → [−1, 1] assigns a value between -1 and 1 to each instance.” [pg. 787, § 2.2 Learning Weighted Rule Sets, ¶1; A rule would be an equivalent to hypothesis]) and a number of literals included in each of the hypotheses is equal to or smaller than a predetermined value (“An individual rule set p assigns class label sgn(pT xi), so that instance xi is positive, if $\sum_{j=1}^{n} p_j x_i(j) \geq 0$ and negative otherwise.” [pg. 787, § 2.2 Learning Weighted Rule Sets, ¶1; Examiner is interpreting number of literals to be equivalent to instances (xi := (xi(1), xi(2), . . . , xi(n))T).]); 
performing a machine learning process to calculate a weight of each of the hypotheses included in the hypothesis set on a basis of the second piece of data classified by each of the hypotheses using each of the sets of training data (“An outline is given in Algorithm 1. The system starts with the training set given in a set X and an empty set of rules. In the main loop, it adds a new rule to the set of rules and applies the rules to the examples in the training set. It determines the weight vector p(i) optimizing the soft margin loss, the empirical margin or the empirical MMV on this data and calculates an upper bound on the corresponding true values given the empirical quantity and the number of rules. If there are no new rules available, the algorithm terminates the loop and returns the set of rules and the weight vector which achieved the best bound.” [pg. 790, § 3. A Statistical Rule Learning System, top left col; See further “If we consider only the first n rules, we can represent the ith instance by the vector of rule values xi := (xi(1), xi(2), . . . , xi(n))T . Likewise, a weighted rule set can be given by a weight vector p ∈ [−1, 1]n.” [pg. 787, § 2.2 Learning Weighted Rule sets, ¶1]]); 
generating a prediction result of input data that is received for prediction, using a hypothesis specified from the hypothesis set matching the input data and the weight of the specified hypothesis (“If we consider only the first n rules, we can represent the ith instance by the vector of rule values xi := (xi(1), xi(2), . . . , xi(n))T . Likewise, a weighted rule set can be given by a weight vector p ∈ [−1, 1]n… Also note that the weight vector defines a hyperplane separating [−1; 1]n into two half-spaces so that rule sets in our setting are related to linear classifiers and perceptrons.” [pg. 787, § 2.2 Learning Weighted Rule sets, ¶1]); and 
outputting the prediction result including a prediction score acquired from the weight and indicating accuracy of the prediction, and the specified hypothesis (“The main goal of rule induction is to find a preferably small set of (weighted) rules with high predictive accuracy. In the context of statistical learning, this setting is usually modeled as follows: The user draws an i.i.d. sample of labeled examples from a fixed, but unknown distribution. The user is interested in identifying the hypothesis in a certain hypothesis class, which most accurately predicts the label from the instance.” [pg. 785, § 1 Introduction, ¶3; See further Table 1: “percentage of correct classifications” corresponds to prediction scores [pg. 791, top left col]]).
However, Ruckert fails to explicitly teach each of which includes a combination of literals of the explanatory variables, wherein from among all of possible combinations of literals of the explanatory variables, a combination determined to classify a set of training data into the second piece of data is included in the hypothesis set as one of the hypotheses
Yin teaches a hypothesis set including hypotheses each of which includes a combination of literals of the explanatory variables (“[image: media_image1.png]” [pg. 332, Definition 2.2]), wherein from among all of possible combinations of literals of the explanatory variables, a combination determined to classify a set of training data into the second piece of data is included in the hypothesis set as one of the hypotheses (“Two important association rule-based classifiers are CBA and CMAR. CBA first generates all the association rules with certain support and confidence thresholds as candidate rules. It then selects a small set of rules from them to form a classifier. When predicting the class label for an example, the best rule (i.e., with the highest confidence) whose body is satisfied by the example is chosen for prediction.” [pg. 332, left col, para under Definition 2.2.; See further Algorithm 3.1 includes a Training data set D as the input.]).
Ruckert and Yin are both in the same field of endeavor of rule learning and thus are analogous. Ruckert discloses a rule-based learning algorithm for classification. Yin discloses classification based on predictive rule sets. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ruckert’s teachings by including the use of a conjunction of literals to classify a set of training data into a target class as taught by Yin. One would have been motivated to make this modification to use the best hypotheses to improve the prediction accuracy of the algorithm [Abstract, Yin].
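
For illustration only, the combination discussed above can be sketched in Python as follows. The sketch is not taken from Ruckert, Yin, or the application: the toy data, the function names, the purity and support test, and the uniform weights are all assumptions, and the uniform weights merely stand in for the weight optimization that Ruckert describes. It enumerates conjunctions of at most a fixed number of literals, keeps those that classify the matched training rows into a single label, and predicts with the sign of the weighted sum of the matching hypotheses.

```python
from itertools import combinations

# Hypothetical toy data: explanatory variables (booleans) and an objective
# variable (label +1 or -1). Not taken from any cited reference.
TRAIN = [
    ({"A": True,  "B": True,  "C": False}, 1),
    ({"A": True,  "B": False, "C": False}, 1),
    ({"A": False, "B": True,  "C": True},  -1),
    ({"A": False, "B": False, "C": True},  -1),
]


def literals(variables):
    """All literals (variable, required value) over the explanatory variables."""
    return [(v, val) for v in variables for val in (True, False)]


def matches(rule, row):
    """A rule (conjunction of literals) matches a row if every literal holds."""
    return all(row[v] == val for v, val in rule)


def generate_hypotheses(train, max_literals, min_support):
    """Keep conjunctions of at most max_literals literals whose matched rows
    all share one label and number at least min_support (assumed criteria)."""
    variables = sorted(train[0][0])
    kept = []
    for k in range(1, max_literals + 1):
        for rule in combinations(literals(variables), k):
            # Skip contradictory conjunctions such as "A and not A".
            if len({v for v, _ in rule}) < k:
                continue
            labels = [y for row, y in train if matches(rule, row)]
            if len(labels) >= min_support and len(set(labels)) == 1:
                kept.append((rule, labels[0]))
    return kept


def predict(hypotheses, weights, row):
    """Sign of the weighted sum over hypotheses matching the input row."""
    score = sum(w * label
                for (rule, label), w in zip(hypotheses, weights)
                if matches(rule, row))
    return (1 if score >= 0 else -1), score


if __name__ == "__main__":
    hyps = generate_hypotheses(TRAIN, max_literals=2, min_support=2)
    weights = [1.0] * len(hyps)  # placeholder weights, not learned
    print(predict(hyps, weights, {"A": True, "B": False, "C": False}))
```

The bound on the number of literals and the support threshold are passed as parameters so that each can be varied independently in the sketch.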

Regarding claim 9, Ruckert/Yin teaches The learning method according to claim 7, where Ruckert teaches wherein a number or a ratio of sets of training data classified into second piece of data by each of the hypotheses is equal to or a larger than a predetermined value (“This rule set would classify the instance (sunny, normal, cold, change) as positive, because it meets the conditions of the first and the second rule, so the sum of weights is 0.05, which is greater than zero.” [pg. 785, § 1. Introduction, right col, ¶2; See further: “An individual rule set p assigns class label sgn(pT xi), so that instance xi is positive, if $\sum_{j=1}^{n} p_j x_i(j) \geq 0$ and negative otherwise.” [pg. 787, § 2.2 Learning Weighted Rule Sets, ¶1]]).

Regarding claim 13, Ruckert teaches A learning apparatus comprising a memory; and a processor coupled to the memory and the processor configured to executes a process, (“In order to test the performance of the proposed system on real world data, we implemented a version in C++ on Linux.” [pg. 791, § 4. Experiment, ¶1, use of computer and memory is implicit]), the process comprising: 
generating, using sets of training data each of which includes first pieces of data corresponding to explanatory variables and a second piece of data corresponding to an objective variable (“First of all, we follow the usual convention and assume that the instances are drawn i.i.d. according to a fixed but unknown distribution D. D ranges over X × Y, where X is the set of all possible instances and Y := {−1, 1} contain the target labels.” [pg. 787, § 2.2 Learning Weighted Rule Sets, ¶1; X would include explanatory variables and target label would correspond to an objective variable.]), a hypothesis set including hypotheses (“Furthermore, we assume we already have a (possibly infinite) reservoir of rules R = {r1, r2, . . .}, where a rule rj : X → [−1, 1] assigns a value between -1 and 1 to each instance.” [pg. 787, § 2.2 Learning Weighted Rule Sets, ¶1; A rule would be equivalent to a hypothesis]) and a number of literals included in each of the hypotheses is equal to or smaller than a predetermined value (“An individual rule set p assigns class label sgn(pT xi), so that instance xi is positive, if $\sum_{j=1}^{n} p_j x_i(j) \geq 0$ and negative otherwise.” [pg. 787, § 2.2 Learning Weighted Rule Sets, ¶1; Examiner is interpreting number of literals to be equivalent to instances (xi := (xi(1), xi(2), . . . , xi(n))T).]); 
performing a machine learning process to calculate a weight of each of the hypotheses included in the hypothesis set on a basis of the second piece of data classified by each of the hypotheses using each of the sets of training data (“An outline is given in Algorithm 1. The system starts with the training set given in a set X and an empty set of rules. In the main loop, it adds a new rule to the set of rules and applies the rules to the examples in the training set. It determines the weight vector p(i) optimizing the soft margin loss, the empirical margin or the empirical MMV on this data and calculates an upper bound on the corresponding true values given the empirical quantity and the number of rules. If there are no new rules available, the algorithm terminates the loop and returns the set of rules and the weight vector which achieved the best bound.” [pg. 790, § 3. A Statistical Rule Learning System, top left col; See further “If we consider only the first n rules, we can represent the ith instance by the vector of rule values xi := (xi(1), xi(2), . . . , xi(n))T . Likewise, a weighted rule set can be given by a weight vector p ∈ [−1, 1]n.” [pg. 787, § 2.2 Learning Weighted Rule sets, ¶1]]); 
generating a prediction result of input data that is received for prediction, using a hypothesis specified from the hypothesis set matching the input data and the weight of the specified hypothesis (“If we consider only the first n rules, we can represent the ith instance by the vector of rule values xi := (xi(1), xi(2), . . . , xi(n))T . Likewise, a weighted rule set can be given by a weight vector p ∈ [−1, 1]n… Also note that the weight vector defines a hyperplane separating [−1; 1]n into two half-spaces so that rule sets in our setting are related to linear classifiers and perceptrons.” [pg. 787, § 2.2 Learning Weighted Rule sets, ¶1]); and 
outputting the prediction result including a prediction score acquired from the weight and indicating accuracy of the prediction, and the specified hypothesis (“The main goal of rule induction is to find a preferably small set of (weighted) rules with high predictive accuracy. In the context of statistical learning, this setting is usually modeled as follows: The user draws an i.i.d. sample of labeled examples from a fixed, but unknown distribution. The user is interested in identifying the hypothesis in a certain hypothesis class, which most accurately predicts the label from the instance.” [pg. 785, § 1 Introduction, ¶3; See further Table 1: “percentage of correct classifications” corresponds to prediction scores [pg. 791, top left col]]).
However, Ruckert fails to explicitly teach each of which includes a combination of literals of the explanatory variables, wherein from among all of possible combinations of literals of the explanatory variables, a combination determined to classify a set of training data into the second piece of data is included in the hypothesis set as one of the hypotheses
Yin teaches a hypothesis set including hypotheses each of which includes a combination of literals of the explanatory variables (“[image: media_image1.png]” [pg. 332, Definition 2.2]), wherein from among all of possible combinations of literals of the explanatory variables, a combination determined to classify a set of training data into the second piece of data is included in the hypothesis set as one of the hypotheses (“Two important association rule-based classifiers are CBA and CMAR. CBA first generates all the association rules with certain support and confidence thresholds as candidate rules. It then selects a small set of rules from them to form a classifier. When predicting the class label for an example, the best rule (i.e., with the highest confidence) whose body is satisfied by the example is chosen for prediction.” [pg. 332, left col, para under Definition 2.2.; See further Algorithm 3.1 includes a Training data set D as the input.]).
Ruckert and Yin are both in the same field of endeavor of rule learning and thus are analogous. Ruckert discloses a rule-based learning algorithm for classification. Yin discloses classification based on predictive rule sets. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ruckert’s teachings by including the use of a conjunction of literals to classify a set of training data into a target class as taught by Yin. One would have been motivated to make this modification to use the best hypotheses to improve the prediction accuracy of the algorithm [Abstract, Yin].

Regarding claim 15, Ruckert/Yin teaches The learning apparatus according to claim 13, where Ruckert teaches wherein a number or a ratio of sets of training data classified into second piece of data by each of the hypotheses is equal to or a larger than a predetermined value (“This rule set would classify the instance (sunny, normal, cold, change) as positive, because it meets the conditions of the first and the second rule, so the sum of weights is 0.05, which is greater than zero.” [pg. 785, § 1. Introduction, right col, ¶2; See further: “An individual rule set p assigns class label sgn(pT xi), so that instance xi is positive, if $\sum_{j=1}^{n} p_j x_i(j) \geq 0$ and negative otherwise.” [pg. 787, § 2.2 Learning Weighted Rule Sets, ¶1]]).

Claims 5, 11, and 17 are rejected under 35 U.S.C. 103 as being unpatentable over Ruckert in view of Yin and further in view of Lavrac et al. ("Explicit Feature Construction and Manipulation for Covering Rule Learning Algorithms", hereinafter "Lavrac").

Regarding claim 5, Ruckert/Yin teaches The non-transitory computer-readable recording medium according to claim 1, however fails to explicitly teach wherein in the generating when a first hypothesis including a first literal classifies a set of training data into the second piece of data and a second hypothesis including the first literal and a second literal classifies a set of training data into the second piece of data that is same as by the first hypothesis, the second hypothesis is excluded from the hypothesis set
Lavrac teaches wherein in the generating when a first hypothesis including a first literal classifies a set of training data into the second piece of data and a second hypothesis including the first literal and a second literal classifies a set of training data into the second piece of data that is same as by the first hypothesis, the second hypothesis is excluded from the hypothesis set (“The first clause expresses that trains which have a short closed car are going East. The second clause states that trains which have a short car and a closed car are going East. The second clause is clearly more general than the first, covering all instances that the first one covers, and in addition instances where the short car is different from the closed car. We say that the body of the first clause consists of a single relational feature, while the body of the second clause contains two distinct features. Formally, a feature is defined as a minimal set of literals such that no local (i.e., existential) variable occurs both inside and outside that set. The main point of relational features is that they localize variable sharing: the only variable which is shared among features is the global variable occurring in the rule head. This can be made explicit by naming the features:” [pg. 129, ¶1-2; See further “The relevancy concept and a possibility to detect and eliminate some features as irrelevant even before entering the rule construction process is important also for other reasons. The first is that it enables complexity reduction of the rule construction task. The second is that the elimination of irrelevant features is useful for overfitting avoidance by reducing the search space of hypotheses through the elimination of features and their combinations with low covering properties” [pg. 133, § 4.2 Concept of Feature Relevancy]])
Ruckert, Yin and Lavrac are all in the same field of endeavor of rule learning and thus are analogous. Ruckert discloses a rule-based learning algorithm for classification. Yin discloses classification based on predictive rule sets. Lavrac discloses feature construction and manipulation for rule-based learning algorithms. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ruckert’s/Yin’s rule-based algorithms with the feature manipulation method disclosed by Lavrac. One would have been motivated to make this modification to minimize the number of literals in order to allow the hypothesis to have more coverage of the instances. [pg. 129, ¶1, Lavrac]
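
For illustration only, the exclusion recited in the limitation above can be sketched in Python as follows. The sketch is an assumption-laden reading of the claim language, not Lavrac's relevancy procedure: the toy data and the names `covered` and `prune_redundant` are hypothetical. A hypothesis is dropped when another hypothesis in the set uses a strict subset of its literals and matches exactly the same training rows.

```python
# Hypothetical toy data: explanatory variables (booleans) and a label.
TRAIN = [
    ({"A": True,  "B": True,  "C": False}, 1),
    ({"A": True,  "B": False, "C": False}, 1),
    ({"A": False, "B": True,  "C": True},  -1),
    ({"A": False, "B": False, "C": True},  -1),
]


def covered(rule, train):
    """Indices of training rows matched by a rule (a frozenset of literals)."""
    return frozenset(
        i for i, (row, _) in enumerate(train)
        if all(row[v] == val for v, val in rule)
    )


def prune_redundant(rules, train):
    """Drop a rule when a strict subset of its literals is also a rule in the
    set and covers exactly the same training rows."""
    cover = {r: covered(r, train) for r in rules}
    return [
        r for r in rules
        if not any(other < r and cover[other] == cover[r] for other in rules)
    ]


if __name__ == "__main__":
    first = frozenset({("A", True)})                  # first literal only
    second = frozenset({("A", True), ("C", False)})   # adds a second literal
    third = frozenset({("A", True), ("B", True)})     # narrower coverage
    print(prune_redundant([first, second, third], TRAIN))
```

In this toy data the second hypothesis covers the same rows as the first and is dropped, while the third is retained because it covers fewer rows.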

Regarding claim 11, Ruckert/Yin teaches The learning method according to claim 7, however fails to explicitly teach wherein in the generating when a first hypothesis including a first literal classifies a set of training data into the second piece of data and a second hypothesis including the first literal and a second literal classifies a set of training data into the second piece of data that is same as by the first hypothesis, the second hypothesis is excluded from the hypothesis set
Lavrac teaches wherein in the generating when a first hypothesis including a first literal classifies a set of training data into the second piece of data and a second hypothesis including the first literal and a second literal classifies a set of training data into the second piece of data that is same as by the first hypothesis, the second hypothesis is excluded from the hypothesis set (“The first clause expresses that trains which have a short closed car are going East. The second clause states that trains which have a short car and a closed car are going East. The second clause is clearly more general than the first, covering all instances that the first one covers, and in addition instances where the short car is different from the closed car. We say that the body of the first clause consists of a single relational feature, while the body of the second clause contains two distinct features. Formally, a feature is defined as a minimal set of literals such that no local (i.e., existential) variable occurs both inside and outside that set. The main point of relational features is that they localize variable sharing: the only variable which is shared among features is the global variable occurring in the rule head. This can be made explicit by naming the features:” [pg. 129, ¶1-2; See further “The relevancy concept and a possibility to detect and eliminate some features as irrelevant even before entering the rule construction process is important also for other reasons. The first is that it enables complexity reduction of the rule construction task. The second is that the elimination of irrelevant features is useful for overfitting avoidance by reducing the search space of hypotheses through the elimination of features and their combinations with low covering properties” [pg. 133, § 4.2 Concept of Feature Relevancy]])
Ruckert, Yin and Lavrac are all in the same field of endeavor of rule learning and thus are analogous. Ruckert discloses a rule-based learning algorithm for classification. Yin discloses classification based on predictive rule sets. Lavrac discloses feature construction and manipulation for rule-based learning algorithms. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ruckert’s/Yin’s rule-based algorithms with the feature manipulation method disclosed by Lavrac. One would have been motivated to make this modification to minimize the number of literals in order to allow the hypothesis to have more coverage of the instances. [pg. 129, ¶1, Lavrac]

Regarding claim 17, Ruckert/Yin teaches The learning apparatus according to claim 13, however fails to explicitly teach wherein in the generating when a first hypothesis including a first literal classifies a set of training data into the second piece of data and a second hypothesis including the first literal and a second literal classifies a set of training data into the second piece of data that is same as by the first hypothesis, the second hypothesis is excluded from the hypothesis set
Lavrac teaches wherein in the generating when a first hypothesis including a first literal classifies a set of training data into the second piece of data and a second hypothesis including the first literal and a second literal classifies a set of training data into the second piece of data that is same as by the first hypothesis, the second hypothesis is excluded from the hypothesis set (“The first clause expresses that trains which have a short closed car are going East. The second clause states that trains which have a short car and a closed car are going East. The second clause is clearly more general than the first, covering all instances that the first one covers, and in addition instances where the short car is different from the closed car. We say that the body of the first clause consists of a single relational feature, while the body of the second clause contains two distinct features. Formally, a feature is defined as a minimal set of literals such that no local (i.e., existential) variable occurs both inside and outside that set. The main point of relational features is that they localize variable sharing: the only variable which is shared among features is the global variable occurring in the rule head. This can be made explicit by naming the features:” [pg. 129, ¶1-2; See further “The relevancy concept and a possibility to detect and eliminate some features as irrelevant even before entering the rule construction process is important also for other reasons. The first is that it enables complexity reduction of the rule construction task. The second is that the elimination of irrelevant features is useful for overfitting avoidance by reducing the search space of hypotheses through the elimination of features and their combinations with low covering properties” [pg. 133, § 4.2 Concept of Feature Relevancy]])
Ruckert, Yin and Lavrac are all in the same field of endeavor of rule learning and thus are analogous. Ruckert discloses a rule-based learning algorithm for classification. Yin discloses classification based on predictive rule sets. Lavrac discloses feature construction and manipulation for rule-based learning algorithms. It would have been obvious to one of ordinary skill in the art before the effective filing date to modify Ruckert’s/Yin’s rule-based algorithms with the feature manipulation method disclosed by Lavrac. One would have been motivated to make this modification to minimize the number of literals in order to allow the hypothesis to have more coverage of the instances. [pg. 129, ¶1, Lavrac]


Response to Arguments
Applicant's arguments filed 08/29/2022 have been fully considered but they are not persuasive. 

Regarding the 35 U.S.C. 101 Rejection:
Applicant appears to argue on pg. 6 that the “hypothesis set” generated includes “a combination of literals of the explanatory variables”, meaning “from among all of possible combinations of literals of the explanatory variables, a combination determined to classify a set of training data into the second piece of data is included in the hypothesis set as one of the hypotheses, and a number of literals included in each of the hypotheses is equal to or smaller than a predetermined value” enhances the learning process. Examiner respectfully disagrees. The limitations of generating a hypothesis set, used to classify a set of training data, generating a prediction result, and outputting the prediction result are all steps which can be practically performed in the human mind under the broadest reasonable interpretation. The claims as currently recited are still directed to an abstract idea.

Applicant appears to argue that the subject matter of claim 1 shows an improvement to the machine learning algorithm. Examiner respectfully disagrees. The claims do not currently recite language which shows an improvement to the training or learning of the algorithm or an improvement to the functioning of a computer or processor. The language in the claims appears to be directed towards showing an improvement to an abstract idea (i.e., improving predicting). Improvements to an abstract idea are still considered to be an abstract idea. Therefore, applicant’s arguments are not persuasive. Please see the updated 101 rejection above.


Regarding the 35 U.S.C. 103 Rejection:
Applicant’s arguments regarding the prior art of Ruckert1/Ruckert2 failing to teach “each of which includes a combination of literals of the explanatory variables, the combination being included in all possible combination of literals of the explanatory variables” have been considered but are moot because the newly amended limitation is now taught by the newly presented art of Yin. Please see the updated 103 rejection above.

Applicant’s arguments with respect to the rejections of the dependent claims have been fully considered but they are not persuasive as they rely upon the allowability of the independent claims.

Conclusion
Any inquiry concerning this communication or earlier communications from the examiner should be directed to MICHAEL H HOANG whose telephone number is (571)272-8491. The examiner can normally be reached Mon-Fri 8:30AM-4:30PM.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Kakali Chaki can be reached on (571) 272-3719. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of published or unpublished applications may be obtained from Patent Center. Unpublished application information in Patent Center is available to registered users. To file and manage patent submissions in Patent Center, visit: https://patentcenter.uspto.gov. Visit https://www.uspto.gov/patents/apply/patent-center for more information about Patent Center and https://www.uspto.gov/patents/docx for information about filing in DOCX format. For additional questions, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.



/M.H.H./Examiner, Art Unit 2122                                                                                                                                                                                                        

/KAKALI CHAKI/Supervisory Patent Examiner, Art Unit 2122