Detailed Action
This action is in response to Applicant's communications filed 21 July 2022.
Claims 1, 12, and 20 were amended.  No claims were cancelled, withdrawn, or added.  Therefore, claims 1-3, 5-6, 12-13, and 20 are pending in this Application.

Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.

Response to Arguments
Applicant's amendments, filed 21 July 2022, with respect to the objections to claims 1 and 12 have been fully considered and are sufficient to overcome the objections.  Accordingly, the objections to the claims have been withdrawn.
Applicant's arguments/amendments, filed 21 July 2022, regarding the rejections of claims 1-3, 5-6, 12-13, and 20 under 35 USC 103 have been fully considered but are not persuasive.
Applicant argues (Remarks, p. 11) that the combination of Venkatesh, Ashenfelter, Stanford, and Shilaskar does not teach newly amended claim language regarding an intermediate ranking list of features and arranging the ranking list of features in descending order.  However, multiple references teach ranking features and ordering them in descending order.  (Venkatesh: "In MI filter method, all features are ranked based on the mutual information between the features and the class labels. Then the features are sorted in descending order based on their ranks." sec. 4.1, p. 370; Shilaskar: "In filter approach, intrinsic properties of data justify inclusion of an attribute or a subset of attributes to the feature set. Filter algorithm initiates the search with a given subset and searches through the feature space using a particular search strategy. It evaluates each variable independently with respect to the class in order to create a ranking. Variables are then ranked from the highest value to the smallest one" sec. 5.1, p. 4147).  Thus, the combination of Venkatesh, Ashenfelter, Stanford, and Shilaskar teaches the limitations of the claims.
Applicant argues (Remarks, pp. 11-12) that Shilaskar teaches a forward feature inclusion algorithm different from Algorithm 1 in claim 1.  Applicant argues that Algorithm 1 is more detailed than what is taught in Shilaskar because Shilaskar does not teach training a model based on a classifier, storing the accuracy, and comparing the accuracy to a previous accuracy to determine whether to include or remove a feature.  However, Shilaskar teaches all of these limitations.  Shilaskar teaches training an SVM classifier (sec. 7, pp. 4148-4149; sec. 3-4, p. 4147), which teaches training a model based on a classifier.  Shilaskar also teaches testing each feature for accuracy and observing increases and decreases in accuracy to determine whether to include the feature (sec. 7.3, p. 4149), which teaches storing the accuracy and comparing accuracies to determine whether to include or remove features.
Applicant argues (Remarks, pp. 12-13) that there is no teaching or suggestion to combine Venkatesh, Ashenfelter, Stanford, and Shilaskar.  Applicant argues that Venkatesh has a higher accuracy than Shilaskar, which would teach away from combining Shilaskar with Venkatesh.  However, the fact that one reference reports a higher accuracy does not mean there is no reason to combine.  Shilaskar states that its hybrid technique finds smaller subsets and increases the accuracy of diagnosis (Shilaskar: Abstract, p. 4146); thus, one of ordinary skill in the art would have been motivated to combine Venkatesh and Shilaskar.
The rejection of the dependent claims for depending from rejected claims is maintained.
For the aforementioned reasons, claims 1-3, 5-6, 12-13, and 20 are rejected under 35 USC 103.

Claim Rejections - 35 USC § 103
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1 and 12 are rejected under 35 U.S.C. 103 as being unpatentable over Venkatesh et al. (A Hybrid Feature Selection Approach for Handling a High-Dimensional Data, hereinafter "Venkatesh"), Ashenfelter (US 2016/0292578), Stanford (https://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html), and Shilaskar et al. (Feature Selection For Medical Diagnosis: Evaluation For Cardiovascular Diseases, hereinafter "Shilaskar").

Regarding Claims 1 and 12,
Venkatesh teaches a system for performing feature selection in machine learning ("Figure 1 represents the proposed architecture of the Hybrid feature selection." p. 367, sec. 3), the system comprising: 
receiving a dataset ("Performance of the proposed method is measured on three benchmark datasets (Ionosphere, Libras Movement, and Clean) from the UCI Repository." p. 365, sec. Abstract);
performing feature ranking on the dataset using a filter technique to obtain an intermediate ranking list of features (Fig. 1; "Select k best features based on MI" p. 367, sec. 3; The "best features" part teaches that there is some sort of ranking. MI stands for mutual information and it is a filter method, see sec 2 about related work. There is at least a list of k features, if k best features are selected.); 
arranging the intermediate ranking list of features in descending order to obtain an ordered ranking list of features ("In MI filter method, all features are ranked based on the mutual information between the features and the class labels. Then the features are sorted in descending order based on their ranks." sec. 4.1, p. 370); and
performing feature selection on the ordered ranking list using a wrapper technique ("K features are passed through RFE (Recursive Feature Elimination) Wrapper method." p. 367, sec. 3, para. 2; "RFE is a recursive iteration process where features are ranked based on their feature importance… Feature importance is measured at each iteration and the features with less relevant are discarded." p. 369 sec. 3.2 para. 2; Discarding features leaves a list of selected features that are not discarded.) 
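For illustration only, the filter stage cited from Venkatesh (ranking features by mutual information with the class labels, then sorting in descending order) can be sketched as follows. The function names, the toy data, and the restriction to discrete feature values are assumptions made for this sketch and are not taken from Venkatesh or from the claims.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px = Counter(xs)
    py = Counter(ys)
    pxy = Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log(p(x,y) / (p(x) * p(y))), written with raw counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

def rank_features(columns, labels):
    """Return (feature_name, score) pairs sorted in descending order of MI."""
    scores = {name: mutual_information(col, labels) for name, col in columns.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical toy data: f_copy mirrors the label, f_noise is independent of it.
labels = [0, 0, 1, 1]
columns = {
    "f_noise": [0, 1, 0, 1],
    "f_copy": [0, 0, 1, 1],
}
ranked = rank_features(columns, labels)      # ordered ranking list
top_k = [name for name, _ in ranked[:1]]     # "select k best features", k = 1
```

In this sketch the ordered list places f_copy ahead of f_noise, and the first k entries are what would be handed to the wrapper stage.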

Venkatesh does not explicitly teach applicant’s normalized mutual information filter technique.
However, Stanford teaches the filter technique comprising clustering data of the dataset and then using normalized mutual information (NMI) as a metric for ranking to generate the intermediate ranking list of features, the NMI calculated as follows:

[Equation image: media_image1.png],
where
[Equation image: media_image2.png],
[Equation image: media_image3.png], and
where Ω is a set of clusters, S is a set of classes, P(dk) = probability of data in cluster dk, P(sj) = probability of data in cluster sj, G(S) is an entropy of the set of classes, and P(dk ∩ sj) = probability of data being in a convergence of dk and sj, the use of the filter technique and the wrapper technique improving a runtime of the processor. (Stanford equations 183, 184 and 186 teach applicant’s NMI filter technique. Performing this technique would improve the runtime of a processor according to applicant.)
Stanford and Venkatesh are both concerned with refining clusters to speed up computation. It would have been obvious to a person having ordinary skill in the art, at the time of filing, to use an NMI filter technique in order to measure "the amount of information by which our knowledge about the classes increases when we are told what the clusters are" (Stanford, just below equation 187). NMI also formalizes that fewer clusters are better, since the entropy usually increases as the number of clusters increases.
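For illustration only, the NMI computation referred to in Stanford equations 183, 184 and 186 (mutual information between the clustering and the classes, divided by the average of the two entropies) can be sketched as follows. The function names, the toy assignments, and the guard returning 0.0 when both entropies are zero are assumptions made for this sketch.

```python
import math
from collections import Counter

def entropy(assignments):
    """Entropy (in nats) of a list of cluster or class assignments."""
    n = len(assignments)
    return -sum((c / n) * math.log(c / n) for c in Counter(assignments).values())

def mutual_information(clusters, classes):
    """Mutual information between cluster and class assignments."""
    n = len(clusters)
    pk = Counter(clusters)
    pj = Counter(classes)
    pkj = Counter(zip(clusters, classes))
    # Sum over cluster/class pairs of P(k, j) * log(P(k, j) / (P(k) * P(j)))
    return sum((c / n) * math.log(c * n / (pk[k] * pj[j])) for (k, j), c in pkj.items())

def nmi(clusters, classes):
    """Mutual information normalized by the mean of the two entropies."""
    denom = (entropy(clusters) + entropy(classes)) / 2
    # Assumption of this sketch: report 0.0 when both entropies vanish.
    return mutual_information(clusters, classes) / denom if denom > 0 else 0.0

# Hypothetical toy assignments for four data points.
aligned = nmi([0, 0, 1, 1], ["a", "a", "b", "b"])    # clusters match classes
unrelated = nmi([0, 1, 0, 1], ["a", "a", "b", "b"])  # clusters independent of classes
```

The normalization keeps the score between 0 and 1, so clusterings with different numbers of clusters can be compared on the same scale.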

The Venkatesh/Stanford combination does not explicitly teach a processor and medium with instructions.
However, Ashenfelter teaches a processor (Fig. 2, processor 204); and 
a machine-readable medium in operable communication with the processor and comprising instructions stored thereon that, when executed by the processor, perform the following steps ("FIG. 2, system 200 includes at least one computing device 202. Computing device 202 may execute instructions of application programs or modules stored in system memory, e.g., memory 206." para. 42).
Venkatesh and Ashenfelter are both directed to predictive models. It would have been obvious to a person having ordinary skill in the art, at the time of filing, to put Venkatesh on applicant’s claimed system because the computation requirements of Venkatesh are more than can reasonably be managed on pen and paper.

The Venkatesh/Stanford/Ashenfelter combination does not explicitly teach the wrapper technique comprising performing a feature inclusion process, or the feature inclusion process algorithm.
However, Shilaskar teaches the wrapper technique comprising performing a feature inclusion process ("Hybrid model combines ﬁlter and wrapper approach to achieve better classiﬁcation performance. The features are ranked using distance criterion and then wrapper model is used to evaluate classiﬁcation model. In this work we have taken hybrid approach which combines both, ﬁlter and wrapper models (see Fig. 1)" sec. 5.3, p. 4148; "We have implemented three algorithms with hybrid model with SVM classifier.  1. Forward Feature Inclusion... Features are ranked using distance criterion and then included one by one by forward feature inclusion algorithm. Features are added till the accuracy of subset is better than or equal to the maximum accuracy of full feature set. The search for optimal sub-set continues till the best criterion is met or till the maximum number of iterations is reached." sec. 7, pp. 4148-4149); and
wherein the wrapper technique comprises removing all features that have a dependency on other features ("Redundant features do not contribute anything but noise towards description of target class... Removing redundant features reduce the size of dataset." sec. 2, p. 4147; "Back-elimination algorithm is used to form subsets with reduced number of features." sec. 7.3, p. 4149); and
the feature inclusion process comprising performing Algorithm 1:
Algorithm 1: 
Input: Set of ranked feature S = {f0, f1, f2, .......fm}, where m = total number of features, obtained from the feature ranking phase, f0 is the highest ranked feature and fm is the least ranked feature (Fig. 1, Original full feature set, Full feature set rearranged by feature rank criterion p. 4149; "These criterions (see Table 1) generate all features’ ranks as per their importance towards target class identiﬁcation. The very ﬁrst rank is supposed to have maximum signiﬁcance towards target class identiﬁcation. The signiﬁcance of the feature goes on reducing as we move towards the last ranked feature." sec. 6, p. 4148) 
Output: provides the selected set of features (Fig. 1, Optimal subset found, p. 4149)
Initialization: (Forward Feature Inclusion Algorithm, Initialize subset, sec. 7.2, p. 4149) 
1: Lst = S[0] prev=0, where prev represents a previous accuracy of a model 
2: for k = 0 to m-1 do 
3: 	x_tst = x_tst [ Lst ] 
4: 	x_tr =x_tr [ Lst ] 
5: 	train the model based on any classifier and store the accuracy on acc 
6: 	if acc > prev then 
7: 		if (k ≠ m − 1) then 
8: 			Add S[ k + 1 ] into the Lst 
9:	 		prev=acc 
10: 		else 
11: 		end if 
12: 	else 
13: 		remove S [ k ] object from the Lst 
14: 		if (k ≠ m − 1) then 
15: 			add S[ k + 1 ] to the Lst 
16: 		else 
17: 		end if 
18: 	end if 
19: end for 
("We have implemented three algorithms with hybrid model with SVM classifier.  1. Forward Feature Inclusion... Features are ranked using distance criterion and then included one by one by forward feature inclusion algorithm. Features are added till the accuracy of subset is better than or equal to the maximum accuracy of full feature set. The search for optimal sub-set continues till the best criterion is met or till the maximum number of iterations is reached." sec. 7, pp. 4148-4149; this teaches adding the features one by one and comparing the accuracy to find the most accurate subset of features; "Randomly selected 336 vectors are used for training and unseen 84 vectors are used for testing with ﬁve fold evaluation and average value is listed." sec. 3, p. 4147; "The SVM model is applied to the training set, calculating the optimal hyperplane that divides the classes forming the set." sec. 4, p. 4147; this teaches training the model based on a classifier; "As we can see in Fig. 6 at point 1 on X- axis, feature 1 when eliminated there is reduction in accuracy, thus that feature is restored. Whereas when feature 2 is eliminated, there is an increase in accuracy thus that feature is deleted. Results of this technique for arrhythmia and heart dataset are shown in Figs. 7 and 8. Back-elimination algorithm starts with full set and elimination of rightmost feature listed. When the features are ranked in decreasing order of importance (Forward ranked), ones with lower ranks gets eliminated during initial stages. Thus the effect on overall accuracy is low. As we can see in Fig. 5, initially feature with lower importance is searched thus there is small effect on accuracy. Whereas in Fig. 6 feature with higher importance is eliminated thus a large swing in accuracy is observed. Back-elimination algorithm is used to form subsets with reduced number of features." sec. 7.3, p. 4149)
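For illustration only, the control flow recited in Algorithm 1 can be sketched as follows. The function name forward_feature_inclusion, the evaluate callback (standing in for "train the model based on any classifier and store the accuracy"), and the toy accuracy function are hypothetical and are not taken from Shilaskar or from the claims; the sketch also assumes the ranked list holds m features indexed 0 to m-1.

```python
from typing import Callable, List, Sequence

def forward_feature_inclusion(ranked: Sequence[str],
                              evaluate: Callable[[List[str]], float]) -> List[str]:
    """Walk the ranked list once, keeping a feature only if accuracy improved."""
    m = len(ranked)
    if m == 0:
        return []
    selected = [ranked[0]]   # step 1: Lst = S[0]
    prev = 0.0               # step 1: previous accuracy of the model
    for k in range(m):       # step 2
        acc = evaluate(list(selected))          # steps 3-5: train and score on Lst
        if acc > prev:                          # step 6
            if k != m - 1:                      # step 7
                selected.append(ranked[k + 1])  # step 8
                prev = acc                      # step 9
        else:
            selected.remove(ranked[k])          # step 13
            if k != m - 1:                      # step 14
                selected.append(ranked[k + 1])  # step 15
    return selected

# Hypothetical stand-in for a classifier: only "a" and "c" raise accuracy.
def toy_accuracy(features: List[str]) -> float:
    return 0.5 + 0.2 * len(set(features) & {"a", "c"})

chosen = forward_feature_inclusion(["a", "b", "c", "d"], toy_accuracy)
```

With the toy accuracy function, "b" and "d" are each added, fail to improve on the previous accuracy, and are removed, leaving "a" and "c".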
Venkatesh and Shilaskar are analogous art because both are directed to feature selection algorithms. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the machine learning feature selection method of the Venkatesh/Ashenfelter/Stanford combination with the feature selection techniques of Shilaskar.  The modification would have been obvious because one of ordinary skill in the art would be motivated to increase the accuracy of classifiers, as suggested by Shilaskar (Shilaskar: sec. 8, p. 4153).

Claims 2-3, 5-6, 13, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Venkatesh et al. (A Hybrid Feature Selection Approach for Handling a High-Dimensional Data, hereinafter "Venkatesh"), Ashenfelter (US 2016/0292578), Stanford (https://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html), Shilaskar et al. (Feature Selection For Medical Diagnosis: Evaluation For Cardiovascular Diseases, hereinafter "Shilaskar"), and Sculley (Web-Scale K-Means Clustering).

Regarding Claim 2,
The Venkatesh/Ashenfelter/Stanford/Shilaskar combination teaches claim 1.
Venkatesh further teaches the filter technique ("Select k best features based on MI" p. 367, sec. 3,  Fig. 1; MI stands for mutual information and it is a filter method, see sec 2 about related work.). 
The Venkatesh/Ashenfelter/Stanford/Shilaskar combination does not explicitly teach K-means clustering.
However, Sculley teaches K-means clustering ("First, we propose the use of mini-batch optimization for k-means clustering." Abstract; see also Algorithm 1 on p. 1).
Venkatesh sorts for the k best features and Sculley teaches a k-means clustering technique to find the k best clusters/features for representing a dataset. It would have been obvious to a person having ordinary skill in the art, at the time of filing, to use Sculley’s method because "[t]his reduces computation cost by orders of magnitude compared to the classic batch algorithm while yielding significantly better solutions than online stochastic gradient descent…" (Sculley abs).
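For illustration only, classic batch k-means (the baseline that Sculley's abstract compares against) can be sketched as follows. This is a generic Lloyd-style loop, not code from Sculley; the function names, the caller-supplied initial centers, and the toy points are assumptions made for this sketch.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, init_centers, iterations=10):
    """Batch k-means: assign every point, then recompute every center."""
    centers = [tuple(c) for c in init_centers]
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        groups = [[] for _ in centers]
        for p in points:
            idx = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            groups[idx].append(p)
        # Update step: each center moves to the mean of its group.
        # A center with an empty group is left where it is.
        centers = [
            tuple(sum(coord) / len(g) for coord in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    return centers

# Hypothetical toy data: two well-separated groups of two points each.
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 2.0)]
centers = kmeans(points, init_centers=[(0.0, 0.0), (10.0, 0.0)])
```

Every iteration touches every point, which is the per-iteration cost that a mini-batch variant is meant to reduce.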

Regarding Claims 3 and 13,
The Venkatesh/Ashenfelter/Stanford/Shilaskar combination teaches claims 1 and 12.
Venkatesh further teaches the filter technique ("Select k best features based on MI" p. 367, sec. 3,  Fig. 1; MI stands for mutual information and it is a filter method, see sec 2 about related work).
The Venkatesh/Ashenfelter/Stanford/Shilaskar combination does not explicitly teach mini-batch K-means clustering.
However, Sculley teaches mini-batch K-means clustering ("First, we propose the use of mini-batch optimization for k-means clustering." Abstract; see also Algorithm 1 on p. 1).
Venkatesh sorts for the k best features and Sculley teaches a k-means clustering technique to find the k best clusters/features for representing a dataset. It would have been obvious to a person having ordinary skill in the art, at the time of filing, to use Sculley’s method because "[t]his reduces computation cost by orders of magnitude compared to the classic batch algorithm while yielding significantly better solutions than online stochastic gradient descent…" (Sculley abs).
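For illustration only, a mini-batch k-means update of the kind cited from Sculley (sample a small batch, cache each sample's nearest center, then move that center with a per-center learning rate of one over its update count) can be sketched as follows. The function names, the caller-supplied initial centers, sampling with replacement, and the fixed seed are assumptions made for this sketch and are not taken from Sculley.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def mini_batch_kmeans(points, init_centers, batch_size, iterations, seed=0):
    rng = random.Random(seed)                   # fixed seed: assumption, for repeatability
    centers = [list(c) for c in init_centers]
    counts = [0] * len(centers)                 # per-center update counts
    for _ in range(iterations):
        # Draw a small batch (with replacement, an assumption of this sketch).
        batch = [rng.choice(points) for _ in range(batch_size)]
        # Cache the nearest center for each sample before moving any center.
        nearest = [
            min(range(len(centers)), key=lambda i: sq_dist(x, centers[i]))
            for x in batch
        ]
        for x, idx in zip(batch, nearest):
            counts[idx] += 1
            eta = 1.0 / counts[idx]             # per-center learning rate
            centers[idx] = [(1 - eta) * c + eta * xi
                            for c, xi in zip(centers[idx], x)]
    return [tuple(c) for c in centers]

# Hypothetical toy data: one group near the origin, one near x = 10.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0),
          (10.0, 0.0), (11.0, 0.0), (10.0, 1.0), (11.0, 1.0)]
centers = mini_batch_kmeans(points, [(0.0, 0.0), (10.0, 0.0)],
                            batch_size=4, iterations=50)
```

Each iteration costs work proportional to the batch size rather than to the full dataset, which is the source of the computational savings quoted from Sculley's abstract.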

Regarding Claim 5,
The Venkatesh/Ashenfelter/Stanford/Shilaskar combination teaches claim 1.  The Venkatesh/Ashenfelter/Stanford/Shilaskar combination does not explicitly teach the clustering of the data of the dataset comprising K-means clustering.
Sculley teaches the clustering of the data of the dataset comprising K-means clustering ("First, we propose the use of mini-batch optimization for k-means clustering." Abstract; see also Algorithm 1 on p. 1).
Venkatesh sorts for the k best features and Sculley teaches a k-means clustering technique to find the k best clusters/features for representing a dataset. It would have been obvious to a person having ordinary skill in the art, at the time of filing, to use Sculley’s method because "[t]his reduces computation cost by orders of magnitude compared to the classic batch algorithm while yielding significantly better solutions than online stochastic gradient descent…" (Sculley abs).

Regarding Claim 6,
The Venkatesh/Ashenfelter/Stanford/Shilaskar combination teaches claim 1.  The Venkatesh/Ashenfelter/Stanford/Shilaskar combination does not explicitly teach the clustering of the data of the dataset comprising mini-batch K-means clustering. 
Sculley teaches the clustering of the data of the dataset comprising mini-batch K-means clustering ("First, we propose the use of mini-batch optimization for k-means clustering." Abstract; see also Algorithm 1 on p. 1).
Venkatesh sorts for the k best features and Sculley teaches a k-means clustering technique to find the k best clusters/features for representing a dataset. It would have been obvious to a person having ordinary skill in the art, at the time of filing, to use Sculley’s method because "[t]his reduces computation cost by orders of magnitude compared to the classic batch algorithm while yielding significantly better solutions than online stochastic gradient descent…" (Sculley abs).

Regarding Claim 20,
Venkatesh teaches a system for performing feature selection in machine learning ("Figure 1 represents the proposed architecture of the Hybrid feature selection." p. 367, sec. 3), the system comprising: 
receiving a dataset ("Performance of the proposed method is measured on three benchmark datasets (Ionosphere, Libras Movement, and Clean) from the UCI Repository." p. 365, sec. Abstract);
performing feature ranking on the dataset using a filter technique to obtain an intermediate ranking list of features (Fig. 1; "Select k best features based on MI" p. 367, sec. 3; The "best features" part teaches that there is some sort of ranking. MI stands for mutual information and it is a filter method, see sec 2 about related work. There is at least a list of k features, if k best features are selected.); 
arranging the intermediate ranking list of features in descending order to obtain an ordered ranking list of features ("In MI filter method, all features are ranked based on the mutual information between the features and the class labels. Then the features are sorted in descending order based on their ranks." sec. 4.1, p. 370); and
performing feature selection on the ordered ranking list using a wrapper technique ("K features are passed through RFE (Recursive Feature Elimination) Wrapper method." p. 367, sec. 3, para. 2; "RFE is a recursive iteration process where features are ranked based on their feature importance… Feature importance is measured at each iteration and the features with less relevant are discarded." p. 369 sec. 3.2 para. 2; Discarding features leaves a list of selected features that are not discarded.) 
the filter technique comprising data of the dataset and then using mutual information (NMI) as a metric for ranking to generate the intermediate ranking list of features (Fig. 1; "Select k best features based on MI" p. 367, sec. 3; The "best features" part teaches that there is some sort of ranking. MI stands for mutual information and it is a filter method, see sec 2 about related work).
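For illustration only, the recursive feature elimination loop cited from Venkatesh (measure feature importance at each iteration and discard the least relevant feature) can be sketched as follows. The function name, the importance callback, and the toy scores are hypothetical and are not taken from Venkatesh; in practice the importance would come from a model refit on the remaining features.

```python
def recursive_feature_elimination(features, importance, n_keep):
    """Drop the least important remaining feature until n_keep are left."""
    remaining = list(features)
    while len(remaining) > n_keep:
        scores = importance(remaining)   # importance is re-measured every iteration
        weakest = min(remaining, key=lambda f: scores[f])
        remaining.remove(weakest)
    return remaining

# Hypothetical importance function: f1 and f2 are duplicates of each other,
# so while both are present they split the credit between them.
BASE = {"f1": 0.6, "f2": 0.6, "f3": 0.4, "f4": 0.1}

def toy_importance(remaining):
    scores = {f: BASE[f] for f in remaining}
    if "f1" in scores and "f2" in scores:
        scores["f1"] /= 2
        scores["f2"] /= 2
    return scores

kept = recursive_feature_elimination(["f1", "f2", "f3", "f4"], toy_importance, n_keep=2)
```

Because importance is recomputed after each removal, one of the two duplicated features is discarded while the other is retained; a single one-shot ranking would have scored them identically.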

Venkatesh does not explicitly teach applicant’s normalized mutual information filter technique.
However, Stanford teaches the filter technique comprising clustering data of the dataset and then using normalized mutual information (NMI) as a metric for ranking to generate the ranking list of features, the NMI calculated as follows:

[Equation image: media_image1.png],
where
[Equation image: media_image2.png],
[Equation image: media_image3.png], and
where Ω is a set of clusters, S is a set of classes, P(dk) = probability of data in cluster dk, P(sj) = probability of data in cluster sj, G(S) is an entropy of the set of classes, and P(dk ∩ sj) = probability of data being in a convergence of dk and sj, the use of the filter technique and the wrapper technique improving a runtime of the processor. (Stanford equations 183, 184 and 186 teach applicant’s NMI filter technique. Performing this technique would improve the runtime of a processor according to applicant.)
Stanford and Venkatesh are both concerned with refining clusters to speed up computation. It would have been obvious to a person having ordinary skill in the art, at the time of filing, to use an NMI filter technique in order to measure "the amount of information by which our knowledge about the classes increases when we are told what the clusters are" (Stanford, just below equation 187). NMI also formalizes that fewer clusters are better, since the entropy usually increases as the number of clusters increases.

The Venkatesh/Stanford combination does not explicitly teach a processor and medium with instructions.
However, Ashenfelter teaches a processor (Fig. 2, processor 204); and 
a machine-readable medium in operable communication with the processor and comprising instructions stored thereon that, when executed by the processor, perform the following steps ("FIG. 2, system 200 includes at least one computing device 202. Computing device 202 may execute instructions of application programs or modules stored in system memory, e.g., memory 206." para. 42).
Venkatesh and Ashenfelter are both directed to predictive models. It would have been obvious to a person having ordinary skill in the art, at the time of filing, to put Venkatesh on applicant’s claimed system because the computation requirements of Venkatesh are more than can reasonably be managed on pen and paper.

The Venkatesh/Stanford/Ashenfelter combination does not explicitly teach clustering.
However, Sculley teaches the filter technique comprising clustering data of the dataset ("First, we propose the use of mini-batch optimization for k-means clustering." Abstract; see also Algorithm 1 on p. 1),
the clustering of the data of the dataset comprising mini-batch K-means clustering ("First, we propose the use of mini-batch optimization for k-means clustering." Abstract; see also Algorithm 1 on p. 1).
Venkatesh sorts for the k best features and Sculley teaches a k-means clustering technique to find the k best clusters/features for representing a dataset. It would have been obvious to a person having ordinary skill in the art, at the time of filing, to use Sculley’s method because "[t]his reduces computation cost by orders of magnitude compared to the classic batch algorithm while yielding significantly better solutions than online stochastic gradient descent…" (Sculley abs).

The Venkatesh/Stanford/Ashenfelter/Sculley combination does not explicitly teach the wrapper technique comprising removing all features that have a dependency on other features, and the wrapper technique comprising performing a feature inclusion process, and the feature inclusion process algorithm.
However, Shilaskar teaches the wrapper technique comprising removing redundant features that have a dependency on other features ("Redundant features do not contribute anything but noise towards description of target class... Removing redundant features reduce the size of dataset." sec. 2, p. 4147; "Back-elimination algorithm is used to form subsets with reduced number of features." sec. 7.3, p. 4149),
the wrapper technique comprising performing a feature inclusion process ("Hybrid model combines ﬁlter and wrapper approach to achieve better classiﬁcation performance. The features are ranked using distance criterion and then wrapper model is used to evaluate classiﬁcation model. In this work we have taken hybrid approach which combines both, ﬁlter and wrapper models (see Fig. 1)" sec. 5.3, p. 4148; "We have implemented three algorithms with hybrid model with SVM classifier.  1. Forward Feature Inclusion... Features are ranked using distance criterion and then included one by one by forward feature inclusion algorithm. Features are added till the accuracy of subset is better than or equal to the maximum accuracy of full feature set. The search for optimal sub-set continues till the best criterion is met or till the maximum number of iterations is reached." sec. 7, pp. 4148-4149), 
the feature inclusion process comprising performing Algorithm 1:
Algorithm 1: 
Input: Set of ranked feature S = {f0, f1, f2, .......fm}, where m = total number of features, obtained from the feature ranking phase, f0 is the highest ranked feature and fm is the least ranked feature (Fig. 1, Original full feature set, Full feature set rearranged by feature rank criterion p. 4149; "These criterions (see Table 1) generate all features’ ranks as per their importance towards target class identiﬁcation. The very ﬁrst rank is supposed to have maximum signiﬁcance towards target class identiﬁcation. The signiﬁcance of the feature goes on reducing as we move towards the last ranked feature." sec. 6, p. 4148) 
Output: provides the selected set of features (Fig. 1, Optimal subset found, p. 4149)
Initialization: (Forward Feature Inclusion Algorithm, Initialize subset, sec. 7.2, p. 4149) 
1: Lst = S[0] prev=0, where prev represents a previous accuracy of a model 
2: for k = 0 to m-1 do 
3: 	x_tst = x_tst [ Lst ] 
4: 	x_tr =x_tr [ Lst ] 
5: 	train the model based on any classifier and store the accuracy on acc 
6: 	if acc > prev then 
7: 		if (k ≠ m − 1) then 
8: 			Add S[ k + 1 ] into the Lst 
9:	 		prev=acc 
10: 		else 
11: 		end if 
12: 	else 
13: 		remove S [ k ] object from the Lst 
14: 		if (k ≠ m − 1) then 
15: 			add S[ k + 1 ] to the Lst 
16: 		else 
17: 		end if 
18: 	end if 
19: end for 
("We have implemented three algorithms with hybrid model with SVM classifier.  1. Forward Feature Inclusion... Features are ranked using distance criterion and then included one by one by forward feature inclusion algorithm. Features are added till the accuracy of subset is better than or equal to the maximum accuracy of full feature set. The search for optimal sub-set continues till the best criterion is met or till the maximum number of iterations is reached." sec. 7, pp. 4148-4149; this teaches adding the features one by one and comparing the accuracy to find the most accurate subset of features; "Randomly selected 336 vectors are used for training and unseen 84 vectors are used for testing with ﬁve fold evaluation and average value is listed." sec. 3, p. 4147; "The SVM model is applied to the training set, calculating the optimal hyperplane that divides the classes forming the set." sec. 4, p. 4147; this teaches training the model based on a classifier; "As we can see in Fig. 6 at point 1 on X- axis, feature 1 when eliminated there is reduction in accuracy, thus that feature is restored. Whereas when feature 2 is eliminated, there is an increase in accuracy thus that feature is deleted. Results of this technique for arrhythmia and heart dataset are shown in Figs. 7 and 8. Back-elimination algorithm starts with full set and elimination of rightmost feature listed. When the features are ranked in decreasing order of importance (Forward ranked), ones with lower ranks gets eliminated during initial stages. Thus the effect on overall accuracy is low. As we can see in Fig. 5, initially feature with lower importance is searched thus there is small effect on accuracy. Whereas in Fig. 6 feature with higher importance is eliminated thus a large swing in accuracy is observed. Back-elimination algorithm is used to form subsets with reduced number of features." sec. 7.3, p. 4149)
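For illustration only, the back-elimination behavior described in the passage quoted from Shilaskar sec. 7.3 (start with the full set, eliminate from the rightmost feature, restore a feature when accuracy drops, delete it when accuracy rises) can be sketched as follows. The function name, the evaluate callback, the toy accuracy function, and the treatment of an unchanged accuracy as "restore" are assumptions made for this sketch and are not taken from Shilaskar.

```python
def back_elimination(ranked, evaluate):
    """Try removing features from the rightmost (lowest-ranked) end first."""
    selected = list(ranked)
    best = evaluate(selected)            # accuracy of the full feature set
    for feature in reversed(ranked):
        trial = [f for f in selected if f != feature]
        if not trial:                    # never evaluate an empty subset
            break
        acc = evaluate(trial)
        if acc > best:                   # accuracy increased: the feature is deleted
            selected, best = trial, acc
        # Otherwise the feature is restored, i.e. selected is left unchanged.
        # Treating a tie as "restore" is an assumption of this sketch.
    return selected

# Hypothetical stand-in for a classifier: "a" and "c" help, "b" and "d" add noise.
def toy_accuracy(features):
    useful = len(set(features) & {"a", "c"})
    noisy = len(set(features) & {"b", "d"})
    return 0.6 + 0.1 * useful - 0.05 * noisy

reduced = back_elimination(["a", "c", "b", "d"], toy_accuracy)
```

With the toy accuracy function, removing "d" and then "b" raises accuracy, so both are deleted, while removing "c" or "a" lowers accuracy, so both are restored.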
Venkatesh and Shilaskar are analogous art because both are directed to feature selection algorithms. It would have been obvious to one of ordinary skill in the art before the effective filing date to combine the machine learning feature selection method of the Venkatesh/Ashenfelter/Stanford/Sculley combination with the feature selection techniques of Shilaskar.  The modification would have been obvious because one of ordinary skill in the art would be motivated to increase the accuracy of classifiers, as suggested by Shilaskar (Shilaskar: sec. 8, p. 4153).

Conclusion
Applicant's amendment necessitated the new ground(s) of rejection presented in this Office action.  Accordingly, THIS ACTION IS MADE FINAL.  See MPEP § 706.07(a).  Applicant is reminded of the extension of time policy as set forth in 37 CFR 1.136(a).  
A shortened statutory period for reply to this final action is set to expire THREE MONTHS from the mailing date of this action.  In the event a first reply is filed within TWO MONTHS of the mailing date of this final action and the advisory action is not mailed until after the end of the THREE-MONTH shortened statutory period, then the shortened statutory period will expire on the date the advisory action is mailed, and any extension fee pursuant to 37 CFR 1.136(a) will be calculated from the mailing date of the advisory action.  In no event, however, will the statutory period for reply expire later than SIX MONTHS from the date of this final action. 
Any inquiry concerning this communication or earlier communications from the examiner should be directed to CHARLES C KUO whose telephone number is (571)270-7477.  The examiner can normally be reached on M-F: 9:00 a.m. - 6:00 p.m.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Ann Lo, can be reached at (571) 272-9767.  The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access to the automated information system, call 800-786-9199 (IN USA OR CANADA) or 571-272-1000.




/CHARLES C KUO/Examiner, Art Unit 2126
/ANN J LO/Supervisory Patent Examiner, Art Unit 2126