DETAILED ACTION
Notice of Pre-AIA or AIA Status
The present application, filed on or after March 16, 2013, is being examined under the first inventor to file provisions of the AIA.
Response to Arguments
Applicant's arguments filed 09/17/2020 have been fully considered but they are not persuasive. 
35 U.S.C. § 103 Rejection
Applicant argues: The claims are nonobvious: (1) One of Ordinary Skill in the Art would not have combined the subject features: Qian teaches away from a non-labeled approach and/or modifying Qian in view of Wang as implied would change the principle of operation of the Qian reference; and (2) the Office Action lacks a clear, reasoned rationale required for a finding of obviousness. 
The Applicants respectfully submit that the Office Action fails to establish a prima facie case of obviousness as required by KSR and its progeny. On page five, the Office Action expressly acknowledges that Qian fails to disclose an embedded supervised feature selection framework. The applicants fundamentally agree and maintain that the Qian reference describes a method of clustering and conducting feature selection "simultaneously," and that Qian does not describe feature selection as claimed, embedded directly into a clustering algorithm. Yet, the Office Action summarily proceeds to conclude that Wang teaches the embedded unsupervised feature selection framework of claim 1, and presumably cures the deficiencies of Wang.
First, assuming arguendo that Wang teaches some method of unsupervised and embedded feature selection framework without the use of labels, the Qian and Wang references are technically incompatible because Qian needs to learn pseudo cluster indicator labels (a need which is eliminated with the novel framework 100 of the subject application and claimed) and modifying Qian away from use of such pseudo labels would change the principal of operation of the technology disclosed by the reference. Qian emphasizes multiple times that learning pseudo labels is critical to its disclosed purpose and function. For example, on page 1622 and again on page 1623, Qian describes, respectively, that "it is very important to predict [sic] good cluster indicators as pseudo labels for unsupervised feature selection" and that "learning accurate pseudo 

Examiner Response: Examiner respectfully disagrees. Applicant argues: “First, assuming arguendo that Wang teaches some method of unsupervised and embedded feature selection framework without the use of labels, the Qian and Wang references are technically incompatible because Qian needs to learn pseudo cluster indicator labels (a need which is eliminated with the novel framework 100 of the subject application and claimed) and modifying Qian away from use of such pseudo labels would change the principal of operation of the technology disclosed by the reference.” In response to applicant's argument that the references fail to show certain features of applicant’s invention, it is noted that the features upon which applicant relies (i.e., “Qian needs to learn pseudo cluster indicator labels (a need which is eliminated with the novel framework 100 of the subject application and claimed)”) are not recited in the rejected claim(s). Although the claims are interpreted in light of the specification, limitations from the specification are not read into the claims. See In re Van Geuns, 988 F.2d 1181, 26 USPQ2d 1057 (Fed. Cir. 1993). Nowhere do the claims mention pseudo labels or eliminating the need for pseudo labels. At best, the claims recite “an embedded unsupervised feature selection framework,” which is taught by Wang (see, e.g., Wang, pg. 2: “In this work, we focus on the unsupervised feature selection model design. Most existing unsupervised feature selection methods are similar to filter methods in supervised learning, and define different score systems to select features. Considering the advantages of embedded feature selection methods in supervised learning, we hope to use the embedded feature selection mechanism in an unsupervised way.”).
In addition, Applicant argues that Qian is not combinable with Wang because Wang states that “labeling data is time and labor consuming.” However, Wang is referring to supervised learning. In supervised learning, a human annotator manually labels the training samples used for classification. For example, an image of a dog can be used for training a classifier. A human can label the image “dog” so that the classifier knows to label the image as a dog. In unsupervised training, the image is not labeled; a classifier can still classify the image, but there is no way for the classifier to know for certain that the image is a dog. The downside to supervised learning is that it is time consuming and labor intensive for a human to sift through hundreds or thousands of training samples to label them. Thus, the sentence immediately following “In the real world applications, labeling data is time and labor consuming” is “Thus, unsupervised feature selection methods are desired for many practical applications.” What the Applicant appears to be arguing is that Qian teaches pseudo labels for clustering. However, learning pseudo labels for clustering is fundamentally different from manually labeling training samples for classification. This is further evidenced by Qian (see, e.g., Qian, pg. 1621: “Supervised feature selection methods, such as [Duda et al., ][Nie et al., 2010a][Zhao et al., 2010][Nie et al., 2008], are usually able to effectively select good features since labels of training data, which contain the essential discriminative information for classification, can be used. However, in unsupervised scenario, label information is unavailable directly, which makes the task of feature selection more challenging”). The arguments are not persuasive.
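For illustration only, the distinction drawn above between human-supplied labels and pseudo labels learned by clustering can be sketched as follows. The sample values, the label strings, and the helper `kmeans_1d` are hypothetical and are not taken from Qian, Wang, or the application; the sketch uses plain k-means on scalars rather than the method of either reference.

```python
# Illustrative sketch only; the data, names, and helper below are hypothetical.

def kmeans_1d(values, centers, iterations=10):
    """Plain k-means on scalars; returns one cluster index (pseudo label) per value."""
    centers = list(centers)  # work on a copy so the caller's list is not modified
    assign = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each value joins its nearest center.
        assign = [min(range(len(centers)), key=lambda k: abs(v - centers[k]))
                  for v in values]
        # Update step: each center moves to the mean of its members.
        for k in range(len(centers)):
            members = [v for v, a in zip(values, assign) if a == k]
            if members:
                centers[k] = sum(members) / len(members)
    return assign

samples = [1.0, 1.2, 0.9, 8.0, 8.3, 7.9]

# Supervised setting: a human annotator supplies a semantic label per sample.
human_labels = ["cat", "cat", "cat", "dog", "dog", "dog"]

# Unsupervised setting: cluster indices are computed from the data alone.
pseudo_labels = kmeans_1d(samples, centers=[0.0, 10.0])
print(pseudo_labels)
```

The cluster indices group the samples without any annotator effort, but they carry no semantic meaning such as “dog”; they are identifiers produced by the algorithm itself.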

Applicant argues: Second, the Office Action lacks a clear rationale for asserting that the claims are obvious based on Qian in view of Wang. As previously noted, the Office Action expressly acknowledges that Qian fails to disclose an embedded supervised feature selection framework; yet then concludes, without clear explanation, that Wang discloses the embedded unsupervised feature selection framework as claimed and otherwise renders the claim obvious as a whole. The Applicants respectfully note that the embedded unsupervised feature selection framework includes various sub-features and it is unclear, based on the identification to an introductory passage on page 2 of Wang, how the Office Action is ascertaining the differences between Wang and the embedded unsupervised feature selection framework. Specifically, for example, claim 1 defines claim features reciting implementing, via the computing device, "an embedded unsupervised feature selection framework" that integrates sparse-learning-based feature selection process into a matrix-factorization-based clustering process and further comprises: 
factorizing the data matrix into two matrices, U and V, each row of U being a cluster membership of a corresponding data sample in the data matrix, and each row of V being a latent representation of a corresponding column in the data matrix, and selecting one or more features from the data matrix, by performing a theoretical proof that by adding an orthogonal constraint on U, feature selection on V removes undesired features, and with the theoretical proof, adopting sparse learning on V to perform unsupervised feature selection in an unsupervised environment. 
As recognized under MPEP section 2142, respectfully, the examiner must provide evidence which as a whole shows that the legal determination sought to be proved (i.e., the reference teachings establish a prima facie case of obviousness) is more probable than not, and the differences between the references and the claims weighs against this determination. These further reasons support withdrawal of the 103 rejections. 

Examiner Response: Examiner respectfully disagrees. Wang teaches a method of unsupervised feature selection (see, e.g., Wang, Abs.: “Feature selection plays a crucial role in scientific research and practical applications. In the real world applications, labeling data is time and labor consuming. Thus, unsupervised feature selection methods are desired for many practical applications”). Qian also teaches a method of unsupervised feature selection (see, e.g., Qian, Abs.: “A new unsupervised feature selection method, i.e., Robust Unsupervised Feature Selection (RUFS), is proposed.”). In addition, both Wang and Qian aim to train a classifier to improve its performance. Wang discusses a wrapper approach and an embedded approach to classifiers (see, e.g., Wang, pg. 306-307: “Wrapper methods treat the classifier as a black box, and use classification results to evaluate potential feature subset, thus the features selected by wrapper methods usually have good performance. However, their computational cost is very high since it need to use the classifier all the way through the process of feature selection. The embedded methods treat classifier as a white box, and incorporate feature selection and classification model into a single optimization problem. Thus, the classification performance is good, and the computational cost is much lower than wrapper method.”). Qian also explores a wrapper approach and an embedded approach to classifiers (see, e.g., Qian, pg. 1621: “For wrapper methods [Kohavi and John, 1997][Guyon and Elisseeff, 2003][Rakotomamonjy, 2003], feature selection is wrapped in a learning algorithm and the classification performance on selected features is taken as the evaluation criterion. Embedded approaches [Vapnik, 1999][Zhu et al., 2004][Hou et al., 2011] perform feature selection when training the models. Wrapper and embedded methods couples feature selection with built-in classifiers tightly, which lead to less generality and extensive computation.”). Therefore, Wang and Qian are in the same field of endeavor.
In addition, both Wang and Qian use k-means clustering. It can be seen by one of ordinary skill in the art that .
Claim Rejections - 35 USC § 103
In the event the determination of the status of the application as subject to AIA  35 U.S.C. 102 and 103 (or as subject to pre-AIA  35 U.S.C. 102 and 103) is incorrect, any correction of the statutory basis for the rejection will not be considered a new ground of rejection if the prior art relied upon, and the rationale supporting the rejection, would be the same under either status.  
The following is a quotation of 35 U.S.C. 103 which forms the basis for all obviousness rejections set forth in this Office action:
A patent for a claimed invention may not be obtained, notwithstanding that the claimed invention is not identically disclosed as set forth in section 102, if the differences between the claimed invention and the prior art are such that the claimed invention as a whole would have been obvious before the effective filing date of the claimed invention to a person having ordinary skill in the art to which the claimed invention pertains. Patentability shall not be negated by the manner in which the invention was made.

Claims 1, 6, 7, 8, 9, 10, 15, 16, 17, 18, and 19 are rejected under 35 U.S.C. 103 as being unpatentable over Qian et al. (“Robust Unsupervised Feature Selection”) in view of Wang et al. ("Unsupervised feature selection via unified trace ratio formulation and k-means clustering (track).").
Regarding Claim 1,
Qian et al. teaches a method for managing high-dimensional data, the method comprising: 
generating a data matrix (pg. 1622; For matrix $M = (m_{ij})$, its i-th row, j-th column are denoted by $m^i$, $m_j$ respectively.) for high-dimensional data (pg. 1623 In the era of big data, high dimensional data is prevalent and the number of features is usually very high (otherwise, we may not need feature selection), for example, text data, genetic data, or image data with high resolution… For practical use of unsupervised feature selection, we require algorithms to be able to handle large number of features and large number of data samples which are not only computationally efficient but also save memory. Robust unsupervised feature selection aims to select relevant features from high-dimensional data by converting it into a feature matrix) with a computing device, each row of the data matrix being a d-dimensional data sample (pg. 1622; Assume that we have n samples $X = \{x_i\}_{i=1}^{n}$.), and each column of the d-dimensional data sample being a feature (pg. 1622; Let $X = [x_1, \cdots, x_n]^T$ denote the data matrix with each row being a data feature vector. Examiner note: Each row of the data matrix is a feature vector. The column of the matrix corresponds to one feature of the feature vector.); and
implementing, via the computing device, an… unsupervised feature selection framework that integrates sparse-learning-based feature selection process (Pg. 1622 A Robust Unsupervised Feature Selection (RUFS) algorithm is proposed, where robust clustering and robust feature selection are simultaneously performed. This teaches clustering using an unsupervised feature selection framework. Robust feature selection is performed through jointly minimizing the last two terms (joint $\ell_{2,1}$ norms minimization), which is able to handle outliers and noise in data. The $\ell_{2,1}$ norm imposed on the feature selection matrix W guarantees the property of sparseness in rows; This teaches sparse learning which imposes sparse constraints.) into a matrix-factorization-based clustering process (pg. 1623; In this work, however, we propose to utilize local learning regularized robust nonnegative matrix factorization with orthogonal constraint to learn the pseudo cluster labels.), comprising:
factorizing the data matrix into two matrices, U (pg. 1622; The scaled cluster indicator matrix [Yang et al., 2011][Li et al., 2012] G is defined as $G = [g_1, \cdots, g_n]^T = Y (Y^T Y)^{-1/2}$ (2)) and V (pg. 1623; the feature selection matrix (or projection matrix for regression) W), each row of U being a cluster membership of a corresponding data sample in the data matrix (pg. 1622 G is defined as $G = [g_1, \cdots, g_n]^T = Y (Y^T Y)^{-1/2}$ (2), where $g_i$ is the scaled cluster indicator of $x_i$), and each row of V being a latent representation of a corresponding column in the data matrix (pg. 1623; the feature selection matrix (or projection matrix for regression) W which is sparse in rows), and
selecting one or more features from the data matrix, by performing a theoretical proof that by adding an orthogonal constraint on U (pg. 1623; we thus constraint G to be orthonormal by columns, and the original optimization problem is relaxed to $\min_{F,G,W} \|X - GF\|_{2,1} + \nu \mathrm{Tr}[G^T L G] + \alpha \|XW - G\|_{2,1} + \beta \|W\|_{2,1}$ s.t. $G \in \mathbb{R}_+^{n \times c}$, $G^T G = I_c$, $F \in \mathbb{R}_+^{c \times d}$, $W \in \mathbb{R}^{d \times c}$ (7)), feature selection on V removes undesired features, and with the theoretical proof (pg. 1623; We can thus filter out the features corresponding to zero rows of W when performing feature selection), adopting sparse learning on V (pg. 1623; The $\ell_{2,1}$ norm imposed on the feature selection matrix W guarantees the property of sparseness in rows.) to perform unsupervised feature selection in an unsupervised environment (pg. 1621; From the perspective of label availability, feature selection algorithms can also be classified into supervised feature selection and unsupervised feature selection.).
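As a minimal sketch of the row-sparsity passages cited above, the $\ell_{2,1}$-norm of a feature selection matrix W is the sum of the Euclidean norms of its rows, and features whose rows are zero can be filtered out. The matrix values and the helper names `l21_norm` and `nonzero_row_indices` are hypothetical; the sketch does not solve the optimization problem of Qian, it only evaluates the norm and applies the zero-row filter to a given W.

```python
import math

def l21_norm(W):
    """Sum of the Euclidean norms of the rows of W (the l2,1-norm)."""
    return sum(math.sqrt(sum(w * w for w in row)) for row in W)

def nonzero_row_indices(W, tol=1e-12):
    """Indices of rows whose Euclidean norm exceeds tol; zero rows are filtered out."""
    return [j for j, row in enumerate(W)
            if math.sqrt(sum(w * w for w in row)) > tol]

# Hypothetical feature selection matrix W (d = 4 features, c = 2 clusters).
W = [
    [3.0, 4.0],   # feature 0: row norm 5
    [0.0, 0.0],   # feature 1: zero row, filtered out
    [0.0, 2.0],   # feature 2: row norm 2
    [0.0, 0.0],   # feature 3: zero row, filtered out
]

print(l21_norm(W))             # 7.0
print(nonzero_row_indices(W))  # [0, 2]
```

Because the penalty acts on whole rows, a row is driven to zero across all clusters at once, which is why a zero row corresponds to a discarded feature.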
Qian et al. does not explicitly disclose an embedded unsupervised feature selection framework.
However, Wang et al. teaches
an embedded unsupervised feature selection framework (pg. 2; In this work, we focus on the unsupervised feature selection model design. Most existing unsupervised feature selection methods are similar to filter methods in supervised learning, and define different score systems to select features. Considering the advantages of embedded feature selection methods in supervised learning, we hope to use the embedded feature selection mechanism in an unsupervised way.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the method of unsupervised feature selection of Qian et al. with the method of unsupervised feature selection of Wang et al.
Doing so would allow for improved classification results (Wang, pg. 1; Linear discriminant analysis (LDA) with trace ratio criterion is a supervised dimensionality reduction method that has shown good performance to improve classifications.).
Regarding Claim 6,
Qian et al. and Wang et al. teach the method of claim 1. Qian et al. further teaches wherein the embedded unsupervised feature selection framework is optimized using a first equality constraint and a second equality constraint (pg. 1624 eq. 8-14; the partial derivatives of $L(G, F, W)$ w.r.t. G, F, and W can be obtained $\nabla_G L = (GF - X) F^T \oslash [r_1 \otimes 1_{1 \times c}] + 2\nu LG + \alpha (G - XW) \oslash [r_2 \otimes 1_{1 \times c}] + \zeta G (G^T G - I_c)$ (10), $\nabla_F L = G^T [(GF - X) \oslash [r_1 \otimes 1_{1 \times d}]]$ (11), $\nabla_W L = \alpha X^T [(XW - G) \oslash [r_2 \otimes 1_{1 \times c}]] + \beta W \oslash [r_3 \otimes 1_{1 \times c}]$ (12), where $\otimes$ is the Kronecker product, $\oslash$ is the element-wise division, and 1 is an all 1 matrix. Solutions of problem (8) satisfy the Kuhn-Tucker conditions $\partial L / \partial G_{ik} = 0$ if $G_{ik} > 0$; $\partial L / \partial G_{ik} \ge 0$ if $G_{ik} = 0$; $\partial L / \partial F_{kj} = 0$ if $F_{kj} > 0$; $\partial L / \partial F_{kj} \ge 0$ if $F_{kj} = 0$; $\partial L / \partial W_{jk} = 0$ (13). The projection operator $[T_\Omega M]_{ij} = \begin{cases} M_{ij} & \text{if } X_{ij} > 0 \\ \min\{M_{ij}, 0\} & \text{if } X_{ij} = 0 \end{cases}$ (14)).
Regarding claim 7, 
Qian et al. and Wang et al. teach the method of claim 1. Qian et al. further teaches wherein the embedded unsupervised feature selection framework sorts the one or more features into a descending order (pg. 1625 Sort all d features according to $\|w^j\|_2$ in descending order and select the top p ranked features.).
Regarding Claim 8,
Qian et al. and Wang et al. teach the method of claim 7. Qian et al. further teaches wherein the embedded unsupervised feature selection framework selects one or more top ranked features from the descending order (pg. 1625 Sort all d features according to $\|w^j\|_2$ in descending order and select the top p ranked features.).
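The sorting and top-p selection step quoted for claims 7 and 8 can be sketched as below. This is an illustration under assumptions: the matrix values and the function name `rank_features` are hypothetical, and W is taken as already learned.

```python
import math

def rank_features(W, p):
    """Return the indices of the p rows of W with the largest Euclidean norm,
    listed in descending order of that norm."""
    scores = [math.sqrt(sum(w * w for w in row)) for row in W]
    order = sorted(range(len(W)), key=lambda j: scores[j], reverse=True)
    return order[:p]

# Hypothetical learned matrix: one row per feature.
W = [[0.1, 0.0], [3.0, 4.0], [0.0, 0.0], [1.0, 1.0]]
top = rank_features(W, p=2)
print(top)  # [1, 3]
```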
Regarding claim 9,
Qian et al. and Wang et al. teach the method of claim 1. Qian et al. further teaches wherein the embedded unsupervised feature selection framework removes at least one of redundant, irrelevant, or noisy features of the high-dimensional data (pg. 1623 Robust feature selection is performed through jointly minimizing the last two terms (joint $\ell_{2,1}$ norms minimization), which is able to handle outliers and noise in data.).
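One way to see why an $\ell_{2,1}$ loss is described as able to handle outliers is to compare how much of the total loss a single outlying row contributes under the $\ell_{2,1}$ loss versus a squared Frobenius loss. The residual values below are hypothetical and chosen only to make the arithmetic easy to follow; the comparison is an illustration, not an analysis taken from Qian.

```python
import math

def row_norms(R):
    """Euclidean norm of each row of the residual matrix R."""
    return [math.sqrt(sum(r * r for r in row)) for row in R]

# Hypothetical residual matrix: 100 inlier rows of norm 1 and one outlier row of norm 10.
residual = [[1.0, 0.0] for _ in range(100)] + [[6.0, 8.0]]
norms = row_norms(residual)

l21_loss = sum(norms)                     # rows enter linearly
squared_loss = sum(n * n for n in norms)  # rows enter quadratically

outlier_share_l21 = norms[-1] / l21_loss
outlier_share_squared = norms[-1] ** 2 / squared_loss
print(outlier_share_l21, outlier_share_squared)
```

In this example the outlier accounts for roughly 9% of the $\ell_{2,1}$ loss but 50% of the squared loss, so it has far less influence on a minimizer of the former.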
Regarding Claim 10,
Qian et al. teaches one or more non-transitory tangible computer-readable storage media storing computer-executable instructions for performing a computer process on a computing system, the computer process comprising:
generating a data matrix for high-dimensional data (pg. 1623 In the era of big data, high dimensional data is prevalent and the number of features is usually very high (otherwise, we may not need feature selection), for example, text data, genetic data, or image data with high resolution… For practical use of unsupervised feature selection, we require algorithms to be able to handle large number of features and large number of data samples which are not only computationally efficient but also save memory. Robust unsupervised feature selection aims to select relevant features from high-dimensional data by converting it into a feature matrix), the data matrix having a plurality of rows with one or more features, each of the plurality of rows being a data instance (pg. 1622 For matrix $M = (m_{ij})$, its i-th row, j-th column are denoted by $m^i$, $m_j$ respectively. $\|M\|_F$ is the Frobenius norm of M and $\mathrm{Tr}[M]$ is the trace of M if M is square. For any matrix $M \in \mathbb{R}^{r \times t}$, its $\ell_{2,1}$-norm is defined as $\|M\|_{2,1} = \sum_{i=1}^{r} \sqrt{\sum_{j=1}^{t} m_{ij}^2} = \sum_{i=1}^{r} \|m^i\|_2$ (1). Assume that we have n samples $X = \{x_i\}_{i=1}^{n}$. Let $X = [x_1, \cdots, x_n]^T$ denote the data matrix with each row being a data feature vector, in which $x_i \in \mathbb{R}^d$ is the feature descriptor of the i-th sample; Each row is a data feature vector. A data feature vector is an n-dimensional vector of numerical features representing some object (a data instance).); and
clustering the data matrix into one or more clusters using an embedded unsupervised feature selection framework (pg. 1622 Suppose these n data samples are sampled from c classes and denote $Y = [y_1, \cdots, y_n]^T \in \{0, 1\}^{n \times c}$, where $y_i \in \{0, 1\}^{c \times 1}$ is the cluster indicator vector for sample $x_i$. The scaled cluster indicator matrix [Yang et al., 2011][Li et al., 2012] G is defined as $G = [g_1, \cdots, g_n]^T = Y (Y^T Y)^{-1/2}$ (2), where $g_i$ is the scaled cluster indicator of $x_i$.), the embedded unsupervised feature selection framework selecting the one or more features in an unsupervised environment with sparse learning (Pg. 1622 A Robust Unsupervised Feature Selection (RUFS) algorithm is proposed, where robust clustering and robust feature selection are simultaneously performed. This teaches clustering using an unsupervised feature selection framework. Robust feature selection is performed through jointly minimizing the last two terms (joint $\ell_{2,1}$ norms minimization), which is able to handle outliers and noise in data. The $\ell_{2,1}$ norm imposed on the feature selection matrix W guarantees the property of sparseness in rows; This teaches sparse learning which imposes sparse constraints.).
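The scaled cluster indicator matrix of equation (2) can be illustrated with a short sketch. Since $Y^T Y$ is a diagonal matrix holding the cluster sizes, each nonzero entry of G equals one over the square root of the size of the sample's cluster, and the columns of G are orthonormal. The cluster assignments and the helper names `scaled_cluster_indicator` and `gram` are hypothetical, and the sketch assumes every cluster has at least one member.

```python
import math

def scaled_cluster_indicator(assignments, c):
    """Build G = Y (Y^T Y)^{-1/2} from hard cluster assignments.

    Y^T Y is diagonal with the cluster sizes on its diagonal, so each
    nonzero entry of G is 1 / sqrt(size of that sample's cluster).
    Assumes every cluster index in range(c) occurs at least once.
    """
    sizes = [assignments.count(k) for k in range(c)]
    return [[1.0 / math.sqrt(sizes[k]) if a == k else 0.0 for k in range(c)]
            for a in assignments]

def gram(G):
    """Compute G^T G."""
    c = len(G[0])
    return [[sum(row[i] * row[j] for row in G) for j in range(c)]
            for i in range(c)]

assignments = [0, 0, 1, 1, 1, 1]   # hypothetical: 6 samples, 2 clusters
G = scaled_cluster_indicator(assignments, c=2)
GtG = gram(G)
print(GtG)  # identity matrix, up to floating-point rounding
```

This orthonormality of the exact indicator is what the relaxed constraint $G^T G = I_c$ in equation (7) preserves once the discrete structure of Y is dropped.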
Qian et al. does not explicitly disclose
…directly embedded.
However, Wang et al. teaches
…directly embedded (pg. 2; In this work, we focus on the unsupervised feature selection model design. Most existing unsupervised feature selection methods are similar to filter methods in supervised learning, and define different score systems to select features. Considering the advantages of embedded feature selection methods in supervised learning, we hope to use the embedded feature selection mechanism in an unsupervised way.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the method of unsupervised feature selection of Qian et al. with the method of unsupervised feature selection of Wang et al.
Doing so would allow for improved classification results (Wang, pg. 1; Linear discriminant analysis (LDA) with trace ratio criterion is a supervised dimensionality reduction method that has shown good performance to improve classifications.).
Regarding Claim 15,
Claim 15 is the non-transitory tangible computer-readable media corresponding to the method of claim 1. Claim 15 is substantially similar to claim 6 and is rejected on the same grounds.
Regarding Claim 16,
Claim 16 is the non-transitory tangible computer-readable media corresponding to the method of claim 1. Claim 16 is substantially similar to claim 7 and is rejected on the same grounds.
Regarding Claim 17,
Claim 17 is the non-transitory tangible computer-readable media corresponding to the method of claim 1. Claim 17 is substantially similar to claim 8 and is rejected on the same grounds.
Regarding Claim 18,

Regarding Claim 19,
Qian et al. teaches a computing device for managing high-dimensional data, the computing device comprising: 
a processor, the processor configured to: 
access a data matrix associated with a plurality of features (pg. 1622; For matrix $M = (m_{ij})$, its i-th row, j-th column are denoted by $m^i$, $m_j$ respectively… Assume that we have n samples $X = \{x_i\}_{i=1}^{n}$. Let $X = [x_1, \cdots, x_n]^T$ denote the data matrix with each row being a data feature vector, in which $x_i \in \mathbb{R}^d$ is the feature descriptor of the i-th sample.), and
apply the data matrix to a clustering algorithm (pg. 1622; A Robust Unsupervised Feature Selection (RUFS) algorithm is proposed, where robust clustering and robust feature selection are simultaneously performed) with unsupervised feature selection (pg. 1622; We perform robust clustering and robust feature selection simultaneously to select the most important and discriminative features for unsupervised learning.)… via sparse learning (pg. 1622; Aiming at feature selection, joint $\ell_{2,1}$ norms minimization is utilized to learn a robust feature selection matrix which is sparse in rows.), the clustering algorithm clustering the data matrix for the high-dimensional data into one or more clusters (pg. 1622; Denoting $N(x_i)$ as the neighborhood of $x_i$, the local learning regularization aims to minimize the sum of prediction errors between the local prediction from $N(x_i)$ and the cluster assignment of $x_i$:).
Qian et al. does not explicitly disclose
unsupervised feature selection embedded directly into 70144044.25Attorney Docket No. 055743-557626 M16-062P the clustering algorithm
However, Wang et al. teaches
unsupervised feature selection embedded directly into the clustering algorithm (pg. 2; Considering the advantages of embedded feature selection methods in supervised learning, we hope to use the embedded feature selection mechanism in an unsupervised way. In this paper, we address this problem using the unsupervised trace ratio formulation, and rigorously prove that our unsupervised trace ratio formulation is the unified and unique objective of both trace ratio linear discriminant analysis (LDA) and K-means clustering.).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the method of unsupervised feature selection of Qian et al. with the method of unsupervised feature selection of Wang et al.
Doing so would allow for improved classification results (Wang, pg. 1; Linear discriminant analysis (LDA) with trace ratio criterion is a supervised dimensionality reduction method that has shown good performance to improve classifications.).

Claims 2, 3, 4, 11, 12, and 13 are rejected under 35 U.S.C. 103 as being unpatentable over Qian et al. (“Robust Unsupervised Feature Selection”) in view of Wang et al. ("Unsupervised feature selection via unified trace ratio formulation and k-means clustering (track).") and further in view of Li et al. ("Multinomial mixture model with feature selection for text clustering").
Regarding Claim 2, 
Qian et al. and Wang et al. teach the method of claim 1. Qian et al. further teaches wherein the embedded unsupervised feature selection framework is generated based on a cluster indicator (pg. 1622 Suppose these n data samples are sampled from c classes and denote $Y = [y_1, \cdots, y_n]^T \in \{0, 1\}^{n \times c}$, where $y_i \in \{0, 1\}^{c \times 1}$ is the cluster indicator vector for sample $x_i$. The scaled cluster indicator matrix [Yang et al., 2011][Li et al., 2012] G is defined as $G = [g_1, \cdots, g_n]^T = Y (Y^T Y)^{-1/2}$ (2), where $g_i$ is the scaled cluster indicator of $x_i$… Since the latent matrix U and the shared distribution of irrelevancy $q(w_m; \tilde{c})$ are introduced to the multinomial mode).
Qian et al. and Wang et al. do not explicitly disclose
a latent feature matrix.
However, Li et al. teaches 
a latent feature matrix (pg. 705-706 The latent variables $\tilde{\phi} = \{\tilde{\phi}_1, \ldots, \tilde{\phi}_M\}$ are defined to indicate the relevancy of the features to the clustering process, where $\phi_m = 1$ if feature m is relevant, and $\phi_m = 0$ otherwise (10)).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the robust unsupervised feature selection framework of Qian et al. with the latent feature matrix of Li et al.
Doing so would allow for text clustering (pg. 705 We utilize the multinomial mixture model by Nigam et al. [3] for text clustering.).
Regarding Claim 3,
Pg. 704 In most text clustering tasks, the vocabulary size is very large and the dataset is extremely sparse, which will hinder the performance of clustering algorithms. Therefore the feature space dimension needs to be reduced, and the commonly used technique is feature selection.), the embedded unsupervised feature selection framework selecting the one or more features via the latent feature matrix (Pg. 705 Then a modified “feature saliency” method is proposed to perform feature selection over the dataset. Law et al. defined the “feature saliency” [5] concept as a set of real-valued parameters, which are used to measure the relevancy of the features to the clustering process. This method makes the feature selection process into a parameter estimation problem, which can be done during the clustering process. The latent variables $\tilde{\phi} = \{\tilde{\phi}_1, \ldots, \tilde{\phi}_M\}$ are defined to indicate the relevancy of the features to the clustering process).
Regarding Claim 4,
Qian et al., Wang et al., and Li et al. teach the method of claim 2.
Qian et al. further teaches wherein the latent feature matrix (Pg. 705 The latent variables $\tilde{\phi} = \{\tilde{\phi}_1, \ldots, \tilde{\phi}_M\}$ are defined to indicate the relevancy of the features to the clustering process, where $\phi_m = 1$ if feature m is relevant, and $\phi_m = 0$ otherwise (10)) and the cluster indicator are each set to 0 during initialization (Pg. 1622 where $y_i \in \{0, 1\}^{c \times 1}$ is the cluster indicator vector for sample $x_i$.) and subsequently converge to an optimal value (Pg. 1623 Given the proposed robust clustering with local learning, RUFS aims to solve the following optimization problem: $\min_{F,G,W} \|X - GF\|_{2,1} + \nu \mathrm{Tr}[G^T L G] + \alpha \|XW - G\|_{2,1} + \beta \|W\|_{2,1}$ s.t. $G \in \mathbb{R}_+^{n \times c}$, $G = Y (Y^T Y)^{-1/2}$, $F \in \mathbb{R}_+^{c \times d}$ (6), where $\nu, \alpha, \beta \in \mathbb{R}_+$ are parameters; Qian et al. further discloses an objective function that converges on some optimal value).
Li et al. further teaches wherein the latent feature matrix (Pg. 705 The latent variables $\tilde{\phi} = \{\tilde{\phi}_1, \ldots, \tilde{\phi}_M\}$ are defined to indicate the relevancy of the features to the clustering process, where $\phi_m = 1$ if feature m is relevant, and $\phi_m = 0$ otherwise (10)) is set to 0 during initialization.
Regarding Claim 11,
Claim 11 is the non-transitory tangible computer-readable media corresponding to the method of claim 1. Claim 11 is substantially similar to claim 2 and is rejected on the same grounds.
Regarding Claim 12,
Claim 12 is the non-transitory tangible computer-readable media corresponding to the method of claim 2. Claim 12 is substantially similar to claim 3 and is rejected on the same grounds.
Regarding Claim 13,
Claim 13 is the non-transitory tangible computer-readable media corresponding to the method of claim 2. Claim 13 is substantially similar to claim 4 and is rejected on the same grounds.
Claims 5, 14, and 20 are rejected under 35 U.S.C. 103 as being unpatentable over Qian et al. (“Robust Unsupervised Feature Selection”) in view of Wang et al. and further in view of Dai et al. ("A Semisupervised Feature Selection with Support Vector Machine").
Regarding Claim 5,
Qian et al. and Wang et al. teach the method of claim 1.
Qian et al. and Wang et al. do not explicitly disclose
wherein the embedded unsupervised feature selection framework is optimized using an Alternating Direction Method of Multiplier.
However, Dai et al. teaches wherein the embedded unsupervised feature selection framework is optimized using an Alternating Direction Method of Multiplier (pg. 6 The convergence property of Algorithm 8 can be derived from the theory of the alternating direction method of multipliers. According to the standard convergence theory of ADMM, Algorithm 8 satisfies the dual variable convergence [24].).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the robust unsupervised feature selection framework of Qian et al. with the alternating direction method of multipliers of Dai et al.
Doing so would allow for solving large-scale problems (pg. 3 The alternating direction method of multipliers (ADMM) developed in the 1970s and is well suited to distributed convex optimization and in particular to large-scale problems arising in statistics, machine learning, and related areas.).
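For context on the alternating direction method of multipliers referred to above, the following is a minimal sketch of scaled-form ADMM on a toy problem: minimize 0.5·||x − a||² + λ·||z||₁ subject to x = z. The problem, the parameter values, and the function names `soft_threshold` and `admm_l1_denoise` are assumptions chosen for brevity; this is not Algorithm 8 of Dai et al. and not the optimization recited in the claims. It only shows the alternating updates and the stopping test on the primal and dual residuals.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.|: shrink v toward zero by t."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0

def admm_l1_denoise(a, lam, rho=1.0, tol=1e-9, max_iter=500):
    """Minimize 0.5 * ||x - a||^2 + lam * ||z||_1 subject to x = z (scaled-form ADMM)."""
    n = len(a)
    x, z, u = [0.0] * n, [0.0] * n, [0.0] * n
    for _ in range(max_iter):
        # x-update: closed-form minimizer of the quadratic term plus the penalty.
        x = [(a[i] + rho * (z[i] - u[i])) / (1.0 + rho) for i in range(n)]
        z_old = z
        # z-update: soft thresholding handles the l1 term.
        z = [soft_threshold(x[i] + u[i], lam / rho) for i in range(n)]
        # u-update: accumulate the constraint violation x - z (scaled dual variable).
        u = [u[i] + x[i] - z[i] for i in range(n)]
        # Stop once both the primal and dual residuals are small.
        primal = max(abs(x[i] - z[i]) for i in range(n))
        dual = rho * max(abs(z[i] - z_old[i]) for i in range(n))
        if primal < tol and dual < tol:
            break
    return z

solution = admm_l1_denoise([3.0, -0.5, 1.5], lam=1.0)
print([round(v, 6) for v in solution])
```

For this toy problem the minimizer is known in closed form (soft thresholding of the input by λ), so the iterates should approach approximately [2, 0, 0.5], which gives a simple check that the updates converge.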
Regarding Claim 14,

Regarding Claim 20, 
Qian et al. and Wang et al. teach the computing device of claim 19. Qian et al. further teaches wherein the processor is further configured to:
impose an orthogonality constraint on a cluster indicator corresponding to the data matrix (pg. 1622; We impose an orthogonal constraint on the cluster indicator matrix to ensure that the learned cluster indicators are much closer to the true cluster labels.);
define an objective function having an initialized set of constant (pg. 1624; We first define the objective function $L(G, F, W) = \|X - GF\|_{2,1} + \nu \mathrm{Tr}[G^T L G] + \alpha \|XW - G\|_{2,1} + \beta \|W\|_{2,1} + \frac{\zeta}{4} \|G^T G - I_c\|_F^2$);
optimize the objective function… (pg. 1623-1624; To solve RUFS, we first rewrite the optimization problem as follows… where ζ is a parameter to control the orthogonality condition. In practice, ζ should be large enough to insure the orthogonality satisfied. We first define the objective function); 
update iteratively a set of auxiliary variables of the objective function… (pg. 1625; Algorithm 3) (pg. 1623; We adopt an alternating optimization (AO) strategy to solve RUFS and list it in Algorithm 3. Following the convergence analysis for a general AO approach, the convergence of Algorithm 3); and 
rank each of the set of feature data (pg. 1625; Sort all d features according to $\|w^j\|_2$ in descending order and select the top p ranked features.).

Qian et al. and Wang et al. do not explicitly disclose
using an alternating direction method of multiplier; and
until a set of alternating direction method of multiplier parameters converge.
However, Dai et al. teaches
using an alternating direction method of multiplier (pg. 6 The convergence property of Algorithm 8 can be derived from the theory of the alternating direction method of multipliers.); and 
until a set of alternating direction method of multiplier parameters converge (pg. 6 The convergence property of Algorithm 8 can be derived from the theory of the alternating direction method of multipliers. According to the standard convergence theory of ADMM, Algorithm 8 satisfies the dual variable convergence [24].).
It would have been obvious to a person having ordinary skill in the art before the effective filing date to combine the robust unsupervised feature selection framework of Qian et al. with the alternating direction method of multipliers of Dai et al.
Doing so would allow for solving large-scale problems (pg. 3 The alternating direction method of multipliers (ADMM) developed in the 1970s and is well suited to distributed convex optimization and in particular to large-scale problems arising in statistics, machine learning, and related areas.).



Conclusion

Any inquiry concerning this communication or earlier communications from the examiner should be directed to HENRY K NGUYEN whose telephone number is (571)272-0217.  The examiner can normally be reached on Mon - Fri 7:00am-4:30pm.
Examiner interviews are available via telephone, in-person, and video conferencing using a USPTO supplied web-based collaboration tool. To schedule an interview, applicant is encouraged to use the USPTO Automated Interview Request (AIR) at http://www.uspto.gov/interviewpractice.
If attempts to reach the examiner by telephone are unsuccessful, the examiner’s supervisor, Li B. Zhen, can be reached at (571)272-3768. The fax phone number for the organization where this application or proceeding is assigned is 571-273-8300.
Information regarding the status of an application may be obtained from the Patent Application Information Retrieval (PAIR) system.  Status information for published applications may be obtained from either Private PAIR or Public PAIR.  Status information for unpublished applications is available through Private PAIR only.  For more information about the PAIR system, see https://ppair-my.uspto.gov/pair/PrivatePair. Should you have questions on access to the Private PAIR system, contact the Electronic Business Center (EBC) at 866-217-9197 (toll-free). If you would like assistance from a USPTO Customer Service Representative or access 






/HENRY NGUYEN/Examiner, Art Unit 2121                                                                                                                                                                                                        


/Li B. Zhen/Supervisory Patent Examiner, Art Unit 2121