Some applications of the rank revealing QR factorization
SIAM Journal on Scientific and Statistical Computing
On Rank-Revealing Factorisations
SIAM Journal on Matrix Analysis and Applications
Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM Journal on Scientific Computing
Computing rank-revealing QR factorizations of dense matrices
ACM Transactions on Mathematical Software (TOMS)
Algorithm 782: codes for rank-revealing QR factorizations of dense matrices
ACM Transactions on Mathematical Software (TOMS)
Unsupervised Feature Selection Using Feature Similarity
IEEE Transactions on Pattern Analysis and Machine Intelligence
Efficient Feature Selection in Conceptual Clustering
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Fast Monte-Carlo Algorithms for finding low-rank approximations
FOCS '98 Proceedings of the 39th Annual Symposium on Foundations of Computer Science
Ranking a random feature for variable and feature selection
The Journal of Machine Learning Research
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Feature Selection for Unsupervised Learning
The Journal of Machine Learning Research
Algorithm 844: Computing sparse reduced-rank approximations to sparse matrices
ACM Transactions on Mathematical Software (TOMS)
Matrix approximation and projective clustering via volume sampling
SODA '06 Proceedings of the seventeenth annual ACM-SIAM Symposium on Discrete Algorithms
Tensor-CUR decompositions for tensor-based data
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
The Journal of Machine Learning Research
Spectral feature selection for supervised and unsupervised learning
Proceedings of the 24th international conference on Machine learning
Subspace sampling and relative-error matrix approximation: column-row-based methods
ESA'06 Proceedings of the 14th conference on Annual European Symposium - Volume 14
Identifying critical variables of principal components for unsupervised feature selection
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Clustered subset selection and its applications on it service metrics
Proceedings of the 17th ACM conference on Information and knowledge management
An improved approximation algorithm for the column subset selection problem
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Efficient computation of PCA with SVD in SQL
Proceedings of the 2nd Workshop on Data Mining using Matrices and Tensors
Unsupervised feature selection for multi-cluster data
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Discriminative codeword selection for image representation
Proceedings of the international conference on Multimedia
Eigenvector sensitive feature selection for spectral clustering
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part II
Fast PCA for processing calcium-imaging data from the brain of Drosophila melanogaster
Proceedings of the ACM fifth international workshop on Data and text mining in biomedical informatics
Column subset selection via sparse approximation of SVD
Theoretical Computer Science
Randomized Algorithms for Matrices and Data
Foundations and Trends® in Machine Learning
Classification of Epilepsy Using High-Order Spectra Features and Principle Component Analysis
Journal of Medical Systems
Self-taught dimensionality reduction on the high-dimensional small-sized data
Pattern Recognition
Principal Components Analysis (PCA) is the predominant linear dimensionality reduction technique and has been widely applied to datasets across all scientific domains. We consider, both theoretically and empirically, the problem of unsupervised feature selection for PCA by leveraging algorithms for the so-called Column Subset Selection Problem (CSSP). In words, the CSSP seeks the "best" subset of exactly k columns from an m x n data matrix A, and it has been extensively studied in the numerical linear algebra community. We present a novel two-stage algorithm for the CSSP. From a theoretical perspective, for small to moderate values of k, this algorithm significantly improves upon the best previously existing results [24, 12] for the CSSP. From an empirical perspective, we evaluate this algorithm as an unsupervised feature selection strategy in three application domains of modern statistical data analysis: finance, document-term data, and genetics. We pay particular attention to how this algorithm may be used to select representative or "landmark" features from an object-feature matrix in an unsupervised manner. In all three application domains, we identify k landmark features, i.e., columns of the data matrix, that capture nearly as much information as the subspace spanned by the top k "eigenfeatures."
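The two-stage idea sketched in the abstract (a randomized sampling pass over the columns followed by a deterministic subset-selection pass) can be illustrated in a few lines of NumPy/SciPy. The sketch below is an assumption-laden simplification, not the authors' exact algorithm: it samples candidate columns with probability proportional to their rank-k leverage scores, then keeps k of them via a pivoted QR; the function names `cssp_two_stage` and `captured_fraction` are invented for this example.

```python
import numpy as np
from scipy.linalg import qr  # pivoted QR for the deterministic stage


def cssp_two_stage(A, k, oversample=4, seed=0):
    """Illustrative two-stage column subset selection (not the paper's
    exact method): randomized leverage-score sampling, then pivoted QR."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    c = min(n, oversample * k)  # number of candidate columns
    # Rank-k leverage score of column j = squared norm of the j-th
    # column of the top-k right singular vector matrix V_k^T.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    lev = np.sum(Vt[:k] ** 2, axis=0)
    cand = rng.choice(n, size=c, replace=False, p=lev / lev.sum())
    # Deterministic stage: the first k pivots of a column-pivoted QR
    # pick the k "most independent" candidate columns.
    _, _, piv = qr(A[:, cand], pivoting=True)
    return np.sort(cand[piv[:k]])


def captured_fraction(A, cols, k):
    """||P_C A||_F^2 / ||A_k||_F^2: energy captured by the selected
    columns relative to the best rank-k (eigenfeature) subspace."""
    Q, _ = np.linalg.qr(A[:, cols])
    s = np.linalg.svd(A, compute_uv=False)
    return np.linalg.norm(Q.T @ A) ** 2 / np.sum(s[:k] ** 2)
```

On a matrix of exact rank k, the k selected "landmark" columns generically span the whole column space, so `captured_fraction` is close to 1, matching the abstract's claim that the selected columns capture nearly the same information as the top k eigenfeatures.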