Some applications of the rank revealing QR factorization
SIAM Journal on Scientific and Statistical Computing
On Rank-Revealing Factorisations
SIAM Journal on Matrix Analysis and Applications
Efficient algorithms for computing a strong rank-revealing QR factorization
SIAM Journal on Scientific Computing
Computing rank-revealing QR factorizations of dense matrices
ACM Transactions on Mathematical Software (TOMS)
Algorithm 782: codes for rank-revealing QR factorizations of dense matrices
ACM Transactions on Mathematical Software (TOMS)
Unsupervised Feature Selection Using Feature Similarity
IEEE Transactions on Pattern Analysis and Machine Intelligence
Efficient Feature Selection in Conceptual Clustering
ICML '97 Proceedings of the Fourteenth International Conference on Machine Learning
Fast Monte-Carlo Algorithms for finding low-rank approximations
FOCS '98 Proceedings of the 39th Annual Symposium on Foundations of Computer Science
Ranking a random feature for variable and feature selection
The Journal of Machine Learning Research
ICML '04 Proceedings of the twenty-first international conference on Machine learning
Feature Selection for Unsupervised Learning
The Journal of Machine Learning Research
Algorithm 844: Computing sparse reduced-rank approximations to sparse matrices
ACM Transactions on Mathematical Software (TOMS)
Matrix approximation and projective clustering via volume sampling
SODA '06 Proceedings of the seventeenth annual ACM-SIAM Symposium on Discrete Algorithms
Tensor-CUR decompositions for tensor-based data
Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
The Journal of Machine Learning Research
Spectral feature selection for supervised and unsupervised learning
Proceedings of the 24th international conference on Machine learning
Subspace sampling and relative-error matrix approximation: column-row-based methods
ESA'06 Proceedings of the 14th conference on Annual European Symposium - Volume 14
Identifying critical variables of principal components for unsupervised feature selection
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
Clustered subset selection and its applications on it service metrics
Proceedings of the 17th ACM conference on Information and knowledge management
An improved approximation algorithm for the column subset selection problem
SODA '09 Proceedings of the twentieth Annual ACM-SIAM Symposium on Discrete Algorithms
Efficient computation of PCA with SVD in SQL
Proceedings of the 2nd Workshop on Data Mining using Matrices and Tensors
Unsupervised feature selection for multi-cluster data
Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining
Discriminative codeword selection for image representation
Proceedings of the international conference on Multimedia
Eigenvector sensitive feature selection for spectral clustering
ECML PKDD'11 Proceedings of the 2011 European conference on Machine learning and knowledge discovery in databases - Volume Part II
Fast PCA for processing calcium-imaging data from the brain of Drosophila melanogaster
Proceedings of the ACM fifth international workshop on Data and text mining in biomedical informatics
Column subset selection via sparse approximation of SVD
Theoretical Computer Science
Randomized Algorithms for Matrices and Data
Foundations and Trends® in Machine Learning
Classification of Epilepsy Using High-Order Spectra Features and Principle Component Analysis
Journal of Medical Systems
Self-taught dimensionality reduction on the high-dimensional small-sized data
Pattern Recognition
Principal Components Analysis (PCA) is the predominant linear dimensionality reduction technique and has been widely applied to datasets across all scientific domains. We consider, both theoretically and empirically, the problem of unsupervised feature selection for PCA by leveraging algorithms for the so-called Column Subset Selection Problem (CSSP). In words, the CSSP seeks the "best" subset of exactly k columns from an m x n data matrix A, and it has been extensively studied in the numerical linear algebra community. We present a novel two-stage algorithm for the CSSP. From a theoretical perspective, for small to moderate values of k, this algorithm significantly improves upon the best previously existing results [24, 12] for the CSSP. From an empirical perspective, we evaluate this algorithm as an unsupervised feature selection strategy in three application domains of modern statistical data analysis: finance, document-term data, and genetics. We pay particular attention to how this algorithm may be used to select representative or "landmark" features from an object-feature matrix in an unsupervised manner. In all three application domains, we identify k landmark features, i.e., columns of the data matrix, that capture nearly as much information as the subspace spanned by the top k "eigenfeatures."
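The two-stage idea sketched in the abstract (a randomized sampling pass over the columns followed by a deterministic subset-selection pass) can be illustrated in a few lines of NumPy/SciPy. The sketch below is an assumption-laden simplification, not the authors' exact algorithm: it samples candidate columns with probability proportional to their rank-k leverage scores, then keeps k of them via a pivoted QR; the function names `cssp_two_stage` and `captured_fraction` are invented for this example.

```python
import numpy as np
from scipy.linalg import qr  # pivoted QR for the deterministic stage


def cssp_two_stage(A, k, oversample=4, seed=0):
    """Illustrative two-stage column subset selection (not the paper's
    exact method): randomized leverage-score sampling, then pivoted QR."""
    rng = np.random.default_rng(seed)
    n = A.shape[1]
    c = min(n, oversample * k)  # number of candidate columns
    # Rank-k leverage score of column j = squared norm of the j-th
    # column of the top-k right singular vector matrix V_k^T.
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    lev = np.sum(Vt[:k] ** 2, axis=0)
    cand = rng.choice(n, size=c, replace=False, p=lev / lev.sum())
    # Deterministic stage: the first k pivots of a column-pivoted QR
    # pick the k "most independent" candidate columns.
    _, _, piv = qr(A[:, cand], pivoting=True)
    return np.sort(cand[piv[:k]])


def captured_fraction(A, cols, k):
    """||P_C A||_F^2 / ||A_k||_F^2: energy captured by the selected
    columns relative to the best rank-k (eigenfeature) subspace."""
    Q, _ = np.linalg.qr(A[:, cols])
    s = np.linalg.svd(A, compute_uv=False)
    return np.linalg.norm(Q.T @ A) ** 2 / np.sum(s[:k] ** 2)
```

On a matrix of exact rank k, the k selected "landmark" columns generically span the whole column space, so `captured_fraction` is close to 1, matching the abstract's claim that the selected columns capture nearly the same information as the top k eigenfeatures.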