Motivated by the enormous amounts of data collected in a large IT service provider organization, this paper presents a method for quickly and automatically summarizing the data and extracting meaningful insights from it. Termed Clustered Subset Selection (CSS), our method enables program-guided exploration of high-dimensional data matrices. CSS combines clustering and subset selection into a coherent and intuitive method for data analysis. In addition to a general framework, we introduce a family of CSS algorithms with different clustering components, such as k-means and Close-to-Rank-One (CRO) clustering, and different subset selection components, such as best rank-one approximation and Rank-Revealing QR (RRQR) decomposition. Empirically, we show that CSS achieves significant improvements in approximation error over existing subset selection methods. Compared to those techniques, CSS also provides additional insight into clusters and cluster representatives. Finally, we present a case study of program-guided data exploration using CSS on a large collection of IT service delivery data.
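As a rough illustration of the general idea, the following minimal sketch pairs a k-means clustering component with a rank-one selection component. It is not the paper's implementation: the function names, the choice to cluster the columns of the matrix, the rule of picking the column best aligned with the leading left singular vector of each cluster, and the projection-based error measure are all assumptions made for this example.

```python
import numpy as np

def kmeans_columns(A, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on the columns of A; returns one label per column."""
    rng = np.random.default_rng(seed)
    X = A.T  # one row per column of A
    centers = X[rng.choice(X.shape[0], size=k, replace=False)]
    labels = None
    for _ in range(n_iter):
        # Squared Euclidean distance from every column to every center.
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        new_labels = d.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for j in range(k):
            members = X[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return labels

def select_representative(block):
    """Assumed rule: index of the column best aligned with the leading
    left singular vector of the cluster's submatrix."""
    U, _, _ = np.linalg.svd(block, full_matrices=False)
    norms = np.linalg.norm(block, axis=0)
    norms[norms == 0] = 1.0  # avoid dividing by zero for all-zero columns
    scores = np.abs(U[:, 0] @ block) / norms
    return int(np.argmax(scores))

def clustered_subset_selection(A, k, seed=0):
    """Cluster the columns, keep one representative per non-empty cluster,
    and report the relative Frobenius error of projecting A onto them."""
    A = np.asarray(A, dtype=float)
    labels = kmeans_columns(A, k, seed=seed)
    selected = []
    for j in range(k):
        idx = np.flatnonzero(labels == j)
        if idx.size == 0:
            continue  # an empty cluster contributes no representative
        selected.append(int(idx[select_representative(A[:, idx])]))
    C = A[:, selected]
    A_hat = C @ np.linalg.pinv(C) @ A  # projection onto span of chosen columns
    rel_err = np.linalg.norm(A - A_hat) / np.linalg.norm(A)
    return selected, labels, rel_err
```

Calling `clustered_subset_selection(A, k)` on a data matrix `A` returns the indices of the chosen columns, the cluster label of every column, and the relative approximation error. The labels are what give the extra insight beyond plain subset selection: each chosen column stands for a known group of columns. A pivoted QR factorization could be substituted for the rank-one rule inside each cluster, in the spirit of the RRQR variant.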