This paper presents PROXIMUS, an efficient framework for error-bounded compression of high-dimensional discrete-attributed datasets. Such datasets, which arise frequently in a wide variety of applications, pose some of the most significant challenges in data analysis. Subsampling and compression are two key technologies for analyzing them. PROXIMUS reduces a large dataset to a much smaller set of representative patterns, on which traditional (expensive) analysis algorithms can be applied with minimal loss of accuracy. We show desirable properties of PROXIMUS in terms of runtime, scalability to large datasets, and ability to represent data in compact form. We also demonstrate applications of PROXIMUS in association rule mining. In doing so, we establish PROXIMUS as a tool for preprocessing data before applying computationally expensive algorithms or as a tool for directly extracting correlated patterns. Our experimental results show that using the compressed data for association rule mining yields excellent precision and recall (near 100%) across a range of support thresholds while drastically reducing the time required for association rule mining.
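As a rough illustration of how the precision and recall figures mentioned above might be computed, the sketch below compares the rules mined from the original data with those mined from a compressed representation. It is not taken from the paper: the function name `precision_recall`, the rule encoding as (antecedent, consequent) pairs, and the example rules are all hypothetical, and the paper's exact evaluation protocol may differ.

```python
def precision_recall(reference, mined):
    """Compare two collections of rules, treated as sets.

    reference: rules found on the original data (assumed ground truth).
    mined:     rules found on the compressed data.
    Empty collections are scored as 1.0 by convention (an assumption).
    """
    reference, mined = set(reference), set(mined)
    hits = len(reference & mined)
    precision = hits / len(mined) if mined else 1.0
    recall = hits / len(reference) if reference else 1.0
    return precision, recall


# Hypothetical rules, written as (antecedent, consequent) pairs.
rules_original = {
    (frozenset({"a"}), frozenset({"b"})),
    (frozenset({"b"}), frozenset({"c"})),
    (frozenset({"a", "b"}), frozenset({"c"})),
    (frozenset({"c"}), frozenset({"d"})),
}
rules_compressed = {
    (frozenset({"a"}), frozenset({"b"})),
    (frozenset({"b"}), frozenset({"c"})),
    (frozenset({"a", "b"}), frozenset({"c"})),
    (frozenset({"a"}), frozenset({"d"})),  # spurious rule, absent from the original
}

precision, recall = precision_recall(rules_original, rules_compressed)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision here is the fraction of rules from the compressed data that also hold on the original data, and recall is the fraction of the original rules that the compressed data recovers. Repeating the comparison at several support thresholds would give the kind of curve the abstract refers to.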