PROXIMUS: a framework for analyzing very high dimensional discrete-attributed datasets

Authors:
Mehmet Koyutürk;Ananth Grama
Affiliations:
West Lafayette, IN;West Lafayette, IN
Venue:
Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining
Year:
2003

Abstract

This paper presents an efficient framework for error-bounded compression of high-dimensional discrete-attributed datasets. Such datasets arise in a wide variety of applications and pose some of the most significant challenges in data analysis. Subsampling and compression are two key technologies for analyzing them. PROXIMUS provides a technique for reducing a large dataset to a much smaller set of representative patterns, to which traditional (expensive) analysis algorithms can be applied with minimal loss of accuracy. We show desirable properties of PROXIMUS in terms of runtime, scalability to large datasets, and ability to represent data in a compact form. We also demonstrate applications of PROXIMUS in association rule mining. In doing so, we establish PROXIMUS as a tool for preprocessing data before applying computationally expensive algorithms, or for directly extracting correlated patterns. Our experimental results show that association rule mining on the compressed data achieves excellent precision and recall (near 100%) across a range of support thresholds while drastically reducing mining time.
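
To make the idea of error-bounded compression into representative patterns concrete, here is a minimal Python sketch. The abstract does not describe the algorithm itself, so this is not PROXIMUS's method: it substitutes a simple divisive scheme (a column-wise majority-vote pattern, with a split on the most evenly balanced column) chosen only because it is easy to verify. The names `hamming`, `majority_pattern`, `compress`, and the bound `eps` are all hypothetical, and the use of Hamming distance as the error measure is an assumption.

```python
from typing import List, Tuple

Row = Tuple[int, ...]


def hamming(a: Row, b: Row) -> int:
    """Number of positions at which two binary rows differ."""
    return sum(x != y for x, y in zip(a, b))


def majority_pattern(rows: List[Row]) -> Row:
    """Column-wise majority vote over the rows; ties resolve to 1."""
    n = len(rows)
    return tuple(1 if 2 * sum(col) >= n else 0 for col in zip(*rows))


def compress(rows: List[Row], eps: int) -> List[Tuple[Row, int]]:
    """Illustrative stand-in, NOT the PROXIMUS algorithm.

    Returns (pattern, row_count) pairs such that every input row lies
    within Hamming distance eps of the pattern that represents it.
    """
    if eps < 0:
        raise ValueError("eps must be non-negative")
    if not rows:
        return []
    pattern = majority_pattern(rows)
    if all(hamming(r, pattern) <= eps for r in rows):
        return [(pattern, len(rows))]
    # Some row exceeds the bound, so the rows are not all identical and at
    # least one column holds both 0s and 1s. Split on the column whose
    # count of ones is closest to half the rows; both halves are non-empty.
    n = len(rows)
    counts = [sum(col) for col in zip(*rows)]
    split = min(range(len(counts)), key=lambda j: abs(2 * counts[j] - n))
    ones = [r for r in rows if r[split] == 1]
    zeros = [r for r in rows if r[split] == 0]
    return compress(ones, eps) + compress(zeros, eps)


if __name__ == "__main__":
    data = [
        (1, 1, 0, 0), (1, 1, 0, 0), (1, 1, 1, 0),
        (0, 0, 1, 1), (0, 0, 1, 1), (0, 1, 1, 1),
    ]
    for pattern, count in compress(data, eps=1):
        print(pattern, count)
```

In this sketch each pattern carries the number of rows it stands for, so a downstream algorithm could in principle weight patterns by that count instead of scanning the original rows. A smaller `eps` yields more patterns and a tighter approximation; `eps=0` reproduces the distinct rows exactly.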