Nonorthogonal decomposition of binary matrices for bounded-error data compression and analysis

Authors:
Mehmet Koyutürk;Ananth Grama;Naren Ramakrishnan
Affiliations:
Department of Computer Sciences, Purdue University, West Lafayette, IN;Department of Computer Sciences, Purdue University, West Lafayette, IN;Department of Computer Sciences, Virginia Tech, Blacksburg, VA
Venue:
ACM Transactions on Mathematical Software (TOMS)
Year:
2006

Abstract

This article presents the design and implementation of a software tool, PROXIMUS, for error-bounded approximation of high-dimensional binary-attributed datasets based on nonorthogonal decomposition of binary matrices. The tool can be used to analyze data arising in domains ranging from commercial to scientific applications. Using a combination of innovative algorithms, novel data structures, and efficient implementation, PROXIMUS demonstrates excellent accuracy, performance, and scalability to large datasets. We demonstrate these properties experimentally on diverse applications in association rule mining and DNA microarray analysis. In limited beta release, PROXIMUS currently has over 300 installations in over 10 countries.
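
To make the idea of approximating a binary matrix by a binary decomposition concrete, the following is a minimal sketch of a rank-one binary approximation computed by an alternating heuristic. It is an illustration under stated assumptions, not the PROXIMUS implementation: the function names `rank_one_binary` and `hamming_error`, the choice of the densest row as the starting pattern, and the iteration cap are all hypothetical. The sketch looks for binary vectors x and y such that the outer product of x and y is close to A in Hamming distance; the abstract's error bounding and any recursive partitioning are not shown.

```python
import numpy as np


def rank_one_binary(A, max_iter=50):
    """Alternating heuristic for a binary rank-one approximation A ~ outer(x, y).

    Illustrative only; initialization and stopping rule are assumptions.
    """
    A = np.asarray(A, dtype=np.int64)
    # Assumed initialization: use the densest row as the starting pattern y.
    y = A[np.argmax(A.sum(axis=1))].copy()
    x = np.zeros(A.shape[0], dtype=np.int64)
    for _ in range(max_iter):
        # With y fixed, setting x[i] = 1 lowers the Hamming error of row i
        # exactly when row i covers at least half of the nonzeros of y.
        x_new = (2 * (A @ y) >= y.sum()).astype(np.int64)
        # Symmetric update for y with x fixed.
        y_new = (2 * (A.T @ x_new) >= x_new.sum()).astype(np.int64)
        if np.array_equal(x_new, x) and np.array_equal(y_new, y):
            break  # Fixed point reached.
        x, y = x_new, y_new
    return x, y


def hamming_error(A, x, y):
    """Number of entries where A and outer(x, y) disagree."""
    return int(np.abs(np.asarray(A) - np.outer(x, y)).sum())


if __name__ == "__main__":
    A = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [1, 1, 1, 0],
                  [0, 0, 0, 1]])
    x, y = rank_one_binary(A)
    print(x, y, hamming_error(A, x, y))
```

Each update is the exact minimizer of the Hamming error for one vector with the other held fixed, so the error never increases between iterations. The result is a local optimum that depends on the starting pattern.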