A general model for clustering binary data

Authors:
Tao Li
Affiliations:
Florida International University, Miami, FL
Venue:
Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining
Year:
2005

Abstract

Clustering is the problem of identifying the distribution of patterns and intrinsic correlations in large data sets by partitioning the data points into similarity classes. This paper studies the problem of clustering binary data, which arises in market basket datasets, where transactions contain items, and in document datasets, where documents are represented as a "bag of words". The contribution of the paper is threefold. First, a general binary data clustering model is presented. The model treats the data and features equally, based on their symmetric association relations, and explicitly describes the data assignments as well as the feature assignments. We characterize several variations of the general model with different optimization procedures. Second, we establish the connections between our clustering model and other existing clustering methods. Third, we discuss the problem of determining the number of clusters for binary clustering. Experimental results show the effectiveness of the proposed clustering model.
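
The idea of assigning data points and features to clusters symmetrically can be illustrated with a small sketch. It is not taken from the paper and should not be read as the author's procedure. It assumes hard assignments, the same number of clusters k for rows and columns, and a block-diagonal approximation in which entry (i, j) is predicted to be 1 exactly when row i and column j share a cluster. The function names (mismatches, best_labels, cocluster) and the toy matrix are hypothetical.

```python
def mismatches(W, rows, cols):
    """Count entries where W[i][j] differs from the indicator [rows[i] == cols[j]]."""
    return sum(
        W[i][j] != int(rows[i] == cols[j])
        for i in range(len(W))
        for j in range(len(W[0]))
    )


def best_labels(W, other, k):
    """Label each row of W given the cluster labels `other` of its columns.

    With the column labels fixed, the mismatch count of a row is minimized by
    the cluster c that maximizes (#ones - #zeros) over the columns in c.
    """
    labels = []
    for row in W:
        score = [0] * k
        for value, c in zip(row, other):
            score[c] += 1 if value else -1
        # Ties go to the lowest cluster index.
        labels.append(max(range(k), key=lambda c: score[c]))
    return labels


def cocluster(W, k, init_cols, max_iter=20):
    """Alternate between row (data) and column (feature) assignments.

    Assumption: a simple alternating heuristic, started from a caller-supplied
    column labeling. Each update cannot increase the mismatch count.
    """
    cols = list(init_cols)
    Wt = [list(col) for col in zip(*W)]  # transpose, so columns are treated like rows
    rows = best_labels(W, cols, k)
    for _ in range(max_iter):
        new_cols = best_labels(Wt, rows, k)
        new_rows = best_labels(W, new_cols, k)
        if new_cols == cols and new_rows == rows:
            break
        rows, cols = new_rows, new_cols
    return rows, cols


# Toy binary matrix: two blocks plus one noisy entry at W[3][1].
W = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 1, 1, 1],
]
rows, cols = cocluster(W, k=2, init_cols=[0, 1, 1, 1])
print(rows, cols, mismatches(W, rows, cols))
```

Because rows and columns are updated by the same routine applied to the matrix and its transpose, the sketch treats data and features identically, which is the property the abstract emphasizes. The result depends on the initial column labeling, and the sketch does not address how to choose k.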