Block clustering of contingency table and mixture model

Authors:
Mohamed Nadif;Gérard Govaert
Affiliations:
LITA EA3097, Université de Metz, Metz, France;HEUDIASYC, UMR CNRS 6599, Université de Technologie de Compiègne, Compiègne Cedex, France
Venue:
IDA'05 Proceedings of the 6th international conference on Advances in Intelligent Data Analysis
Year:
2005

Abstract

Block clustering, or simultaneous clustering, has become an important challenge in the data mining context. It has practical importance in a wide variety of applications such as text, web-log and market basket data analysis. Typically, the data that arises in these applications is arranged as a two-way contingency or co-occurrence table. In this paper, we embed the block clustering problem in the mixture approach. We propose a Poisson block mixture model and, adopting the classification maximum likelihood principle, derive a new algorithm. Simplicity, fast convergence and scalability are the major advantages of the proposed approach.
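
To make the idea of a hard-assignment (classification) algorithm for a Poisson block model concrete, the following is a minimal sketch and not the authors' algorithm, whose details the abstract does not give. It assumes a parameterization in which cell (i, j) is Poisson with mean mu_i * nu_j * gamma[k, l], with mu_i and nu_j fixed to the observed row and column totals and equal cluster proportions. The function name `block_cem_poisson`, its parameters and the smoothing constant `eps` are hypothetical choices made for illustration.

```python
import numpy as np


def block_cem_poisson(x, n_row_clusters, n_col_clusters, n_iter=20, seed=0):
    """Alternating hard block clustering of a contingency table (sketch).

    Assumption (not from the abstract): x[i, j] ~ Poisson(mu_i * nu_j * gamma[k, l]),
    where mu_i and nu_j are the row and column totals and (k, l) is the block
    containing cell (i, j). Under this assumption the block effect is estimated
    as gamma[k, l] = x_kl / (x_k. * x_.l), and each row (or column) is moved to
    the cluster that maximizes its contribution to sum_kl x_kl * log(gamma[k, l]).
    """
    x = np.asarray(x, dtype=float)
    n, d = x.shape
    rng = np.random.default_rng(seed)
    z = rng.integers(n_row_clusters, size=n)  # random initial row labels
    w = rng.integers(n_col_clusters, size=d)  # random initial column labels
    eps = 1e-12  # hypothetical floor guarding against log(0) and 0/0
    history = []

    def block_stats(z, w):
        zmat = np.eye(n_row_clusters)[z]  # (n, g) one-hot row memberships
        wmat = np.eye(n_col_clusters)[w]  # (d, m) one-hot column memberships
        block = zmat.T @ x @ wmat         # (g, m) block totals x_kl
        row_tot = block.sum(axis=1, keepdims=True)
        col_tot = block.sum(axis=0, keepdims=True)
        gamma = block / np.maximum(row_tot * col_tot, eps)
        return block, np.log(np.maximum(gamma, eps))

    for _ in range(n_iter):
        # Row step: collapse columns by cluster, then reassign every row.
        _, log_gamma = block_stats(z, w)
        u = x @ np.eye(n_col_clusters)[w]          # (n, m)
        z = np.argmax(u @ log_gamma.T, axis=1)

        # Column step: collapse rows by cluster, then reassign every column.
        _, log_gamma = block_stats(z, w)
        v = x.T @ np.eye(n_row_clusters)[z]        # (d, g)
        w = np.argmax(v @ log_gamma, axis=1)

        # Track the criterion (up to an additive constant) after each sweep.
        block, log_gamma = block_stats(z, w)
        history.append(float((block * log_gamma).sum()))

    return z, w, history
```

Calling `block_cem_poisson(table, 2, 2)` on a count table returns row labels, column labels and the criterion value after each sweep. Because the assignments are hard and the initialization is random, a run may settle in a local optimum; several restarts with different `seed` values are a common precaution.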