Parameter-less co-clustering for star-structured heterogeneous data

Authors:
Dino Ienco;Céline Robardet;Ruggero G. Pensa;Rosa Meo
Affiliations:
Department of Computer Science, University of Torino, Torino, Italy 10139 and IRSTEA Montpellier, UMR TETIS, Montpellier, France 34093;Université de Lyon, CNRS, INSA-Lyon, LIRIS UMR5205, Villeurbanne, France 69621;Department of Computer Science, University of Torino, Torino, Italy 10139;Department of Computer Science, University of Torino, Torino, Italy 10139
Venue:
Data Mining and Knowledge Discovery
Year:
2013

Abstract

Data described by multiple feature sets drawn from heterogeneous domains are increasingly common in real-world applications. Such data represent objects of a certain type, connected to other types of data, the features, so that the overall data schema forms a star structure of inter-relationships. Co-clustering these data usually requires specifying many parameters, such as the number of clusters for the object dimension and for each feature domain. In this paper we present a novel co-clustering algorithm for heterogeneous star-structured data that is parameter-less: it requires neither the number of row clusters nor the number of column clusters for the given feature spaces. Our approach optimizes Goodman-Kruskal's τ, a measure of cross-association in contingency tables that evaluates the strength of the relationship between two categorical variables. We extend τ to evaluate co-clustering solutions and, in particular, we apply it in a higher-dimensional setting. We propose the algorithm CoStar, which optimizes τ by a local search approach. We assess the performance of CoStar on publicly available datasets from the textual and image domains using objective external criteria. The results show that our approach outperforms state-of-the-art methods for the co-clustering of heterogeneous data while remaining computationally efficient.
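
To make the association measure named in the abstract concrete, the following is a minimal sketch of Goodman-Kruskal's τ computed on a contingency table, using the standard textbook definition (the proportional reduction in error when predicting the column category given the row category). It is not the CoStar algorithm. The function names `goodman_kruskal_tau` and `cocluster_table`, the aggregation of a data matrix by cluster labels, and the convention of returning 0 for a degenerate table are all assumptions made for illustration.

```python
import numpy as np


def goodman_kruskal_tau(table):
    """Goodman-Kruskal tau for predicting the column variable from the row variable.

    `table` is a 2-D array of non-negative counts (a contingency table).
    """
    counts = np.asarray(table, dtype=float)
    p = counts / counts.sum()          # joint probabilities p_ij
    row = p.sum(axis=1)                # row marginals p_i.
    col = p.sum(axis=0)                # column marginals p_.j

    # Expected error when guessing the column category from the marginals alone.
    e_marginal = 1.0 - np.sum(col ** 2)
    if e_marginal == 0.0:
        # Only one column category carries mass: nothing to predict.
        # Returning 0 here is an assumed convention, not taken from the paper.
        return 0.0

    # Expected error when the row category is known (empty rows are skipped).
    nonzero = row > 0
    e_conditional = 1.0 - np.sum(p[nonzero] ** 2 / row[nonzero][:, None])

    return (e_marginal - e_conditional) / e_marginal


def cocluster_table(data, row_labels, col_labels):
    """Aggregate a data matrix into a (row cluster x column cluster) table.

    Hypothetical helper: sums the entries of `data` that fall into each
    pair of row cluster and column cluster.
    """
    data = np.asarray(data, dtype=float)
    r = np.asarray(row_labels)
    c = np.asarray(col_labels)
    table = np.zeros((r.max() + 1, c.max() + 1))
    np.add.at(table, (r[:, None], c[None, :]), data)
    return table
```

Under these assumptions, a candidate co-clustering could be scored by calling `goodman_kruskal_tau(cocluster_table(data, row_labels, col_labels))`. A local search such as the one the abstract describes would then compare scores of this kind across neighbouring partitions, although the exact objective and moves used by CoStar are not given in this record.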