On context-aware co-clustering with metadata support

Authors:
Claudio Schifanella;Maria Luisa Sapino;K. Selçuk Candan
Affiliations:
University of Torino, Torino, Italy 10149;University of Torino, Torino, Italy 10149;Arizona State University, Tempe, USA 85281
Venue:
Journal of Intelligent Information Systems
Year:
2012


Abstract

In traditional co-clustering, the only basis for the clustering task is a given relationship matrix describing the strengths of the relationships between pairs of elements in the two domains. Relying on this single input matrix, co-clustering discovers relationships holding among groups of elements from the two input domains. In many real-life applications, however, additional background knowledge or metadata about one or both of the input domains may be available and, if leveraged properly, such metadata might play a significant role in the effectiveness of the co-clustering process. How additional metadata affects co-clustering depends on how the process is modified to be context-aware. In this paper, we propose, compare, and evaluate three alternative strategies (metadata-driven, metadata-constrained, and metadata-injected co-clustering) for embedding available contextual knowledge into the co-clustering process. Experimental results show that it is possible to leverage the available metadata in discovering contextually relevant co-clusters without significant overheads in terms of information-theoretic co-cluster quality or execution cost.
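
To make the notion of information-theoretic co-cluster quality concrete, the following is a minimal sketch, not the authors' method. The abstract does not say which measure is used; the sketch assumes one common choice, the loss in mutual information between the normalized relationship matrix and its co-clustered summary. All function names and the toy matrix are hypothetical.

```python
import math

def normalize(matrix):
    """Scale a non-negative relationship matrix so its entries sum to 1."""
    total = sum(sum(row) for row in matrix)
    return [[v / total for v in row] for row in matrix]

def mutual_information(joint):
    """Mutual information (in bits) of a joint distribution given as a 2-D list."""
    row_marg = [sum(row) for row in joint]
    col_marg = [sum(col) for col in zip(*joint)]
    mi = 0.0
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            if p > 0:
                mi += p * math.log2(p / (row_marg[i] * col_marg[j]))
    return mi

def aggregate(joint, row_labels, col_labels):
    """Sum the joint probabilities inside each (row cluster, column cluster) block."""
    k = max(row_labels) + 1
    l = max(col_labels) + 1
    summary = [[0.0] * l for _ in range(k)]
    for i, row in enumerate(joint):
        for j, p in enumerate(row):
            summary[row_labels[i]][col_labels[j]] += p
    return summary

def information_loss(matrix, row_labels, col_labels):
    """Mutual information lost by replacing the matrix with its co-cluster summary.

    Assumed quality measure: lower is better, 0 means nothing is lost.
    """
    joint = normalize(matrix)
    summary = aggregate(joint, row_labels, col_labels)
    return mutual_information(joint) - mutual_information(summary)

# Toy relationship matrix with two obvious blocks (hypothetical data).
R = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

aligned = information_loss(R, [0, 0, 1, 1], [0, 0, 1, 1])     # follows the blocks
misaligned = information_loss(R, [0, 1, 0, 1], [0, 1, 0, 1])  # cuts across them
```

Under this assumed measure, a co-clustering that follows the block structure of the toy matrix loses no information, while one that cuts across the blocks loses all of it. A context-aware strategy could be compared against plain co-clustering by checking how much additional loss its metadata-guided assignments incur.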