Multilingual spectral clustering using document similarity propagation

Authors:
Dani Yogatama; Kumiko Tanaka-Ishii
Affiliations:
University of Tokyo, Chiyoda-ku, Tokyo, Japan; University of Tokyo, Chiyoda-ku, Tokyo, Japan
Venue:
EMNLP '09: Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 2
Year:
2009

Abstract

We present a novel approach to multilingual document clustering that uses only comparable corpora to achieve cross-lingual semantic interoperability. The method models a document collection as a weighted graph, and supervisory information is given as sets of must-link constraints between documents in different languages. Recursive k-nearest-neighbor similarity propagation exploits this prior knowledge to merge the two language spaces, and a spectral method is then applied to find the best cuts of the graph. Experimental results show that, with limited supervisory information, the method achieves promising clustering results. Furthermore, since the method requires no language-dependent information, it can be applied to languages written in different writing systems.
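The abstract describes the pipeline only at a high level. Must-link pairs seed cross-lingual similarity, k-nearest-neighbor propagation spreads it, and a spectral method cuts the merged graph. The following is a minimal sketch of that pipeline under stated assumptions, not the authors' algorithm. Everything specific in it is hypothetical and chosen for illustration: the max-product propagation rule (a neighbor u of a linked document a inherits similarity S[u,a]·S[a,b] to the partner b), the parameters k and rounds, the function names propagate_similarity and spectral_clusters, and the toy similarity values. The spectral step uses the standard normalized-affinity embedding followed by a small k-means, which may differ from the cut criterion used in the paper. It requires NumPy.

```python
import numpy as np


def propagate_similarity(S, must_links, k=2, rounds=2):
    """Hypothetical max-product propagation of must-link constraints.

    S: symmetric (n, n) similarity matrix with zero diagonal; documents in
       different languages start with similarity 0 (assumption).
    must_links: (i, j) pairs known to belong to the same cluster.
    """
    S = np.array(S, dtype=float)
    frontier = set()
    for i, j in must_links:
        # Must-linked documents are treated as maximally similar.
        S[i, j] = S[j, i] = 1.0
        frontier.update({(i, j), (j, i)})
    for _ in range(rounds):  # "recursive": newly created links propagate next round
        new_frontier = set()
        for a, b in sorted(frontier):
            # k nearest neighbors of a, excluding a itself and its partner b.
            order = np.argsort(-S[a], kind="stable")
            nbrs = [int(u) for u in order if u != a and u != b][:k]
            for u in nbrs:
                # Neighbor u inherits a's link to b, discounted by S[u, a].
                w = S[u, a] * S[a, b]
                if w > S[u, b]:
                    S[u, b] = S[b, u] = w
                    new_frontier.update({(u, b), (b, u)})
        frontier = new_frontier
    return S


def spectral_clusters(S, n_clusters, iters=50):
    """Normalized spectral clustering (Ng-Jordan-Weiss style) with a tiny k-means."""
    d = S.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # Normalized affinity D^{-1/2} S D^{-1/2}; its top eigenvectors are the
    # bottom eigenvectors of the normalized Laplacian I - D^{-1/2} S D^{-1/2}.
    M = d_inv_sqrt[:, None] * S * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(M)  # eigenvalues in ascending order
    X = vecs[:, -n_clusters:]
    X = X / np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    # Deterministic farthest-point initialization for k-means.
    centers = [X[0]]
    for _ in range(1, n_clusters):
        dist = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(dist)])
    C = np.array(centers)
    for _ in range(iters):
        labels = np.argmin(((X[:, None, :] - C[None, :, :]) ** 2).sum(axis=2), axis=1)
        new_C = np.array([X[labels == c].mean(axis=0) if np.any(labels == c) else C[c]
                          for c in range(n_clusters)])
        if np.allclose(new_C, C):
            break
        C = new_C
    return labels


# Toy data (hypothetical): documents 0-3 are in one language, 4-7 in another.
# Within each language, the first two documents share topic A, the last two topic B.
S0 = np.zeros((8, 8))
for lang in (range(0, 4), range(4, 8)):
    for i in lang:
        for j in lang:
            if i != j:
                same_topic = (i % 4 < 2) == (j % 4 < 2)
                S0[i, j] = 0.9 if same_topic else 0.1

# One must-link pair per topic connects the two language spaces.
S = propagate_similarity(S0, must_links=[(0, 4), (2, 6)])
labels = spectral_clusters(S, n_clusters=2)
print(labels)
```

Without the must-link pairs, the two languages form disconnected subgraphs, and a two-way cut would separate documents by language. After propagation, each topic is densely connected across languages, so the spectral cut separates the documents by topic instead.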