An ever-increasing number of Web-accessible documents are available in languages other than English, and managing such heterogeneous, multilingual document collections poses a challenge. This paper proposes a novel model, called the domain alignment translation model, for cross-lingual document clustering. Most existing cross-lingual document clustering methods rely on an expensive machine translation system to bridge the gap between two languages. Our model instead handles cross-lingual document clustering by learning a cross-lingual domain alignment model and a domain-specific term translation model collaboratively. Experimental results show that our method, C-TLS, which uses no resources other than a bilingual dictionary, achieves performance comparable to directly translating the documents with a machine translation system such as the Google language tool. Our method is also more efficient than this machine translation approach.
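The abstract does not spell out how C-TLS works internally. As a rough illustration of the setting it targets (clustering a mixed-language collection with only a bilingual dictionary and no machine translation system), the following minimal sketch projects Spanish documents into an English term space and then runs a simple single-pass cosine clustering. It is not the domain alignment translation model. Each source term's weight is split uniformly across its dictionary translations, which leaves exactly the ambiguity (e.g. `banco` as `bank` or `bench`) that a domain-specific term translation model would be expected to resolve. The dictionary, documents, threshold, and all function names (`translate_vector`, `cosine`, `single_pass_cluster`) are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy Spanish -> English dictionary (illustrative only).
# A source term may have several candidate translations.
BILINGUAL_DICT = {
    "banco": ["bank", "bench"],
    "dinero": ["money"],
    "tasa": ["rate"],
    "equipo": ["team"],
    "gol": ["goal"],
    "partido": ["match", "party"],
}


def translate_vector(tokens, dictionary):
    """Project a bag of foreign-language tokens into the English term space.

    Each token's weight is split uniformly across its dictionary translations,
    and out-of-dictionary tokens are dropped. The uniform split is a naive
    stand-in for a learned, domain-specific term translation model.
    """
    vec = Counter()
    for tok in tokens:
        translations = dictionary.get(tok)
        if not translations:
            continue
        share = 1.0 / len(translations)
        for t in translations:
            vec[t] += share
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse term-weight vectors."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass_cluster(vectors, threshold):
    """Greedy single-pass clustering over the shared English term space.

    Each document joins the most similar cluster whose centroid similarity is
    at least `threshold`; otherwise it starts a new cluster.
    """
    clusters, centroids = [], []
    for i, vec in enumerate(vectors):
        best, best_sim = None, threshold
        for c, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            clusters.append([i])
            centroids.append(Counter(vec))
        else:
            clusters[best].append(i)
            # Centroid kept as a summed vector; cosine is scale-invariant,
            # so this has the same direction as the mean vector.
            centroids[best].update(vec)
    return clusters


# Mixed-language toy collection: (language, tokens).
docs = [
    ("en", ["bank", "money", "rate", "money"]),
    ("en", ["team", "goal", "match"]),
    ("es", ["banco", "dinero", "tasa"]),
    ("es", ["equipo", "gol", "partido", "gol"]),
]
vectors = [
    Counter(toks) if lang == "en" else translate_vector(toks, BILINGUAL_DICT)
    for lang, toks in docs
]
clusters = single_pass_cluster(vectors, threshold=0.5)
print(clusters)  # [[0, 2], [1, 3]]
```

On this toy data each Spanish document joins its English counterpart. Note that the uniform split still injects noise terms such as `bench` and `party` into the vectors; choosing the domain-appropriate translation is presumably where a domain-specific term translation model, learned together with a domain alignment model, would improve on this naive baseline.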