We present a novel approach to multilingual document clustering that uses only comparable corpora to achieve cross-lingual semantic interoperability. The method models document collections as a weighted graph, and supervisory information is given as sets of must-link constraints between documents in different languages. Recursive k-nearest-neighbor similarity propagation exploits this prior knowledge to merge the two language spaces. A spectral method is then applied to find the best cuts of the graph. Experimental results show that, with limited supervisory information, the method achieves promising clustering results. Furthermore, because the method requires no language-dependent information, it can be applied to languages with different writing systems.
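The pipeline outlined in the abstract (graph construction, must-link constraints across languages, similarity propagation, spectral cut) can be illustrated with a minimal sketch. This is not the authors' algorithm: the toy term-count matrices, the function names, and the propagation rule are all assumptions made for illustration. In particular, propagation here is a single round that routes cross-language similarity through the must-linked pairs, a simplified stand-in for the recursive k-nearest-neighbor propagation the abstract mentions, and the spectral step is a plain two-way cut on the sign of the Fiedler vector of the normalized Laplacian.

```python
import numpy as np

def cosine_sim(X):
    """Pairwise cosine similarity between the rows of X."""
    X = X / np.linalg.norm(X, axis=1, keepdims=True)
    return X @ X.T

def merge_language_spaces(S_a, S_b, must_link):
    """Build one affinity matrix over both languages (illustrative rule).

    Cross-language similarity of (i, j) is the best path through a
    must-linked pair (a, b): S_a[i, a] * S_b[b, j]. This is an assumed,
    single-round simplification of similarity propagation.
    """
    cross = np.zeros((S_a.shape[0], S_b.shape[0]))
    for a, b in must_link:
        cross = np.maximum(cross, np.outer(S_a[:, a], S_b[b, :]))
    W = np.block([[S_a, cross], [cross.T, S_b]])
    np.fill_diagonal(W, 0.0)  # no self-loops in the graph
    return W

def spectral_bipartition(W):
    """Two-way cut from the sign of the Fiedler vector."""
    d_inv_sqrt = 1.0 / np.sqrt(W.sum(axis=1))
    # Symmetric normalized Laplacian: I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(W)) - d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues in ascending order
    return (vecs[:, 1] > 0).astype(int)

# Hypothetical term counts: 4 documents per language, 2 topics each.
# The two languages use separate vocabularies, so their feature spaces
# cannot be compared directly.
docs_a = np.array([[3, 2, 1, 0], [2, 3, 0, 1], [1, 0, 3, 2], [0, 1, 2, 3]], dtype=float)
docs_b = np.array([[3, 2, 1, 0], [2, 3, 0, 1], [1, 0, 3, 2], [0, 1, 2, 3]], dtype=float)

# Must-link constraints: (index in language A, index in language B).
must_link = [(0, 0), (2, 2)]

W = merge_language_spaces(cosine_sim(docs_a), cosine_sim(docs_b), must_link)
labels = spectral_bipartition(W)  # first 4 entries: language A, last 4: language B
print(labels)
```

In this toy setup only two cross-language pairs are supervised, yet every document receives a cross-language affinity, so the cut groups documents by topic and not by language. Which cluster is labeled 0 or 1 depends on the arbitrary sign of the eigenvector.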