It is often necessary to automatically categorize multilingual document sets, which contain documents written in a variety of languages, into topically homogeneous subsets, for example when applying an automatic summarization system to multilingual news articles. However, there have been few studies on multilingual document clustering to date. In particular, it is not known whether clustering techniques are effective on medium- or large-scale multilingual document sets. For scalability, techniques should be based on dictionary-based translation and a single- or double-pass clustering algorithm. This article reports experiments applying multilingual document clustering to medium-scale sets of English, French, German and Italian documents (Reuters news articles). The results show that the double-pass algorithm has a positive effect when each document is translated. On the other hand, the cluster translation strategy, in which the clusters obtained by applying a clustering algorithm to each language's document set are translated, has almost no effect. In addition, translation disambiguation techniques improve clustering effectiveness, but only slightly.
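To make the document translation strategy concrete, the following is a minimal sketch rather than the article's implementation. It assumes a generic single-pass (leader) clustering with cosine similarity, an optional second pass that reassigns every document to its most similar cluster, and dictionary-based translation that keeps all candidate translations (no disambiguation). The toy dictionary, the threshold value, and every function name are hypothetical.

```python
import math
from collections import Counter


def translate(tokens, dictionary):
    """Replace each term by all of its dictionary translations (assumption:
    no disambiguation); terms missing from the dictionary are kept as is."""
    out = []
    for term in tokens:
        out.extend(dictionary.get(term, [term]))
    return out


def cosine(a, b):
    """Cosine similarity between two sparse term-weight mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def single_pass(vectors, threshold):
    """First pass: assign each document to the most similar existing cluster
    if the similarity reaches the threshold, otherwise start a new cluster."""
    centroids, labels = [], []
    for vec in vectors:
        best, best_sim = -1, threshold
        for i, centroid in enumerate(centroids):
            sim = cosine(vec, centroid)
            if sim >= best_sim:
                best, best_sim = i, sim
        if best < 0:
            centroids.append(Counter(vec))
            labels.append(len(centroids) - 1)
        else:
            centroids[best].update(vec)  # add term counts to the cluster sum
            labels.append(best)
    return labels, centroids


def second_pass(vectors, centroids):
    """Second pass: reassign every document to its most similar cluster,
    using the cluster vectors produced by the first pass."""
    return [
        max(range(len(centroids)), key=lambda i: cosine(vec, centroids[i]))
        for vec in vectors
    ]


# Hypothetical toy data: each non-English document is translated into English.
fr_en = {"petrole": ["oil"], "prix": ["price", "prize"], "hausse": ["rise", "increase"]}
de_en = {"fussball": ["football"], "spiel": ["match", "game"], "sieg": ["win", "victory"]}

docs = [
    ["oil", "price", "rise"],
    translate(["petrole", "prix", "hausse"], fr_en),
    ["football", "match", "win"],
    translate(["fussball", "spiel", "sieg"], de_en),
]
vectors = [Counter(d) for d in docs]

labels, centroids = single_pass(vectors, threshold=0.3)
refined = second_pass(vectors, centroids)
```

In this sketch, spurious translations such as "prize" only dilute the document vector, so the translated French article still joins the English oil-price cluster. The cluster translation strategy described in the abstract would instead cluster each language separately and translate the resulting cluster vectors before merging them.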