Multilingual news clustering: Feature translation vs. identification of cognate named entities

Authors:
S. Montalvo, R. Martínez, A. Casillas, V. Fresno
Affiliations:
Department of CC de la Computación, URJC, Spain; Department of Lenguajes y Sistemas Informáticos, UNED, Spain; Department of Electricidad y Electrónica, UPV-EHU, Spain; Department of CC de la Computación, URJC, Spain
Venue:
Pattern Recognition Letters
Year:
2007

Abstract

In this paper we evaluate the influence of different document representations on the results of multilingual news clustering. We aim to determine whether named entities alone are a good source of knowledge for multilingual news clustering. We compare two approaches: one based on feature translation and another based on cognate identification. Our main contribution is the use of only certain categories of cognate named entities as document representation features for multilingual news clustering, without the need for translation resources. The results show that using cognate named entities as the only type of feature to represent news leads to good multilingual clustering performance, comparable to that obtained with the feature translation approach.
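The abstract does not say how cognate named entities are identified or compared, so the following is only a minimal sketch of the general idea under stated assumptions, not the authors' method. It assumes named entities have already been extracted from each news item, treats two entities as cognates when their accent-stripped, normalized Levenshtein similarity reaches a hypothetical threshold of 0.7, groups cognates greedily into classes, and compares documents in different languages by cosine similarity over those classes. The function names (`is_cognate`, `cognate_classes`, `document_vector`) and the threshold are illustrative choices.

```python
import math
import unicodedata
from collections import Counter


def normalize(entity: str) -> str:
    """Lowercase and strip accents, so 'Unión' and 'union' compare equal."""
    decomposed = unicodedata.normalize("NFKD", entity.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def is_cognate(a: str, b: str, threshold: float = 0.7) -> bool:
    """Hypothetical cognate test: normalized edit similarity >= threshold."""
    x, y = normalize(a), normalize(b)
    longest = max(len(x), len(y)) or 1
    return 1 - levenshtein(x, y) / longest >= threshold


def cognate_classes(entities):
    """Greedily map each entity to the first class whose representative is a cognate."""
    reps = []     # one representative entity per cognate class
    mapping = {}  # entity -> class index
    for e in entities:
        for idx, rep in enumerate(reps):
            if is_cognate(e, rep):
                mapping[e] = idx
                break
        else:  # no existing class matched: open a new one
            mapping[e] = len(reps)
            reps.append(e)
    return mapping


def document_vector(entities, mapping) -> Counter:
    """Bag of cognate classes: how often each class occurs in one document."""
    return Counter(mapping[e] for e in entities)


def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(sum(c * c for c in v.values()))
    return dot / norm if norm else 0.0


if __name__ == "__main__":
    doc_es = ["Javier Solana", "Bruselas", "OTAN", "Kosovo"]
    doc_en = ["Javier Solana", "Brussels", "NATO", "Kosovo"]
    mapping = cognate_classes(doc_es + doc_en)
    sim = cosine(document_vector(doc_es, mapping), document_vector(doc_en, mapping))
    print(f"{sim:.2f}")  # 0.75
```

In this toy example "Bruselas" and "Brussels" fall into the same class (edit distance 2 over 8 characters), while "OTAN" and "NATO" do not, which illustrates that a purely string-based cognate test misses entities whose surface forms differ across languages. The resulting pairwise similarities could then feed any standard clustering algorithm.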