Multilingual and cross-lingual news topic tracking

Authors:
Bruno Pouliquen;Ralf Steinberger;Camelia Ignat;Emilia Käsper;Irina Temnikova
Affiliations:
Joint Research Centre, Ispra (VA), Italy;Joint Research Centre, Ispra (VA), Italy;Joint Research Centre, Ispra (VA), Italy;Joint Research Centre, Ispra (VA), Italy;Joint Research Centre, Ispra (VA), Italy
Venue:
COLING '04 Proceedings of the 20th international conference on Computational Linguistics
Year:
2004

Abstract

We present a working system for automated news analysis that ingests an average of 7600 news articles per day in five languages. For each language, the system detects the major news stories of the day using group-average unsupervised agglomerative clustering. For each cluster, it also tracks related groups of articles published over the previous seven days, using the cosine of weighted term vectors. The system furthermore tracks related news across languages, in all language pairs involved. The cross-lingual news cluster similarity is based on a linear combination of three types of input: (a) cognates, (b) automatically detected references to geographical place names and (c) the results of a mapping process onto a multilingual classification system. A manual evaluation showed that the system produces good results.
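
The two similarity measures named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the sparse dictionary representation of term weights, and the equal default weights are all assumptions made for illustration, since the abstract gives neither the term-weighting scheme nor the coefficients of the linear combination.

```python
import math
from typing import Dict, Tuple


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse weighted-term vectors."""
    # Only terms present in both vectors contribute to the dot product.
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def cross_lingual_similarity(
    cognate_sim: float,
    place_sim: float,
    class_sim: float,
    weights: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3),
) -> float:
    """Linear combination of the three cross-lingual inputs.

    The weights are hypothetical placeholders; the abstract does not
    state the values used by the system.
    """
    w_cog, w_place, w_class = weights
    return w_cog * cognate_sim + w_place * place_sim + w_class * class_sim
```

In this sketch, `cosine` would compare a current cluster with a cluster from one of the previous seven days in the same language, while `cross_lingual_similarity` would combine three separately computed scores for a pair of clusters in different languages.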