An evaluation framework for cross-lingual link discovery

  • Authors:
  • Ling-Xiang Tang;Shlomo Geva;Andrew Trotman;Yue Xu;Kelly Y. Itakura

  • Affiliations:
  • Science and Engineering Faculty, Queensland University of Technology, Brisbane, Australia;Science and Engineering Faculty, Queensland University of Technology, Brisbane, Australia;Department of Computer Science, University of Otago, Dunedin, New Zealand;Science and Engineering Faculty, Queensland University of Technology, Brisbane, Australia;National Institute of Informatics, Japan

  • Venue:
  • Information Processing and Management: an International Journal
  • Year:
  • 2014

Abstract

Cross-Lingual Link Discovery (CLLD) is a new problem in Information Retrieval. The aim is to automatically identify meaningful and relevant hypertext links between documents in different languages. This is particularly helpful in knowledge discovery if a multi-lingual knowledge base is sparse in one language or another, or if the topical coverage differs between languages, as is the case with Wikipedia. Techniques for identifying new and topically relevant cross-lingual links are a current topic of interest at NTCIR, where the CrossLink task has been running since NTCIR-9 in 2011. This paper presents the evaluation framework for benchmarking cross-lingual link discovery algorithms in the context of NTCIR-9. The framework includes topics, document collections, assessments, metrics, and a toolkit for pooling, assessment, and evaluation. The assessments are divided into two separate sets: manual assessments performed by human assessors, and automatic assessments based on links extracted from Wikipedia itself. Using this framework, we show that manual assessment is more robust than automatic assessment in the context of cross-lingual link discovery.