German encyclopedia alignment based on information retrieval techniques

  • Authors: Roman Kern; Michael Granitzer
  • Affiliations: Know-Center, Graz; Know-Center, Graz and Graz University of Technology
  • Venue: ECDL'10: Proceedings of the 14th European Conference on Research and Advanced Technology for Digital Libraries
  • Year: 2010

Abstract

Collaboratively created online encyclopedias have become increasingly popular. In completeness especially, they have begun to surpass their printed counterparts. Two German publishers of traditional encyclopedias have responded to this challenge by deciding to merge their corpora into a single, more complete encyclopedia. The crucial step in this merge process is the alignment of articles. We have developed a system that identifies corresponding entries in different encyclopedic corpora. At the core of our system is an alignment algorithm that incorporates various techniques developed in the field of information retrieval. We evaluated the system on four real-world encyclopedias against a ground truth provided by domain experts. A combination of weighting and ranking techniques was found to deliver satisfactory performance.
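
The abstract does not spell out the alignment algorithm, so the following is only an illustrative sketch of how a combination of weighting and ranking could align articles. It assumes TF-IDF term weighting and cosine-similarity ranking, which are standard information retrieval techniques and not necessarily the ones the authors used. The function names (tokenize, tfidf, cosine, align) and the sample articles are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters, including German umlauts and eszett.
    return re.findall(r"[a-zäöüß]+", text.lower())


def tfidf(tokens, df, n_docs):
    # Weighting step (assumed): term frequency times a smoothed inverse
    # document frequency, so rare terms count more than common ones.
    tf = Counter(tokens)
    return {t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0) for t, c in tf.items()}


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm = math.sqrt(sum(w * w for w in a.values())) * math.sqrt(sum(w * w for w in b.values()))
    return dot / norm if norm else 0.0


def align(source, target):
    # Hypothetical helper: map each source title to (best target title, score).
    # Titles are added to the article text so that headwords contribute too.
    docs = {("s", k): tokenize(k + " " + v) for k, v in source.items()}
    docs.update({("t", k): tokenize(k + " " + v) for k, v in target.items()})
    # Document frequencies are computed over both corpora together.
    df = Counter(term for toks in docs.values() for term in set(toks))
    vecs = {key: tfidf(toks, df, len(docs)) for key, toks in docs.items()}
    result = {}
    for title in source:
        # Ranking step (assumed): order all target articles by similarity.
        ranked = sorted(
            ((cosine(vecs[("s", title)], vecs[("t", cand)]), cand) for cand in target),
            reverse=True,
        )
        result[title] = (ranked[0][1], ranked[0][0]) if ranked else (None, 0.0)
    return result


# Made-up sample entries, for illustration only.
encyclopedia_a = {
    "Graz": "Graz ist die Hauptstadt der Steiermark in Österreich",
    "Donau": "Die Donau ist ein Fluss in Europa",
}
encyclopedia_b = {
    "Graz (Stadt)": "Stadt in Österreich, Hauptstadt des Bundeslandes Steiermark",
    "Donau (Fluss)": "Zweitlängster Fluss in Europa, fließt durch Wien",
}

if __name__ == "__main__":
    for title, (match, score) in align(encyclopedia_a, encyclopedia_b).items():
        print(f"{title} -> {match} ({score:.2f})")
```

In a setting like the one described, the top-ranked candidate for each article would presumably be checked against the expert-provided ground truth, and a score threshold could be used to leave articles without a counterpart unaligned.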