German encyclopedia alignment based on information retrieval techniques

Authors:
Roman Kern; Michael Granitzer
Affiliations:
Know-Center, Graz; Know-Center, Graz and Graz University of Technology
Venue:
ECDL'10 Proceedings of the 14th European conference on Research and advanced technology for digital libraries
Year:
2010

Abstract

Collaboratively created online encyclopedias have become increasingly popular and, particularly in terms of completeness, have begun to surpass their printed counterparts. Two German publishers of traditional encyclopedias have responded to this challenge by deciding to merge their corpora into a single, more complete encyclopedia. The crucial step in this merge is the alignment of articles. We have developed a system that identifies corresponding entries across different encyclopedic corpora. At its core is an alignment algorithm that incorporates various techniques developed in the field of information retrieval. We evaluated the system on four real-world encyclopedias against a ground truth provided by domain experts. A combination of weighting and ranking techniques was found to deliver satisfactory performance.
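
The abstract does not say which weighting and ranking techniques the system combines. As an illustration of the general idea only, the following minimal sketch assumes TF-IDF term weighting and cosine-similarity ranking, two standard information retrieval techniques that may differ from what the authors actually used. All function names and the toy articles are hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters, including German umlauts."""
    return re.findall(r"[a-zäöüß]+", text.lower())


def build_idf(docs):
    """Smoothed inverse document frequency over a list of token lists."""
    n = len(docs)
    df = Counter(term for tokens in docs for term in set(tokens))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector as a dict; terms unknown to the IDF table are dropped."""
    tf = Counter(tokens)
    return {term: freq * idf[term] for term, freq in tf.items() if term in idf}


def cosine(a, b):
    """Cosine similarity of two sparse vectors; 0.0 if either vector is empty."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def align(source, target, top_k=1):
    """For each source article, rank target articles by TF-IDF cosine similarity.

    Both arguments map an article title to its body text. The result maps each
    source title to a list of (score, target_title) pairs, best match first.
    """
    target_titles = list(target)
    target_tokens = [tokenize(title + " " + target[title]) for title in target_titles]
    idf = build_idf(target_tokens)  # assumption: IDF is estimated on the target corpus
    target_vecs = [tfidf_vector(tokens, idf) for tokens in target_tokens]

    result = {}
    for title, body in source.items():
        query = tfidf_vector(tokenize(title + " " + body), idf)
        scored = [(cosine(query, vec), cand) for vec, cand in zip(target_vecs, target_titles)]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        result[title] = scored[:top_k]
    return result


# Hypothetical toy corpora, invented for this example.
encyclopedia_a = {
    "Graz": "Graz ist die Landeshauptstadt der Steiermark in Österreich.",
    "Donau": "Die Donau ist ein Fluss in Mitteleuropa.",
}
encyclopedia_b = {
    "Graz (Stadt)": "Stadt in Österreich und Hauptstadt der Steiermark.",
    "Donau (Fluss)": "Zweitlängster Fluss in Europa.",
    "Wien": "Wien ist die Hauptstadt von Österreich.",
}

if __name__ == "__main__":
    for title, candidates in align(encyclopedia_a, encyclopedia_b, top_k=2).items():
        print(title, candidates)
```

Returning a ranked candidate list rather than a single match leaves room for a later decision step, such as a score threshold or manual review by domain experts. How the evaluated system makes that final decision is not described in the abstract.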