Metadata Extraction from Books with Facts about Austria

  • Authors:
  • Petra Korica-Pehserl; Hermann Maurer

  • Affiliations:
  • Institute for Information Systems and Computer Media, Inffeldgasse 16c, 8010 Graz, Austria (both authors)

  • Venue:
  • Proceedings of International Conference on Information Integration and Web-based Applications & Services
  • Year:
  • 2013


Abstract

Digitized fact books are valuable sources of knowledge. Full-text search is a powerful tool for accessing such knowledge; however, it often returns too many results for general queries. We therefore propose an approach that extracts the metadata relevant to each page and allows users to search for pages on the basis of this metadata as an alternative to full-text search. Given the size of the scanned data (high-quality image scans), this extraction clearly cannot be done manually. Although there are some common aspects, it turns out that different books often need to be treated differently. In particular, we distinguish two kinds of books: lexicons (dictionaries), in which items are arranged alphabetically, and other books that describe various topics in a more narrative style. In this paper we describe in detail the approach we used on different fact books and share the lessons we learned.
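To illustrate why page-level metadata can narrow results compared with full-text search, here is a minimal sketch. It is not the authors' extraction method. The page records, the metadata terms (imagined here as, for example, the headwords a lexicon page is about), and the function names `build_metadata_index`, `full_text_search` and `metadata_search` are all hypothetical. A full-text query matches every page that merely mentions a term, while a metadata query matches only the pages that are tagged with it.

```python
from collections import defaultdict

# Hypothetical page records: page number -> (OCR full text, extracted metadata terms).
# In a lexicon, the metadata terms might be the headwords described on that page.
pages = {
    1: ("Graz is the capital of Styria. Vienna lies on the Danube.", {"graz", "styria"}),
    2: ("Vienna is the capital of Austria.", {"vienna"}),
    3: ("The Danube flows through Vienna and Linz.", {"danube"}),
}


def build_metadata_index(pages):
    """Build an inverted index that maps each lowercased metadata term to its page numbers."""
    index = defaultdict(set)
    for page_no, (_text, terms) in pages.items():
        for term in terms:
            index[term.lower()].add(page_no)
    return index


def full_text_search(pages, query):
    """Return every page whose text contains the query (case-insensitive substring match)."""
    q = query.lower()
    return sorted(p for p, (text, _terms) in pages.items() if q in text.lower())


def metadata_search(index, query):
    """Return only the pages whose metadata contains the query term."""
    return sorted(index.get(query.lower(), set()))


if __name__ == "__main__":
    index = build_metadata_index(pages)
    print(full_text_search(pages, "Vienna"))  # [1, 2, 3]: every page mentioning Vienna
    print(metadata_search(index, "Vienna"))   # [2]: only the page that is about Vienna
```

In this toy data, "Vienna" appears on all three pages, but only page 2 is tagged with it as metadata. The metadata query therefore returns a single focused hit where full-text search returns three.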