Design and Evaluation of an IR-Benchmark for SPARQL Queries with Fulltext Conditions

  • Authors:
  • Arunav Mishra; Sairam Gurajada; Martin Theobald

  • Affiliations:
  • Max Planck Institute for Informatics, Saarbruecken, Germany; Max Planck Institute for Informatics, Saarbruecken, Germany; Max Planck Institute for Informatics, Saarbruecken, Germany

  • Venue:
  • Proceedings of the Fifth Workshop on Exploiting Semantic Annotations in Information Retrieval
  • Year:
  • 2012


Abstract

In this paper, we describe our goals in introducing a new, annotated benchmark collection, with which we aim to bridge the gap between the fundamentally different aspects involved in querying both structured and unstructured data. This semantically rich collection, captured in a unified XML format, combines components (unstructured text, semistructured infoboxes, and category structure) from 3.1 million Wikipedia articles with highly structured RDF properties from both DBpedia and YAGO2. The new collection serves as the basis of the INEX 2012 Ad-hoc, Faceted Search, and Jeopardy retrieval tasks. With a focus on the new Jeopardy task, we particularly motivate the use of the collection for question-answering (QA) style retrieval settings, which we also exemplify by introducing a set of 90 QA-style benchmark queries, shipped in a SPARQL-based query format that has been extended by fulltext filter conditions.
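To illustrate the kind of query format the abstract describes, the following is a hypothetical sketch of a SPARQL query combining a structured RDF pattern with a fulltext filter condition. The `ftcontains` filter function and the YAGO2 class name are assumptions for illustration; the exact syntax and vocabulary of the 90 benchmark queries may differ.

```sparql
# Hypothetical sketch: a Jeopardy-style QA query over the combined
# Wikipedia/DBpedia/YAGO2 collection. The structured triple pattern
# restricts candidates by RDF type, while the (assumed) ftcontains()
# filter matches keywords against the entity's unstructured article text.
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX yago: <http://yago-knowledge.org/resource/>

SELECT ?entity WHERE {
  ?entity rdf:type yago:wordnet_scientist_110560637 .
  FILTER ftcontains(?entity, "nobel prize physics relativity")
}
```

Such a query would retrieve entities that satisfy both the structured type constraint and the keyword condition, which is the hybrid retrieval setting the benchmark is designed to evaluate.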