Implementing a linguistic query language for historic texts

  • Authors:
  • Lukas C. Faulstich; Ulf Leser; Thorsten Vitt

  • Affiliations:
  • Institut für Informatik, Humboldt-Universität zu Berlin, Berlin (all authors)

  • Venue:
  • EDBT'06: Proceedings of the 2006 International Conference on Current Trends in Database Technology
  • Year:
  • 2006

Abstract

We describe the design and implementation of the linguistic query language DDDquery. The language is designed for querying a large linguistic database that stores a corpus of richly annotated historic German texts. We use a graph-based data model that supports multiple independent annotation layers over a shared text layer, as well as alignments between text layers that represent the same text or related texts (e.g., translations). The corpus is stored in an object-relational database system with a text retrieval extension. DDDquery is based on XPath to leverage many users' familiarity with that language. Queries are translated to SQL in a two-phase compilation, with first-order logic as the intermediate language. This approach effectively decouples the query language from the schema of the underlying corpus. We provide an overview of DDDquery, the underlying ODAG data model, its implementation as a relational schema, and the predicates of the intermediate language, and we describe both phases of the translation process.
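The abstract names the two compilation phases but does not give DDDquery's syntax, the intermediate predicates, or the relational schema. The Python sketch below only illustrates the general shape of such a pipeline under loudly stated assumptions: a toy path language with absolute child steps only (e.g. `/s/w`), hypothetical predicates `element` and `child`, and a hypothetical two-table schema. None of these names are taken from DDDquery or the ODAG schema.

```python
import sqlite3
from dataclasses import dataclass

# HYPOTHETICAL intermediate-language atom: a predicate applied to variables
# (strings starting with '?') or constants (any other string).
@dataclass(frozen=True)
class Atom:
    pred: str
    args: tuple

# HYPOTHETICAL relational schema: predicate name -> column names, in argument order.
# Only phase 2 consults this mapping.
SCHEMA = {"element": ("id", "name"), "child": ("parent_id", "child_id")}

def path_to_logic(path: str) -> tuple[list[Atom], str]:
    """Phase 1 (sketch): turn a child-step path such as '/s/w' into a
    conjunction of atoms; returns the atoms and the answer variable."""
    if "//" in path:
        raise ValueError("descendant axis is not supported in this sketch")
    steps = [s for s in path.split("/") if s]
    if not steps:
        raise ValueError("empty path")
    atoms: list[Atom] = []
    prev = None
    for i, name in enumerate(steps):
        var = f"?x{i}"
        atoms.append(Atom("element", (var, name)))
        if prev is not None:
            atoms.append(Atom("child", (prev, var)))  # prev is the parent of var
        prev = var
    return atoms, prev

def logic_to_sql(atoms: list[Atom], answer: str) -> tuple[str, list[str]]:
    """Phase 2 (sketch): one table alias per atom; shared variables become
    join conditions, constants become parameterized equality tests."""
    from_items, where, params = [], [], []
    bound: dict[str, str] = {}  # variable -> first column that binds it
    for i, atom in enumerate(atoms):
        alias = f"t{i}"
        from_items.append(f"{atom.pred} AS {alias}")
        for col, arg in zip(SCHEMA[atom.pred], atom.args):
            ref = f"{alias}.{col}"
            if arg.startswith("?"):
                if arg in bound:
                    where.append(f"{bound[arg]} = {ref}")
                else:
                    bound[arg] = ref
            else:
                where.append(f"{ref} = ?")
                params.append(arg)
    sql = f"SELECT DISTINCT {bound[answer]} FROM {', '.join(from_items)}"
    if where:
        sql += " WHERE " + " AND ".join(where)
    return sql, params

def run(conn: sqlite3.Connection, path: str) -> list:
    """Compile a path in both phases and evaluate it against the database."""
    atoms, answer = path_to_logic(path)
    sql, params = logic_to_sql(atoms, answer)
    return [row[0] for row in conn.execute(sql, params)]
```

In this sketch, the query-language front end (`path_to_logic`) never sees table or column names; only `logic_to_sql` reads the hypothetical `SCHEMA` mapping. This mirrors, in miniature, the decoupling claimed above: changing the storage schema would touch phase 2 only.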