On supporting efficient updates of regular expression indexes in RDF databases

  • Authors:
  • Jinsoo Lee;Romans Kasperovics;Wook-Shin Han;Hune Cho

  • Affiliations:
  • Kyungpook National University, Daegu, South Korea;Kyungpook National University, Daegu, South Korea;Kyungpook National University, Daegu, South Korea;Kyungpook National University, Daegu, South Korea

  • Venue:
  • Proceedings of the ACM fifth international workshop on Data and text mining in biomedical informatics
  • Year:
  • 2011

Abstract

The Resource Description Framework (RDF) is widely used for sharing biomedical resources, such as the online protein database UniProt and the gene database GeneOntology. SPARQL is the native query language for RDF databases, and it supports regular expressions for queries in which the exact values are either irrelevant or unknown. A recent paper by Lee et al. presented efficient indexing support for such queries by adopting multigram indexes for regular expressions. In this paper we extend their work by addressing index updates. We identify a major performance problem in straightforward implementations and design a new algorithm that exploits unique properties of multigram indexes. Our contributions can be summarized as follows: 1) we propose an efficient update algorithm for regular expression indexes in RDF databases; 2) we build a prototype system for the proposed framework in C++; 3) we conduct extensive experiments to demonstrate the properties of our algorithm. The experiments show that our algorithm outperforms the straightforward implementations by an order of magnitude.
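
To make the setting concrete, the following is a minimal sketch of a gram-based inverted index over string literals with straightforward insert and delete operations. It is not the algorithm proposed in the paper, nor the index of Lee et al.: it assumes fixed-length grams (k = 3) instead of true multigrams, and the names `GramIndex`, `grams`, `insert`, `delete`, and `search`, as well as the sample literals, are hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict


def grams(text, k=3):
    """Return the set of all length-k substrings of text (empty if text is shorter than k)."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


class GramIndex:
    """Illustrative fixed-length gram index (an assumed simplification of a multigram index)."""

    def __init__(self, k=3):
        self.k = k
        self.postings = defaultdict(set)  # gram -> ids of literals containing it
        self.values = {}                  # id -> literal string

    def insert(self, lit_id, value):
        # Naive update: touch the posting list of every gram in the new literal.
        self.values[lit_id] = value
        for g in grams(value, self.k):
            self.postings[g].add(lit_id)

    def delete(self, lit_id):
        # Naive update: revisit every gram of the removed literal.
        value = self.values.pop(lit_id)
        for g in grams(value, self.k):
            ids = self.postings[g]
            ids.discard(lit_id)
            if not ids:
                del self.postings[g]

    def search(self, pattern, required):
        """Filter by the grams of `required`, a substring every match must contain,
        then verify the surviving candidates with the full regular expression."""
        candidates = None
        for g in grams(required, self.k):
            ids = self.postings.get(g, set())
            candidates = ids if candidates is None else candidates & ids
        if candidates is None:  # `required` is shorter than k: fall back to a full scan
            candidates = set(self.values)
        rx = re.compile(pattern)
        return sorted(i for i in candidates if rx.search(self.values[i]))


idx = GramIndex()
idx.insert(1, "Protein kinase C")
idx.insert(2, "Tyrosine-protein kinase")
idx.insert(3, "Hemoglobin subunit alpha")
print(idx.search(r"kinase( C)?$", "kinase"))
idx.delete(1)
print(idx.search(r"kinase( C)?$", "kinase"))
```

In this sketch every insert or delete touches one posting list per gram of the affected literal, which hints at why straightforward update strategies can become costly on large literal sets.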