Optimization for dynamic inverted index maintenance
SIGIR '90 Proceedings of the 13th annual international ACM SIGIR conference on Research and development in information retrieval
A Fast Regular Expression Indexing Engine
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
RDF-3X: a RISC-style engine for RDF
Proceedings of the VLDB Endowment
Processing SPARQL queries with regular expressions in RDF databases
DTMBIO '10 Proceedings of the ACM fourth international workshop on Data and text mining in biomedical informatics
x-RDF-3X: fast querying, high update rates, and consistency for RDF databases
Proceedings of the VLDB Endowment
DTMBIO 2011: international workshop on data and textmining in biomedical informatics
Proceedings of the 20th ACM international conference on Information and knowledge management
The Resource Description Framework (RDF) is widely used for sharing biomedical resources, such as the online protein database UniProt and the gene database GeneOntology. SPARQL is the native query language for RDF databases, and it supports regular expressions for queries in which the exact values are either irrelevant or unknown. A recent paper by Lee et al. presented efficient indexing support for such queries by adopting multigram indexes for regular expressions. In this paper, we extend their work by addressing index updates. We identify a major performance problem in straightforward implementations and design a new algorithm that exploits unique properties of multigram indexes. Our contributions can be summarized as follows: 1) we propose an efficient update algorithm for regular expression indexes in RDF databases; 2) we build a prototype system for the proposed framework in C++; 3) we conduct extensive experiments to demonstrate the properties of our algorithm. The experiments show that our algorithm outperforms the straightforward implementations by an order of magnitude.
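
To make the setting concrete, the following is a minimal sketch of a gram-based index over string literals with straightforward insert and delete operations. It is not the algorithm proposed in the paper, and it simplifies the multigram idea to fixed-length grams; the class name `NaiveGramIndex`, its methods, and the `required` argument (a literal substring that the caller asserts every match must contain) are all hypothetical and chosen only for illustration.

```python
import re
from collections import defaultdict


class NaiveGramIndex:
    """Toy fixed-length gram index over string literals (illustrative only)."""

    def __init__(self, n=3):
        self.n = n
        self.postings = defaultdict(set)  # gram -> ids of literals containing it
        self.literals = {}                # id -> literal text

    def _grams(self, text):
        # All distinct substrings of length n (empty if text is shorter than n).
        return {text[i:i + self.n] for i in range(len(text) - self.n + 1)}

    def insert(self, lit_id, text):
        # Straightforward update: touch one posting list per gram of the literal.
        self.literals[lit_id] = text
        for g in self._grams(text):
            self.postings[g].add(lit_id)

    def delete(self, lit_id):
        # Straightforward update: revisit every posting list the literal is in.
        text = self.literals.pop(lit_id)
        for g in self._grams(text):
            self.postings[g].discard(lit_id)
            if not self.postings[g]:
                del self.postings[g]

    def search(self, pattern, required):
        # Assumption: every match of `pattern` contains the substring `required`.
        grams = self._grams(required)
        if grams:
            candidates = set.intersection(
                *(self.postings.get(g, set()) for g in grams)
            )
        else:
            candidates = set(self.literals)  # too short to filter; scan all
        rx = re.compile(pattern)
        # The index only prunes candidates; the regex still verifies each one.
        return sorted(i for i in candidates if rx.search(self.literals[i]))
```

In this sketch, every insert or delete touches one posting list per gram of the affected literal, which gives a rough sense of why update cost matters for such indexes; the sketch makes no claim about which specific cost the paper identifies or how its algorithm reduces it.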