The Resource Description Framework (RDF) data model is widely used for modeling and sharing online bioinformatics resources such as Uniprot (dev.isb-sib.ch/projects/uniprot-rdf) and Bio2RDF (bio2rdf.org). As a result, SPARQL, the W3C-recommended query language for RDF, has become an important language for querying bioinformatics knowledge bases. Users' requests for information from RDF data are diverse, and users often do not know the exact value of each fact in an RDF database, so it is desirable to support SPARQL queries with regular expression patterns. To the best of our knowledge, no existing work efficiently supports regular expression processing in SPARQL over RDF databases. Most existing techniques for processing regular expressions are designed for querying a text corpus, or support only matching over paths in an RDF graph. In this paper, we propose a novel framework for supporting regular expression processing in SPARQL queries. Our contributions can be summarized as follows. 1) We propose an efficient framework for processing SPARQL queries with regular expression patterns in RDF databases. 2) We propose a cost model so that the proposed framework can be integrated into existing query optimizers. 3) We build a prototype of the proposed framework in C++ and conduct extensive experiments demonstrating the efficiency and effectiveness of our technique. Experiments with a full-blown RDF engine show that our framework outperforms existing approaches by up to two orders of magnitude in processing SPARQL queries with regular expression patterns.