Inferring decision trees using the minimum description length principle
Information and Computation
The R*-tree: an efficient and robust access method for points and rectangles
SIGMOD '90 Proceedings of the 1990 ACM SIGMOD international conference on Management of data
Fast text searching: allowing errors
Communications of the ACM
Evaluation of signature files as set access facilities in OODBs
SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data
Fast text searching for regular expressions or automaton searching on tries
Journal of the ACM (JACM)
Storing semistructured data with STORED
SIGMOD '99 Proceedings of the 1999 ACM SIGMOD international conference on Management of data
Counting and random generation of strings in regular languages
Proceedings of the sixth annual ACM-SIAM symposium on Discrete algorithms
XTRACT: a system for extracting document type descriptors from XML documents
SIGMOD '00 Proceedings of the 2000 ACM SIGMOD international conference on Management of data
BGP4: Inter-Domain Routing in the Internet
BGP4: Inter-Domain Routing in the Internet
Introduction To Automata Theory, Languages, And Computation
Introduction To Automata Theory, Languages, And Computation
Computers and Intractability: A Guide to the Theory of NP-Completeness
Computers and Intractability: A Guide to the Theory of NP-Completeness
R-trees: a dynamic index structure for spatial searching
SIGMOD '84 Proceedings of the 1984 ACM SIGMOD international conference on Management of data
Efficient Filtering of XML Documents for Selective Dissemination of Information
VLDB '00 Proceedings of the 26th International Conference on Very Large Data Bases
MDL learning of unions of simple pattern languages from positive examples
EuroCOLT '95 Proceedings of the Second European Conference on Computational Learning Theory
YFilter: Efficient and Scalable Filtering of XML Documents
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
Efficient Filtering of XML Documents with XPath Expressions
ICDE '02 Proceedings of the 18th International Conference on Data Engineering
RE-tree: an efficient index structure for regular expressions
The VLDB Journal — The International Journal on Very Large Data Bases
Summary-based routing for content-based event distribution networks
ACM SIGCOMM Computer Communication Review
Clustering and indexing of experience sequences for popularity-driven recommendations
Proceedings of the 3rd ACM workshop on Continuous archival and retrival of personal experences
No Code Required: Giving Users Tools to Transform the Web
No Code Required: Giving Users Tools to Transform the Web
SPiDeR: P2P-based web service discovery
ICSOC'05 Proceedings of the Third international conference on Service-Oriented Computing
Extending XML with nonmonotonic multiple inheritance
DASFAA'05 Proceedings of the 10th international conference on Database Systems for Advanced Applications
Hi-index | 0.00 |
Due to their expressive power, Regular Expressions (REs) are quickly becoming an integral part of language specifications for several important application scenarios. Many of these applications have to manage huge databases of RE specifications and need to provide an effective matching mechanism that, given an input string, quickly identifies the REs in the database that match it. In this paper, we propose the RE-tree, a novel index structure for large databases of RE specifications. Given an input query string, the RE-tree speeds up the retrieval of matching REs by focusing the search and comparing the input string with only a small fraction of REs in the database. Even though the RE-tree is similar in spirit to other tree-based structures that have been proposed for indexing multi-dimensional data, RE indexing is significantly more challenging since REs typically represent infinite sets of strings with no well-defined notion of spatial locality. To address these new challenges, our RE-tree index structure relies on novel measures for comparing the relative sizes of infinite regular languages. We also propose innovative solutions for the various RE-tree operations, including the effective splitting of RE-tree nodes and computing a "tight" bounding RE for a collection of REs. Finally, we demonstrate how sampling-based approximation algorithms can be used to significantly speed up the performance of RE-tree operations. Our experimental results with synthetic data sets indicate that the REtree is very effective in pruning the search space and easily outperforms naive sequential search approaches.