Abstract. Due to their expressive power, regular expressions (REs) are quickly becoming an integral part of language specifications for several important application scenarios. Many of these applications must manage huge databases of RE specifications and need an effective matching mechanism that, given an input string, quickly identifies the REs in the database that match it. In this paper, we propose the RE-tree, a novel index structure for large databases of RE specifications. Given an input query string, the RE-tree speeds up the retrieval of matching REs by focusing the search so that the input string is compared with only a small fraction of the REs in the database. Although the RE-tree is similar in spirit to other tree-based structures proposed for indexing multidimensional data, RE indexing is significantly more challenging, since REs typically represent infinite sets of strings with no well-defined notion of spatial locality. To address these new challenges, our RE-tree index structure relies on novel measures for comparing the relative sizes of infinite regular languages. We also propose innovative solutions for the various RE-tree operations, including the effective splitting of RE-tree nodes and the computation of a "tight" bounding RE for a collection of REs. Finally, we demonstrate how sampling-based approximation algorithms can be used to significantly speed up RE-tree operations. Preliminary experimental results with moderately large synthetic data sets indicate that the RE-tree is effective in pruning the search space and easily outperforms naive sequential search approaches.
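The two ideas the abstract leans on, pruning with a bounding RE and comparing the "sizes" of infinite languages, can be illustrated with a small Python sketch. It is not the RE-tree algorithm itself. The flat one-level `Node` layout, the hand-picked bounding REs, the hand-written DFAs, and the size measure (the number of strings of length at most `n` a DFA accepts) are all assumptions made for illustration. Matching relies on Python's standard `re.fullmatch`.

```python
import re
from dataclasses import dataclass, field

# --- Size of an infinite regular language (illustrative measure) ---------
# Assumed measure: count the strings of length 0..n accepted by a DFA.
# A smaller count suggests a "tighter" bounding RE.
def count_accepted(delta, start, accepting, alphabet, n):
    """delta maps (state, symbol) -> state; a missing entry means reject."""
    ways = {start: 1}  # strings of the current length that reach each state
    total = 1 if start in accepting else 0
    for _ in range(n):
        nxt = {}
        for state, cnt in ways.items():
            for sym in alphabet:
                target = delta.get((state, sym))
                if target is not None:
                    nxt[target] = nxt.get(target, 0) + cnt
        ways = nxt
        total += sum(c for s, c in ways.items() if s in accepting)
    return total

# Hand-written DFAs over {a, b, c}: ab*c and ab*c+ (state 2 accepts).
AB_STAR_C = {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2}
AB_STAR_C_PLUS = {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2, (2, "c"): 2}

# --- Pruning with bounding REs (hypothetical one-level "tree") -----------
@dataclass
class Node:
    bound: str  # assumed to accept every string that any child RE accepts
    children: list = field(default_factory=list)  # leaf RE patterns

def search(nodes, s):
    """Return (matching child REs, number of RE match tests performed)."""
    matches, tests = [], 0
    for node in nodes:
        tests += 1
        if re.fullmatch(node.bound, s) is None:
            continue  # s lies outside the bound, so no child can match
        for pattern in node.children:
            tests += 1
            if re.fullmatch(pattern, s):
                matches.append(pattern)
    return matches, tests

# Each child language is contained in its node's bounding language.
nodes = [
    Node("ab*c+", ["ab*c", "abc+", "ac+"]),
    Node("x[0-9]+", ["x1[0-9]*", "x[0-9]{3}", "x9+"]),
]
```

For example, `search(nodes, "x99")` rejects the first node with a single test and returns `(["x9+"], 5)`, and `search(nodes, "hello")` needs only the two bound tests. A sequential scan always performs six tests. On the size side, `count_accepted(AB_STAR_C_PLUS, 0, {2}, "abc", 4)` gives 6, compared with 3 for `AB_STAR_C`. Counting up to a fixed length is one simple way to rank candidate bounds, but it is an assumption here and not necessarily the paper's measure.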