An important class of queries is the LIKE predicate in SQL. In the absence of an index, LIKE queries suffer performance degradation. Indexing on substrings (or q-grams) has been explored earlier, but without sufficient consideration of efficiency. In such an index, q-grams are used to prune away rows that cannot qualify for the query. The problem is to identify a finite set of grams, subject to a storage constraint, that gives maximal pruning for a given query workload. Our contributions include: i) a formal problem definition that produces results within a provable error bound, ii) a performance evaluation of the novel method on real data, and iii) parallelization of the algorithm, together with scaling considerations and a proposal for handling scaling issues.
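To make the pruning idea and the selection problem concrete, the following is a minimal sketch under stated assumptions, not the algorithm of the paper. It assumes infix patterns of the form LIKE '%pattern%', measures storage as one unit per indexed gram plus one unit per posting entry, and uses a simple greedy benefit-per-cost heuristic to choose grams. The function names (`qgrams`, `select_grams`, `build_index`, `like_contains`) and the cost model are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def qgrams(s, q):
    """Return the set of all length-q substrings of s."""
    return {s[i:i + q] for i in range(len(s) - q + 1)}


def select_grams(rows, workload, q, budget):
    """Greedily pick grams from the workload patterns within a storage budget.

    Assumed cost model: 1 unit for the gram key + 1 unit per posting entry.
    Benefit: number of (query, row) pairs newly pruned by adding the gram.
    This is an illustrative heuristic, not the method from the source.
    """
    postings = defaultdict(set)
    for rid, text in enumerate(rows):
        for g in qgrams(text, q):
            postings[g].add(rid)

    all_rows = set(range(len(rows)))
    # Rows each workload pattern still has to verify after pruning so far.
    remaining = {p: set(all_rows) for p in workload}
    pool = set().union(*(qgrams(p, q) for p in workload))
    chosen, used = [], 0

    while True:
        best, best_ratio = None, 0.0
        for g in sorted(pool):  # sorted for deterministic tie-breaking
            rows_with_g = postings.get(g, set())
            cost = 1 + len(rows_with_g)
            if used + cost > budget:
                continue
            gain = sum(len(remaining[p] - rows_with_g)
                       for p in remaining if g in p)
            if gain / cost > best_ratio:
                best, best_ratio = g, gain / cost
        if best is None:  # nothing affordable still prunes anything
            return chosen
        rows_with_best = postings.get(best, set())
        chosen.append(best)
        used += 1 + len(rows_with_best)
        pool.discard(best)
        for p in remaining:
            if best in p:
                remaining[p] &= rows_with_best


def build_index(rows, grams, q):
    """Map each selected gram to the set of row ids that contain it."""
    index = {g: set() for g in grams}
    for rid, text in enumerate(rows):
        for g in qgrams(text, q):
            if g in index:
                index[g].add(rid)
    return index


def like_contains(rows, index, q, pattern):
    """Evaluate LIKE '%pattern%': prune with indexed grams, then verify."""
    candidates = set(range(len(rows)))
    for g in qgrams(pattern, q):
        if g in index:
            candidates &= index[g]  # a matching row must contain every gram
    # Pruning never drops a true match, so verification gives exact results.
    return sorted(rid for rid in candidates if pattern in rows[rid])
```

A pattern with no indexed gram (or one shorter than q) falls back to verifying every row, so the index only ever affects how many rows are examined, never which rows are returned.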