Suffix arrays: a new method for on-line string searches
SIAM Journal on Computing
The art of computer programming, volume 3: (2nd ed.) sorting and searching
The art of computer programming, volume 3: (2nd ed.) sorting and searching
Compact pat trees
Fast and flexible word searching on compressed text
ACM Transactions on Information Systems (TOIS)
Flexible pattern matching in strings: practical on-line search algorithms for texts and biological sequences
COCOON '96 Proceedings of the Second Annual International Conference on Computing and Combinatorics
Proceedings of the 16th Conference on Foundations of Software Technology and Theoretical Computer Science
String Matching with Stopper Encoding and Code Splitting
CPM '02 Proceedings of the 13th Annual Symposium on Combinatorial Pattern Matching
Replacing suffix trees with enhanced suffix arrays
Journal of Discrete Algorithms - SPIRE 2002
ACM Computing Surveys (CSUR)
GLIMPSE: a tool to search through entire file systems
WTEC'94 Proceedings of the USENIX Winter 1994 Technical Conference on USENIX Winter 1994 Technical Conference
Compressed Text Indexes with Fast Locate
CPM '07 Proceedings of the 18th annual symposium on Combinatorial Pattern Matching
Compressed text indexes: From theory to practice
Journal of Experimental Algorithmics (JEA)
Inverted files versus suffix arrays for locating patterns in primary memory
SPIRE'06 Proceedings of the 13th international conference on String Processing and Information Retrieval
Practical and optimal string matching
SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval
CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching
Privacy-enhanced string matching with wordwise positional sampling
Proceedings of the 8th International Conference on Ubiquitous Information Management and Communication
Hi-index | 0.01 |
We introduce a novel alphabet sampling technique for speeding up both online and indexed string matching. We choose a subset of the alphabet and extract the corresponding subsequence of the text. Online or indexed searching is then carried out on the extracted subsequence, and candidate matches are verified in the full text. We show that this speeds up online searching, especially for moderate to long patterns, by a factor of up to 5, while using 14% extra space in our experiments. For indexed searching we achieve indexes that are as fast as the classical suffix array, yet occupy less than 50% extra space (instead of the usual 400%). Our experiments show no competitive alternatives exist in a wide space/time range.