Recent advances in the asymptotic resource costs of pattern matching with compressed suffix arrays are attractive, but a key rival structure, the compressed inverted file, has been dismissed or ignored in papers presenting the new structures. In this paper we compare the resource requirements of compressed suffix array algorithms with those of compressed inverted file data structures for general pattern matching in genomic and English texts. In both cases, the inverted file indexes q-grams rather than words, so it supports full pattern matching instead of simple word-based search, making its functionality equivalent to that of the compressed suffix array structures. When the two structures use equivalent memory, inverted files are faster at reporting the locations of patterns when the number of occurrences is high.
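To make the q-gram idea concrete, here is a minimal sketch of how an inverted file over q-grams can answer arbitrary substring queries. It is an illustration under stated assumptions, not the structure evaluated in the paper: the posting lists are plain uncompressed Python lists, the function names `build_qgram_index` and `find_occurrences` are hypothetical, and patterns shorter than q are simply rejected.

```python
from collections import defaultdict


def build_qgram_index(text, q):
    """Map every q-gram of `text` to the sorted list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(text) - q + 1):
        index[text[i:i + q]].append(i)
    return index


def find_occurrences(index, pattern, q):
    """Return all start positions of `pattern`, using only the q-gram postings.

    Assumption of this sketch: len(pattern) >= q. The pattern is covered by
    q-grams taken at offsets 0, q, 2q, ... plus one final q-gram aligned to
    the pattern's end. Each posting is shifted back by its offset to give a
    candidate start position, and the candidate sets are intersected.
    """
    m = len(pattern)
    if m < q:
        raise ValueError("this sketch only handles patterns of length >= q")

    # Offsets chosen so that the q-grams cover every character of the pattern.
    offsets = list(range(0, m - q + 1, q))
    if offsets[-1] != m - q:
        offsets.append(m - q)

    candidates = None
    for off in offsets:
        gram = pattern[off:off + q]
        # .get avoids creating empty entries in the defaultdict.
        starts = {p - off for p in index.get(gram, ())}
        candidates = starts if candidates is None else candidates & starts
        if not candidates:
            return []
    return sorted(candidates)


if __name__ == "__main__":
    text = "GATTACAGATTACA"
    q = 3
    idx = build_qgram_index(text, q)
    print(find_occurrences(idx, "ATTAC", q))
```

Because the chosen q-grams jointly cover the whole pattern, every position that survives the intersection is a true occurrence, so no verification pass against the text is needed in this sketch. A practical compressed inverted file would store the posting lists as compressed gaps rather than Python lists, which is where the memory comparison with compressed suffix arrays comes from.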