Text search engines return a set of k documents ranked by similarity to a query. Typically, documents and queries are drawn from natural language text, which can readily be partitioned into words, allowing data structures and ranking algorithms to be optimized. However, in many new search domains (DNA, multimedia, OCR texts, Far East languages) there is often no obvious definition of a word, and traditional indexing approaches are not easily adapted or break down entirely. We present two new algorithms for ranking documents against a query without making any assumptions about the structure of the underlying text. We build on existing theoretical techniques, which we have implemented and compared empirically with the new approaches introduced in this paper. Our best approach is significantly faster than existing methods in RAM, and is even three times faster than a state-of-the-art inverted file implementation for English text when word queries are issued.
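To make the problem statement concrete, the following is a minimal brute-force sketch of top-k retrieval over text with no word boundaries. It is not one of the algorithms presented in the paper. It assumes that similarity is the number of occurrences of the query pattern in each document, that a naively built suffix array is acceptable for small inputs, and that the separator character never occurs in the documents or the pattern. The names `build_index` and `top_k` are hypothetical.

```python
import heapq
from bisect import bisect_left, bisect_right
from collections import Counter

SEP = "\x00"  # assumption: never occurs in any document or pattern


def build_index(docs):
    """Concatenate the documents and build a naive suffix array over the result."""
    text = SEP.join(docs) + SEP
    # doc_of[i] is the id of the document that owns text position i
    # (each separator is attributed to the document it terminates).
    doc_of = []
    for doc_id, doc in enumerate(docs):
        doc_of.extend([doc_id] * (len(doc) + 1))
    # Naive construction by sorting whole suffixes: simple, but only
    # suitable for small inputs.
    sa = sorted(range(len(text)), key=lambda i: text[i:])
    return text, sa, doc_of


def top_k(index, pattern, k):
    """Return up to k (doc_id, frequency) pairs, most frequent first."""
    text, sa, doc_of = index
    m = len(pattern)
    # Truncating sorted suffixes to length m keeps them in sorted order,
    # so all occurrences of the pattern form one contiguous range.
    prefixes = [text[i:i + m] for i in sa]
    lo = bisect_left(prefixes, pattern)
    hi = bisect_right(prefixes, pattern)
    # Count how many suffixes in the range start inside each document.
    freq = Counter(doc_of[sa[j]] for j in range(lo, hi))
    # Ties are broken in favour of the smaller document id.
    return heapq.nlargest(k, freq.items(), key=lambda kv: (kv[1], -kv[0]))


if __name__ == "__main__":
    index = build_index(["abracadabra", "banana", "cabana"])
    print(top_k(index, "ana", 2))
```

Because the pattern is matched as an arbitrary substring, the same code applies unchanged to DNA or unsegmented text. The sketch counts every occurrence in the matching range at query time, so its cost grows with the number of occurrences of the pattern.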