Document retrieval has been one of the best-established information retrieval activities since the ’60s, and it pervades all search engines. Its aim is to obtain, from a collection of text documents, those most relevant to a pattern query. Current technology is mostly oriented to “natural language” text collections, where inverted indexes are the preferred solution. As successful as this paradigm has been, it fails to properly handle various East Asian languages and other scenarios where the “natural language” assumptions do not hold. In this survey, we cover recent research on extending document retrieval techniques to a broader class of sequence collections, with applications in bioinformatics, data and web mining, chemoinformatics, software engineering, multimedia information retrieval, and many other fields. We focus on the algorithmic aspects of the techniques, uncovering a rich world of relations between document retrieval challenges and fundamental problems on trees, strings, range queries, discrete geometry, and other areas.