We propose space-efficient data structures for text retrieval systems that combine the merits of theoretical data structures such as suffix trees with those of practical ones such as inverted files. Traditional text retrieval systems use inverted files and support ranking queries based on the tf*idf (term frequency times inverse document frequency) scores of the documents that contain the given keywords; such queries cannot be answered with suffix trees alone. A drawback of these systems is that the scores can be computed only for predetermined keywords. We extend the data structure so that the scores can be computed efficiently for any pattern while keeping the size of the data structures moderate. The size is comparable to that of the text, which improves on existing methods that use O(n log n) bits of space for a text collection of length n.
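To make the query concrete, the following is a minimal sketch of tf*idf scoring for an arbitrary pattern rather than a predetermined keyword. It is not the data structure proposed here: it scans every document naively and only illustrates what the query returns. The function names `count_occurrences` and `tfidf_scores` are hypothetical, and the formula idf = log(N / df) is one common variant assumed for illustration.

```python
import math


def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, overlapping ones included (tf)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    count = 0
    pos = text.find(pattern)
    while pos != -1:
        count += 1
        # Advance by one character so overlapping matches are counted.
        pos = text.find(pattern, pos + 1)
    return count


def tfidf_scores(docs, pattern):
    """Return {document index: tf * idf} for documents containing pattern.

    Assumes idf = log(N / df), where N is the number of documents and
    df is the number of documents that contain the pattern.
    """
    tfs = [count_occurrences(doc, pattern) for doc in docs]
    df = sum(1 for tf in tfs if tf > 0)
    if df == 0:
        return {}
    idf = math.log(len(docs) / df)
    return {i: tf * idf for i, tf in enumerate(tfs) if tf > 0}


if __name__ == "__main__":
    docs = ["abracadabra", "cadabra", "xyz"]
    # Rank documents by descending score for the pattern "abra".
    scores = tfidf_scores(docs, "abra")
    for doc_id, score in sorted(scores.items(), key=lambda kv: -kv[1]):
        print(doc_id, round(score, 4))
```

This naive version takes time proportional to the total text length for each query, which is the cost that an index supporting arbitrary patterns is meant to avoid.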