Space-Efficient Data Structures for Flexible Text Retrieval Systems

Authors:
Kunihiko Sadakane
Affiliations:
-
Venue:
ISAAC '02 Proceedings of the 13th International Symposium on Algorithms and Computation
Year:
2002

Citing 14
Cited 3

Complete inverted files for efficient text retrieval and analysis

Journal of the ACM (JACM)
Sorting in linear time?

STOC '95 Proceedings of the twenty-seventh annual ACM symposium on Theory of computing
Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract)

STOC '00 Proceedings of the thirty-second annual ACM symposium on Theory of computing
A vector space model for automatic indexing

Communications of the ACM
Space efficient suffix trees

Journal of Algorithms
Succinct representations of lcp information and improvements in the compressed suffix arrays

SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Succinct indexable dictionaries with applications to encoding k-ary trees and multisets

SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Efficient algorithms for document retrieval problems

SODA '02 Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms
Succinct Representation of Balanced Parentheses and Static Trees

SIAM Journal on Computing
Compressed Text Databases with Efficient Query Algorithms Based on the Compressed Suffix Array

ISAAC '00 Proceedings of the 11th International Conference on Algorithms and Computation
The LCA Problem Revisited

LATIN '00 Proceedings of the 4th Latin American Symposium on Theoretical Informatics
Color Set Size Problem with Application to String Matching

CPM '92 Proceedings of the Third Annual Symposium on Combinatorial Pattern Matching
Opportunistic data structures with applications

FOCS '00 Proceedings of the 41st Annual Symposium on Foundations of Computer Science
Linear pattern matching algorithms

SWAT '73 Proceedings of the 14th Annual Symposium on Switching and Automata Theory (swat 1973)

Inverted indexes for phrases and strings

Proceedings of the 34th international ACM SIGIR conference on Research and development in Information Retrieval
Fast and practical algorithms for computing all the runs in a string

CPM'07 Proceedings of the 18th annual conference on Combinatorial Pattern Matching
A new succinct representation of RMQ-information and improvements in the enhanced suffix array

ESCAPE'07 Proceedings of the First international conference on Combinatorics, Algorithms, Probabilistic and Experimental Methodologies

Quantified Score

Hi-index	0.00

Visualization

Abstract

We propose space-efficient data structures for text retrieval systems that have merits of both theoretical data structures like suffix trees and practical ones like inverted files. Traditional text retrieval systems use the inverted files and support ranking queries based on the tf*idf (term frequency times inverse document frequency) scores of documents that contain given keywords, which cannot be solved by using only the suffix trees. A drawback of the systems is that the scores can be computed for only predetermined keywords. We extend the data structure so that the scores can be computed for any pattern efficiently while keeping the size of the data structures moderate. The size is comparable with the text size, which is an improvement from existing methods using O(n log n) bit space for a text collection of length n.