Compressed Text Databases with Efficient Query Algorithms Based on the Compressed Suffix Array

Authors:
Kunihiko Sadakane
Venue:
ISAAC '00 Proceedings of the 11th International Conference on Algorithms and Computation
Year:
2000

References (13)

A locally adaptive data compression scheme

Communications of the ACM
Suffix arrays: a new method for on-line string searches

SIAM Journal on Computing
String matching in Lempel-Ziv compressed strings

STOC '95 Proceedings of the twenty-seventh annual ACM symposium on Theory of computing
A Space-Economical Suffix Tree Construction Algorithm

Journal of the ACM (JACM)
Compressed suffix arrays and suffix trees with applications to text indexing and string matching (extended abstract)

STOC '00 Proceedings of the thirty-second annual ACM symposium on Theory of computing
Information Retrieval: Algorithms and Heuristics

Information Retrieval: Algorithms and Heuristics
Tables

Proceedings of the 16th Conference on Foundations of Software Technology and Theoretical Computer Science
A Cooperative Distributed Text Database Management Method Unifying Search and Compression Based on the Burrows-Wheeler Transformation

ER '98 Proceedings of the Workshops on Data Warehousing and Data Mining: Advances in Database Technologies
Text Retrieval by Using k-word Proximity Search

DANTE '99 Proceedings of the 1999 International Symposium on Database Applications in Non-Traditional Environments
A Modified Burrows-Wheeler Transformation for Case-Insensitive Search with Application to Suffix Array Compression

DCC '99 Proceedings of the Conference on Data Compression
A Unifying Framework for Compressed Pattern Matching

SPIRE '99 Proceedings of the String Processing and Information Retrieval Symposium & International Workshop on Groupware
Opportunistic data structures with applications

Opportunistic data structures with applications
Space-efficient static trees and graphs

SFCS '89 Proceedings of the 30th Annual Symposium on Foundations of Computer Science

Cited by (24)

Space-Efficient Data Structures for Flexible Text Retrieval Systems

ISAAC '02 Proceedings of the 13th International Symposium on Algorithms and Computation
Indexing Text Using the Ziv-Lempel Trie

SPIRE 2002 Proceedings of the 9th International Symposium on String Processing and Information Retrieval
A Space and Time Efficient Algorithm for Constructing Compressed Suffix Arrays

COCOON '02 Proceedings of the 8th Annual International Conference on Computing and Combinatorics
Linear-Time Longest-Common-Prefix Computation in Suffix Arrays and Its Applications

CPM '01 Proceedings of the 12th Annual Symposium on Combinatorial Pattern Matching
Space-Economical Algorithms for Finding Maximal Unique Matches

CPM '02 Proceedings of the 13th Annual Symposium on Combinatorial Pattern Matching
The Minimum DAWG for All Suffixes of a String and Its Applications

CPM '02 Proceedings of the 13th Annual Symposium on Combinatorial Pattern Matching
Engineering a Lightweight Suffix Array Construction Algorithm

ESA '02 Proceedings of the 10th Annual European Symposium on Algorithms
HOT SAX: Efficiently Finding the Most Unusual Time Series Subsequence

ICDM '05 Proceedings of the Fifth IEEE International Conference on Data Mining
Succinct dictionary matching with no slowdown

CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching
Compression, indexing, and retrieval for massive string data

CPM'10 Proceedings of the 21st annual conference on Combinatorial pattern matching
Practical approaches to reduce the space requirement of lempel-ziv--based compressed text indices

Journal of Experimental Algorithmics (JEA)
A mining technique using n-grams and motion transcripts for body sensor network data repository

WH '10 Wireless Health 2010
Alphabet-independent compressed text indexing

ESA'11 Proceedings of the 19th European conference on Algorithms
Scalable detection of frequent substrings by grammar-based compression

DS'11 Proceedings of the 14th international conference on Discovery science
Inverted files versus suffix arrays for locating patterns in primary memory

SPIRE'06 Proceedings of the 13th international conference on String Processing and Information Retrieval
Succinct text indexes on large alphabet

TAMC'06 Proceedings of the Third international conference on Theory and Applications of Models of Computation
Space-efficient construction of LZ-index

ISAAC'05 Proceedings of the 16th international conference on Algorithms and Computation
Succinct suffix arrays based on run-length encoding

CPM'05 Proceedings of the 16th annual conference on Combinatorial Pattern Matching
Advantages of backward searching — efficient secondary memory and distributed implementation of compressed suffix arrays

ISAAC'04 Proceedings of the 15th international conference on Algorithms and Computation
Counting suffix arrays and strings

SPIRE'05 Proceedings of the 12th international conference on String Processing and Information Retrieval
Compact Suffix Array — A Space-Efficient Full-Text Index

Fundamenta Informaticae - Computing Patterns in Strings
Compressed data structures with relevance

Proceedings of the 21st ACM international conference on Information and knowledge management
ESP-index: A compressed index based on edit-sensitive parsing

Journal of Discrete Algorithms
A Compressed Suffix Tree Based Implementation With Low Peak Memory Usage

Electronic Notes in Theoretical Computer Science (ENTCS)

Abstract

A compressed text database based on the compressed suffix array is proposed. The compressed suffix array of Grossi and Vitter occupies only $O(n)$ bits for a text of length $n$; however, it also uses the text itself, which occupies $O(n \log |\Sigma|)$ bits for the alphabet $\Sigma$. In contrast, our data structure does not use the text itself, and it supports three operations that are important for text databases: inverse (computing the inverse of the suffix array), search, and decompress. Our algorithms can find the $occ$ occurrences of any substring $P$ of the text in $O(|P| \log n + occ \log^{\epsilon} n)$ time and decompress a part of the text of length $l$ in $O(l + \log^{\epsilon} n)$ time for any given $1 \ge \epsilon > 0$. Our data structure occupies only $n\left(\frac{2}{\epsilon}\left(\frac{3}{2} + H_0 + 2\log H_0\right) + 2 + \frac{4\log^{\epsilon} n}{\log^{\epsilon} n - 1}\right) + o(n) + O(|\Sigma| \log |\Sigma|)$ bits, where $H_0 \le \log |\Sigma|$ is the order-0 entropy of the text. We also show the relationship with the opportunistic data structure of Ferragina and Manzini.
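To make the central claim concrete (inverse, search and decompress without storing the text), here is a minimal, uncompressed Python sketch. It is an illustration under stated assumptions, not the paper's implementation: the function Ψ[r] = SA^{-1}[SA[r] + 1] is kept as a plain list instead of a compressed encoding, suffixes are sorted naively, the inverse suffix array is sampled at a fixed step, and the class name `ToyCSA` and its `step` parameter are hypothetical. The text is read only while building the index; afterwards every query uses only Ψ, a per-character start table and the samples.

```python
from bisect import bisect_right
from collections import Counter


class ToyCSA:
    """Hypothetical, uncompressed toy model of a Psi-based suffix array index."""

    def __init__(self, text: str, step: int = 4):
        # Assumption: "\0" does not occur in text and sorts below every text character.
        t = text + "\0"
        n = len(t)
        sa = sorted(range(n), key=lambda i: t[i:])  # naive suffix sorting, illustration only
        isa = [0] * n  # inverse suffix array SA^{-1}
        for r, i in enumerate(sa):
            isa[i] = r
        # Psi[r] = SA^{-1}[SA[r] + 1]; the terminator suffix wraps around to position 0.
        self.psi = [isa[(sa[r] + 1) % n] for r in range(n)]
        # C table: distinct characters and the smallest rank of a suffix starting with each.
        self.chars = sorted(set(t))
        counts = Counter(t)
        self.starts, acc = [], 0
        for c in self.chars:
            self.starts.append(acc)
            acc += counts[c]
        # Sampled inverse suffix array: SA^{-1}[j] for j = 0, step, 2*step, ...
        self.step = step
        self.isa_samples = isa[::step]
        self.n = n
        # t, sa and isa are discarded here; queries use only psi, the C table and the samples.

    def first_char(self, r: int) -> str:
        """First character of the r-th smallest suffix, read off the C table."""
        return self.chars[bisect_right(self.starts, r) - 1]

    def inverse(self, i: int) -> int:
        """SA^{-1}[i]: start at the nearest sample j <= i and apply Psi (i - j) times."""
        j = (i // self.step) * self.step
        r = self.isa_samples[j // self.step]
        for _ in range(i - j):
            r = self.psi[r]
        return r

    def _extract(self, r: int, m: int) -> str:
        """m characters starting at the suffix of rank r, found by following Psi."""
        out = []
        for _ in range(m):
            out.append(self.first_char(r))
            r = self.psi[r]
        return "".join(out)

    def decompress(self, i: int, length: int) -> str:
        """Recover text[i : i + length]; the caller keeps i + length <= len(text)."""
        return self._extract(self.inverse(i), length)

    def search(self, p: str) -> range:
        """Ranks of all suffixes with prefix p, via two binary searches over ranks."""
        m = len(p)
        lo, hi = 0, self.n
        while lo < hi:  # first rank whose m-character prefix is >= p
            mid = (lo + hi) // 2
            if self._extract(mid, m) < p:
                lo = mid + 1
            else:
                hi = mid
        start, hi = lo, self.n
        while lo < hi:  # first rank whose m-character prefix is > p
            mid = (lo + hi) // 2
            if self._extract(mid, m) <= p:
                lo = mid + 1
            else:
                hi = mid
        return range(start, lo)
```

For example, `ToyCSA("abracadabra").decompress(3, 4)` returns `"acad"`, and `len(ToyCSA("abracadabra").search("abra"))` is 2. Each binary-search probe extracts at most |P| characters by following Ψ, so a search performs O(|P| log n) Ψ steps, in line with the first term of the search bound in the abstract. Locating occurrences (the occ log^ε n term), the compressed representation of Ψ and the space bound are not modeled here.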