Given a sequence $S = s_1 s_2 \ldots s_n$ of integers smaller than $r = O(\mathrm{polylog}(n))$, we show how $S$ can be represented using $nH_0(S) + o(n)$ bits, so that we can retrieve any $s_q$, as well as answer rank and select queries on $S$, in constant time. $H_0(S)$ is the zero-order empirical entropy of $S$, and $nH_0(S)$ provides an information-theoretic lower bound to the bit storage of any sequence $S$ via a fixed encoding of its symbols. This extends previous results on binary sequences, and improves previous results on general sequences where those queries are answered in $O(\log r)$ time. For larger $r$, we can still represent $S$ in $nH_0(S) + o(n \log r)$ bits and answer queries in $O(\log r / \log\log n)$ time.

Another contribution of this article is to show how to combine our compressed representation of integer sequences with a compression boosting technique to design compressed full-text indexes that scale well with the size of the input alphabet $\Sigma$. Specifically, we design a variant of the FM-index that indexes a string $T[1, n]$ within $nH_k(T) + o(n)$ bits of storage, where $H_k(T)$ is the $k$th-order empirical entropy of $T$. This space bound holds simultaneously for all $k \le \alpha \log_{|\Sigma|} n$, constant $0 < \alpha < 1$, and $|\Sigma| = O(\mathrm{polylog}(n))$. This index counts the occurrences of an arbitrary pattern $P[1, p]$ as a substring of $T$ in $O(p)$ time; it locates each pattern occurrence in $O(\log^{1+\epsilon} n)$ time for any constant $0 < \epsilon < 1$; and it reports a text substring of length $\ell$ in $O(\ell + \log^{1+\epsilon} n)$ time. Compared to all previous works, our index is the first that removes the alphabet-size dependence from all query times; in particular, counting time is linear in the pattern length. Still, our index uses essentially the same space as the $k$th-order entropy of the text $T$, which is the best space obtained in previous work. We can also handle larger alphabets of size $|\Sigma| = O(n^\beta)$, for any $0 < \beta < 1$, by paying $o(n \log|\Sigma|)$ extra space and multiplying all query times by $O(\log|\Sigma| / \log\log n)$.
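To make the quantities in the abstract concrete, the following is a minimal Python sketch of what $H_0(S)$, rank, select, and FM-index counting compute. It is an illustration under stated assumptions, not the article's data structure: rank and select here are naive linear scans rather than constant-time operations on a compressed representation, the Burrows-Wheeler transform is built by sorting rotations, and all function names (`zero_order_entropy`, `rank`, `select`, `bwt`, `count_occurrences`) are hypothetical. The text is assumed to end with a sentinel `$` that is smaller than every other symbol.

```python
import math
from collections import Counter


def zero_order_entropy(seq):
    """H0(S) in bits per symbol: sum over symbols c of (n_c/n) * log2(n/n_c)."""
    n = len(seq)
    return sum((cnt / n) * math.log2(n / cnt) for cnt in Counter(seq).values())


def rank(seq, c, i):
    """Number of occurrences of c in seq[0:i] (naive scan, not constant time)."""
    return sum(1 for x in seq[:i] if x == c)


def select(seq, c, j):
    """0-based position of the j-th occurrence of c (j >= 1), or -1 if absent."""
    seen = 0
    for pos, x in enumerate(seq):
        if x == c:
            seen += 1
            if seen == j:
                return pos
    return -1


def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (assumes a unique final sentinel)."""
    n = len(text)
    rotations = sorted(text[i:] + text[:i] for i in range(n))
    return "".join(r[-1] for r in rotations)


def count_occurrences(bwt_str, pattern):
    """Backward search: two rank queries per pattern symbol, processed right to left."""
    # C[c] = number of symbols in the text that are strictly smaller than c.
    counts = Counter(bwt_str)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]

    lo, hi = 0, len(bwt_str)  # half-open range of sorted-suffix rows
    for c in reversed(pattern):
        if c not in C:
            return 0
        lo = C[c] + rank(bwt_str, c, lo)
        hi = C[c] + rank(bwt_str, c, hi)
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    text = "abracadabra$"
    L = bwt(text)
    print(zero_order_entropy(text), L, count_occurrences(L, "abra"))
```

The counting loop performs exactly two rank queries per pattern symbol, so with a rank structure that answers in constant time (as the abstract describes for small alphabets) the loop would take $O(p)$ time overall; with the naive `rank` above it does not.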