Compressed string dictionaries

Authors:
Nieves R. Brisaboa;Rodrigo Cánovas;Francisco Claude;Miguel A. Martínez-Prieto;Gonzalo Navarro
Affiliations:
Database Lab, Universidade da Coruña, Spain;Department of Computer Science, University of Chile, Chile;School of Computer Science, University of Waterloo, Canada;Department of Computer Science, University of Chile, Chile and Department of Computer Science, Universidad de Valladolid, Spain;Department of Computer Science, University of Chile, Chile
Venue:
SEA'11 Proceedings of the 10th international conference on Experimental algorithms
Year:
2011

Abstract

The problem of storing a set of strings (a string dictionary) in compact form arises naturally in many settings. Classically, the dictionary has represented a small part of the whole data to be processed (e.g., in natural language processing or when indexing text collections). Recent applications in Web engines, RDF graphs, bioinformatics, and many other areas, however, handle very large string dictionaries whose size is a significant fraction of the whole data, so efficient approaches to compressing them are necessary. In this paper we empirically compare the time and space performance of some existing alternatives, as well as new ones we propose. We show that a dictionary can be compressed to as little as 20% of the original size of the strings while supporting dictionary searches within a few microseconds, and to as little as 10% with searches taking a few tens or hundreds of microseconds.