The problem of storing a set of strings (a string dictionary) in compact form arises naturally in many settings. Classically, the dictionary has represented a small part of the whole data to be processed (e.g., in natural language processing or when indexing text collections). Recent applications in Web engines, RDF graphs, bioinformatics, and many other areas, however, handle very large string dictionaries whose size is a significant fraction of the whole data. Efficient approaches to compress them are therefore necessary. In this paper we empirically compare the time and space performance of some existing alternatives, as well as new ones we propose. We show that space reductions of up to 20% of the original size of the strings are possible while supporting dictionary searches within a few microseconds, and up to 10% within a few tens or hundreds of microseconds.