Burst tries: a fast, efficient data structure for string keys

Authors:
Steffen Heinz;Justin Zobel;Hugh E. Williams
Affiliations:
RMIT University, Melbourne, Victoria, Australia;RMIT University, Melbourne, Victoria, Australia;RMIT University, Melbourne, Victoria, Australia
Venue:
ACM Transactions on Information Systems (TOIS)
Year:
2002



Abstract

Many applications depend on efficient management of large sets of distinct strings in memory. For example, during index construction for text databases, a record is held for each distinct word in the text, containing the word itself and information such as counters. We propose a new data structure, the burst trie, that has significant advantages over existing options for such applications: it uses about the same memory as a binary search tree; it is as fast as a trie; and, while not as fast as a hash table, a burst trie maintains the strings in sorted or near-sorted order. In this paper we describe burst tries and explore the parameters that govern their performance. We experimentally determine good choices of parameters, and compare burst tries to other structures used for the same task, with a variety of data sets. These experiments show that the burst trie is particularly effective for the skewed frequency distributions common in text collections, and dramatically outperforms all other data structures for the task of managing strings while maintaining sort order.