Succinct sampling from discrete distributions

Authors:
Karl Bringmann; Kasper Green Larsen
Affiliations:
Max-Planck-Institute for Informatics, Saarbrücken, Germany; Aarhus University, Aarhus, Denmark
Venue:
Proceedings of the forty-fifth annual ACM symposium on Theory of computing
Year:
2013



Abstract

We revisit the classic problem of sampling from a discrete distribution: given n non-negative w-bit integers x_1, ..., x_n, the task is to build a data structure that allows sampling an index i with probability proportional to x_i, i.e., with probability x_i / (x_1 + ... + x_n). The classic solution is Walker's alias method, which, when implemented on a Word RAM, takes O(n) preprocessing time, O(1) expected query time per sample, and n(w + 2 lg n + o(1)) bits of space. In the terminology of succinct data structures, this solution has redundancy 2n lg n + o(n) bits, i.e., it uses 2n lg n + o(n) bits in addition to the information-theoretic minimum required for storing the input. In this paper, we study whether this space usage can be improved. In the systematic case, in which the input is read-only, we present a novel data structure using r + O(w) redundant bits, O(n/r) expected query time and O(n) preprocessing time for any r. For r = n, this improves the redundancy by a factor of Ω(log n) over the alias method, even though the alias method is not systematic. Moreover, we complement our data structure with a lower bound showing that this trade-off is tight for systematic data structures. In the non-systematic case, in which the input numbers may be represented in more clever ways than just storing them one by one, we demonstrate a very surprising separation from the systematic case: with only 1 redundant bit, it is possible to support optimal O(1) expected query time and O(n) preprocessing time! On the one hand, our results improve upon the space requirement of the classic solution for a fundamental sampling problem; on the other hand, they provide the strongest known separation between the systematic and non-systematic cases for any data structure problem. Finally, we also believe our upper bounds are practically efficient and simpler than Walker's alias method.
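For reference, the classic baseline named in the abstract can be sketched as follows. This is a minimal sketch of Walker's alias method in Vose's small/large-list formulation (that formulation is our choice of presentation, not something the abstract specifies), written in exact integer arithmetic so that the w-bit integer weights are handled without floating-point error. It is not the authors' data structure, and the names `build_alias`, `sample_alias` and `exact_masses` are hypothetical. Note that queries read only the tables, never x itself.

```python
import random
from typing import List, Sequence, Tuple

# Hypothetical helper names; a sketch of the classic alias method
# (Vose's formulation), not the data structure from this paper.


def build_alias(x: Sequence[int]) -> Tuple[List[int], List[int], int]:
    """Build alias tables for weights x using O(n) list operations.

    Every bucket has capacity S = sum(x). Bucket i returns i when the
    uniform draw u in [0, S) is below thresh[i], and alias[i] otherwise.
    """
    n = len(x)
    S = sum(x)
    if n == 0 or S == 0:
        raise ValueError("need at least one positive weight")
    scaled = [xi * n for xi in x]      # total mass n*S, i.e. S per bucket
    thresh = [0] * n
    alias = list(range(n))
    small = [i for i in range(n) if scaled[i] < S]
    large = [i for i in range(n) if scaled[i] >= S]
    while small and large:
        s = small.pop()
        l = large.pop()
        thresh[s] = scaled[s]          # s keeps its own (under-full) mass...
        alias[s] = l                   # ...and l donates the rest of bucket s
        scaled[l] -= S - scaled[s]
        if scaled[l] < S:
            small.append(l)
        else:
            large.append(l)
    # With exact integers, every index left over has exactly S mass.
    for i in small + large:
        thresh[i] = S
    return thresh, alias, S


def sample_alias(thresh: List[int], alias: List[int], S: int, rng=random) -> int:
    """Draw one index i with probability x_i / S using two uniform draws."""
    i = rng.randrange(len(thresh))     # uniform bucket
    u = rng.randrange(S)               # uniform position inside the bucket
    return i if u < thresh[i] else alias[i]


def exact_masses(thresh: List[int], alias: List[int], S: int) -> List[int]:
    """Count, for each index j, how many of the n*S equally likely (i, u)
    pairs return j; this should equal n * x_j for every j."""
    mass = [0] * len(thresh)
    for i, t in enumerate(thresh):
        mass[i] += t
        mass[alias[i]] += S - t
    return mass


if __name__ == "__main__":
    x = [3, 0, 5, 2]
    thresh, alias, S = build_alias(x)
    assert exact_masses(thresh, alias, S) == [len(x) * xi for xi in x]
    print(sample_alias(thresh, alias, S))
```

In this sketch each threshold can be as large as S ≤ n(2^w − 1), so it needs roughly w + lg n bits, and each alias needs lg n bits; that is one way to account for the n(w + 2 lg n + o(1)) bits quoted in the abstract. For n = 2^20, the 2n lg n term alone is 2 · 2^20 · 20 = 41,943,040 bits, or 5 MiB.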
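To make the systematic setting concrete, here is a simple systematic sampler based on plain rejection sampling. This is a standard technique, not the construction from the paper, and the class name `RejectionSampler` is hypothetical. It leaves the input untouched and stores only M = max_i x_i, a single extra word (O(w) bits). Each round picks a uniform index i and accepts it with probability x_i / M, so the output is proportional to x_i, and the expected number of rounds is nM / S ≤ n, where S = x_1 + ... + x_n ≥ M. This matches the r = O(1) end of the r + O(w) bits / O(n/r) expected query time trade-off stated in the abstract.

```python
import random
from typing import Sequence


class RejectionSampler:
    """Hypothetical systematic sampler: reads x in place, stores only max(x).

    A standard rejection-sampling sketch, not the paper's data structure.
    """

    def __init__(self, x: Sequence[int]):
        self.x = x                     # read-only reference to the input; not copied
        self.M = max(x, default=0)     # the only redundant word, found in O(n) time
        if self.M == 0:
            raise ValueError("need at least one positive weight")

    def sample(self, rng=random) -> int:
        """Return i with probability x_i / sum(x); expected rounds n*M/S <= n."""
        n = len(self.x)
        while True:
            i = rng.randrange(n)                    # propose a uniform index
            if rng.randrange(self.M) < self.x[i]:   # accept with probability x_i / M
                return i
```

The design choice here is to trade query time for space: the only state beyond the read-only input is M, so a heavily skewed input (one large weight among many zeros) pushes the expected number of rounds toward n.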