Estimating the size of the telephone universe: a Bayesian Mark-recapture approach

  • Authors:
  • David Poole

  • Affiliations:
  • AT&T Labs, Florham Park, NJ

  • Venue:
  • Proceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2004

Quantified Score

Hi-index 0.00

Visualization

Abstract

Mark-recapture models have for many years been used to estimate the unknown sizes of animal and bird populations. In this article we adapt a finite mixture mark-recapture model in order to estimate the number of active telephone lines in the USA. The idea is to use the calling patterns of lines that are observed on the long distance network to estimate the number of lines that do not appear on the network. We present a Bayesian approach and use Markov chain Monte Carlo methods to obtain inference from the posterior distributions of the model parameters. At the state level, our results are in fairly good agreement with recent published reports on line counts. For lines that are easily classified as business or residence, the estimates have low variance. When the classification is unknown, the variability increases considerably. Results are insensitive to changes in the prior distributions. We discuss the significant computational and data mining challenges caused by the scale of the data, approximately 350 million call-detail records per day observed over a number of weeks.