Likelihood-Based Data Squashing: A Modeling Approach to Instance Construction

  • Authors:
  • David Madigan; Nandini Raghavan; William Dumouchel; Martha Nason; Christian Posse; Greg Ridgeway

  • Affiliations:
  • Rutgers University. madigan@stat.rutgers.edu; AT&T Labs—Research. raghavan@research.att.com; AT&T Labs—Research. dumouchel@research.att.com; Talaria, Inc. mnason@talariainc.com; Talaria, Inc. posse@talariainc.com; University of Washington. greg@stat.washington.edu

  • Venue:
  • Data Mining and Knowledge Discovery
  • Year:
  • 2002

Abstract

Squashing is a lossy data compression technique that preserves statistical information. Specifically, squashing compresses a massive dataset to a much smaller one so that outputs from statistical analyses carried out on the smaller (squashed) dataset reproduce outputs from the same statistical analyses carried out on the original dataset. Likelihood-based data squashing (LDS) differs from a previously published squashing algorithm insofar as it uses a statistical model to squash the data. The results show that LDS provides excellent squashing performance even when the target statistical analysis departs from the model used to squash the data.
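The abstract describes squashing only in outline. The sketch below is a simplified illustration of the general idea, not the authors' LDS procedure. It assumes a one-predictor logistic regression as the squashing model and evaluates each point's log-likelihood at a small, hypothetical set of parameter values (its "likelihood profile"). Points whose quantized profiles match are grouped, and each group is replaced by one pseudo-point (the group's mean predictor value) weighted by the group size. The function names `loglik`, `squash` and `weighted_loglik`, the parameter grid `thetas`, and the `resolution` bin width are all assumptions made for illustration.

```python
import math
import random
from collections import defaultdict


def loglik(a, b, x, y):
    """Log-likelihood of one (x, y) pair under logistic regression (intercept a, slope b)."""
    eta = a + b * x
    # Numerically stable log(1 + exp(eta)).
    log1pexp = max(eta, 0.0) + math.log1p(math.exp(-abs(eta)))
    return y * eta - log1pexp


def squash(data, thetas, resolution=0.05):
    """Hypothetical squashing sketch: group points by quantized likelihood profile.

    data: list of (x, y) with y in {0, 1}
    thetas: parameter values (a, b) at which each point's profile is evaluated
    resolution: bin width used to quantize each profile coordinate
    Returns a list of weighted pseudo-points (x, y, weight).
    """
    groups = defaultdict(list)
    for x, y in data:
        profile = tuple(round(loglik(a, b, x, y) / resolution) for a, b in thetas)
        groups[(y, profile)].append(x)
    # One pseudo-point per group: mean predictor value, weight = group size.
    return [(sum(xs) / len(xs), y, len(xs)) for (y, _), xs in groups.items()]


def weighted_loglik(points, a, b):
    """Total weighted log-likelihood of (x, y, weight) points at (a, b)."""
    return sum(w * loglik(a, b, x, y) for x, y, w in points)


# Usage: simulate data, squash it, compare log-likelihoods at several parameter values.
random.seed(0)
data = []
for _ in range(10000):
    x = random.gauss(0.0, 1.0)
    p = 1.0 / (1.0 + math.exp(-(0.5 + 1.5 * x)))
    data.append((x, 1 if random.random() < p else 0))

thetas = [(0.0, 1.0), (0.5, 1.5), (1.0, 2.0)]
small = squash(data, thetas)
full = [(x, y, 1) for x, y in data]

print(len(data), "points squashed to", len(small))
for a, b in thetas + [(0.25, 1.25)]:  # last value lies off the grid
    print((a, b), weighted_loglik(full, a, b), weighted_loglik(small, a, b))
```

Comparing the weighted log-likelihood of the squashed points with that of the full data, both at the grid values and at an off-grid value, gives a rough check of how well the squashed set reproduces the full-data analysis. In this sketch a smaller `resolution` yields a larger squashed set with closer agreement.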