Cost-based labeling of groups of mass spectra

  • Authors:
  • Lei Chen;Zheng Huang;Raghu Ramakrishnan

  • Affiliations:
  • University of Wisconsin, Madison, Madison, WI;University of Wisconsin, Madison, Madison, WI;University of Wisconsin, Madison, Madison, WI

  • Venue:
  • SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
  • Year:
  • 2004

Abstract

We make two main contributions in this paper. First, we motivate and introduce a novel class of data mining problems that arise in labeling a group of mass spectra, specifically for analysis of atmospheric aerosols, but with natural applications to market-basket datasets. This builds upon other recent work in which we introduced the problem of labeling a single spectrum, and is motivated by the advent of a new generation of Aerosol Time-of-Flight Spectrometers, which are capable of generating mass spectra for hundreds of aerosol particles per minute. We also describe two algorithms for group labeling, which differ greatly in how they utilize a linear programming (LP) solver, and also differ greatly from algorithms for labeling a single spectrum.

Our second contribution is to show how to automatically select between these two algorithms in a cost-based manner, analogous to how a query optimizer selects from a space of query plans. While the details are specific to the labeling problem, we believe that this is a promising first step towards a general framework for cost-based data mining, and opens up an important direction for future research.