Cost-based labeling of groups of mass spectra

Authors:
Lei Chen;Zheng Huang;Raghu Ramakrishnan
Affiliations:
University of Wisconsin, Madison, Madison, WI;University of Wisconsin, Madison, Madison, WI;University of Wisconsin, Madison, Madison, WI
Venue:
SIGMOD '04 Proceedings of the 2004 ACM SIGMOD international conference on Management of data
Year:
2004

Abstract

We make two main contributions in this paper. First, we motivate and introduce a novel class of data mining problems that arise in labeling a group of mass spectra, specifically for the analysis of atmospheric aerosols, but with natural applications to market-basket datasets. This builds upon other recent work in which we introduced the problem of labeling a single spectrum, and is motivated by the advent of a new generation of Aerosol Time-of-Flight Spectrometers, which are capable of generating mass spectra for hundreds of aerosol particles per minute. We also describe two algorithms for group labeling, which differ greatly in how they utilize a linear programming (LP) solver, and also differ greatly from algorithms for labeling a single spectrum.

Our second contribution is to show how to automatically select between these two algorithms in a cost-based manner, analogous to how a query optimizer selects from a space of query plans. While the details are specific to the labeling problem, we believe that this is a promising first step towards a general framework for cost-based data mining, and opens up an important direction for future research.