We make two main contributions in this paper. First, we motivate and introduce a novel class of data mining problems that arise in labeling a group of mass spectra, specifically for the analysis of atmospheric aerosols, but with natural applications to market-basket datasets. This builds on other recent work in which we introduced the problem of labeling a single spectrum, and is motivated by the advent of a new generation of Aerosol Time-of-Flight Spectrometers, which are capable of generating mass spectra for hundreds of aerosol particles per minute. We also describe two algorithms for group labeling, which differ greatly in how they use a linear programming (LP) solver, and also differ greatly from algorithms for labeling a single spectrum.

Our second contribution is to show how to automatically select between these two algorithms in a cost-based manner, analogous to how a query optimizer selects from a space of query plans. While the details are specific to the labeling problem, we believe that this is a promising first step towards a general framework for cost-based data mining, and that it opens up an important direction for future research.