Learning, Mining, or Modeling? A Case Study from Paleoecology

  • Authors:
  • Heikki Mannila;Hannu Toivonen;Atte Korhola;Heikki Olander

  • Venue:
  • DS '98 Proceedings of the First International Conference on Discovery Science
  • Year:
  • 1998

Abstract

Exploratory data mining, machine learning, and statistical modeling all have a role in discovery science. We describe a paleoecological reconstruction problem where Bayesian methods are useful and allow plausible inferences from the small and vague data sets available. Paleoecological reconstruction aims at estimating temperatures in the past. Knowledge about present-day abundances of certain species is combined with data about the same species in fossil assemblages (e.g., lake sediments). Stated formally, the reconstruction task has the form of a typical machine learning problem. However, to obtain useful predictions, a lot of background knowledge about ecological variation is needed. In the paleoecological literature, the statistical methods are involved variations of regression. We compare these methods with regression trees, nearest neighbor methods, and Bayesian hierarchical models. All the methods achieve about the same prediction accuracy on modern specimens, but the Bayesian methods and the involved regression methods seem to yield the best reconstructions. The advantage of the Bayesian methods is that they also give good estimates of the variability of the reconstructions.
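
To illustrate how the reconstruction task can be read as a machine learning problem, the following minimal sketch uses a nearest neighbor predictor: modern sites supply (species abundances, temperature) pairs, and a fossil sample's temperature is estimated from the most similar modern sites. This is not the authors' implementation. The function name `knn_reconstruct`, the Euclidean distance measure, and all numbers are hypothetical and chosen only for illustration.

```python
import math


def knn_reconstruct(modern_abundances, modern_temps, fossil_sample, k=3):
    """Estimate the temperature for one fossil sample as the mean temperature
    of the k modern sites with the most similar species abundances.

    Similarity is plain Euclidean distance, an assumption made for this sketch.
    """
    scored = []
    for abundances, temp in zip(modern_abundances, modern_temps):
        scored.append((math.dist(abundances, fossil_sample), temp))
    # Sort by distance only, so ties keep their original order.
    scored.sort(key=lambda pair: pair[0])
    nearest = scored[:k]
    return sum(temp for _, temp in nearest) / len(nearest)


# Hypothetical toy data: relative abundances of three species at four modern
# sites, with the temperature (degrees Celsius) observed at each site.
modern_abundances = [
    [0.7, 0.2, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.5, 0.3],
    [0.1, 0.3, 0.6],
]
modern_temps = [8.0, 9.0, 12.0, 15.0]

# Hypothetical fossil assemblage from a sediment layer.
fossil_sample = [0.65, 0.25, 0.10]

estimate = knn_reconstruct(modern_abundances, modern_temps, fossil_sample, k=2)
print(f"Reconstructed temperature: {estimate:.1f}")
```

A point estimate like this carries no measure of uncertainty, which is the gap the abstract says the Bayesian methods fill.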