Mining citizen science data to predict prevalence of wild bird species

  • Authors:
  • Rich Caruana;Mohamed Elhawary;Art Munson;Mirek Riedewald;Daria Sorokina;Daniel Fink;Wesley M. Hochachka;Steve Kelling

  • Affiliations:
  • Cornell University;Cornell University;Cornell University;Cornell University;Cornell University;Cornell Lab of Ornithology;Cornell Lab of Ornithology;Cornell Lab of Ornithology

  • Venue:
  • Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining
  • Year:
  • 2006

Abstract

The Cornell Laboratory of Ornithology's mission is to interpret and conserve the earth's biological diversity through research, education, and citizen science focused on birds. Over the years, the Lab has accumulated one of the largest and longest-running collections of environmental data sets in existence. The data sets are not only large but also have many attributes, contain many missing values, and are potentially very noisy. The ecologists want to identify which features have the strongest effect on the distribution and abundance of bird species, and to describe the forms of these relationships. We show how data mining can be applied successfully to these data, enabling the ecologists to discover unanticipated relationships. We compare a variety of methods for measuring attribute importance with respect to the probability of a bird being observed at a feeder, and we present initial results for the impact of important attributes on bird prevalence.
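
As an illustration of what "measuring attribute importance" can mean in practice, the sketch below computes permutation importance: the average increase in prediction error when one attribute's values are shuffled across observations. This is one common technique and is not necessarily among the methods the paper compares. The data are synthetic, and the function `permutation_importance`, the stand-in model `predict`, and the two attributes are hypothetical names introduced only for this example.

```python
import random


def permutation_importance(predict, rows, labels, n_repeats=10, seed=0):
    """Return, per column, the mean increase in squared error after shuffling it."""
    rng = random.Random(seed)

    def error(data):
        # Mean squared error between predicted probability and observed label.
        return sum((predict(r) - y) ** 2 for r, y in zip(data, labels)) / len(labels)

    baseline = error(rows)
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle one column, leaving every other column untouched.
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            increases.append(error(permuted) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Synthetic observations (assumption): column 0 drives presence at a feeder,
# column 1 is pure noise.
data_rng = random.Random(42)
rows = [[data_rng.random(), data_rng.random()] for _ in range(200)]
labels = [1 if r[0] > 0.5 else 0 for r in rows]


def predict(row):
    # Hypothetical stand-in for a fitted model; it uses only column 0.
    return 1.0 if row[0] > 0.5 else 0.0


importances = permutation_importance(predict, rows, labels)
print(importances)
```

In this toy setting the informative attribute receives a clearly positive score while the noise attribute scores zero, because the stand-in model never reads it. With real survey data, missing values and noise of the kind the abstract describes would make such scores far less clear-cut.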