Machine learning methods for microbial source tracking

  • Authors:
  • Lluís Belanche-Muñoz; Anicet R. Blanch

  • Affiliations:
  • Department of Software, Polytechnical University of Catalonia, Jordi Girona 1-3, Barcelona, Catalonia, Spain; Department of Microbiology, University of Barcelona, Avda. Diagonal 645, Barcelona, Spain

  • Venue:
  • Environmental Modelling & Software
  • Year:
  • 2008

Abstract

This paper reports a successful application of statistical and inductive learning methods to microbial source tracking: identifying the optimal discriminating parameters and developing predictive models to determine the source of faecal pollution in waters recently and heavily polluted with wastewater. The data come from an international study in which various microbial and chemical parameters were measured in heavily polluted waters from diverse geographical areas. A total of 38 variables derived from these parameters were defined to characterise the 103 available observations. Four methods were evaluated: Euclidean k-nearest-neighbour, a linear Bayesian classifier, a quadratic Bayesian classifier and a support vector machine. The main aim was to obtain highly accurate predictive models using as few variables as possible. After extensive feature selection, predictive models using only two variables achieve 100% correct classification. These solutions use a linear combination of a discriminating tracer (the enumeration of phages infecting Bacteroides thetaiotaomicron) and a universal non-discriminating faecal indicator. Other models that do not use the discriminating tracer were also developed, although they needed more variables to achieve a high rate of correct classification.
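
As a rough illustration of the workflow described in the abstract, the sketch below pairs a Euclidean nearest-neighbour classifier with a search for the best two-variable subset, scored by leave-one-out accuracy. Everything in it is an assumption made for illustration. The data are synthetic, not the study's 103 observations. The function names (`make_synthetic`, `knn_predict`, `loo_accuracy`, `best_pair`) are hypothetical. The exhaustive pair search and the leave-one-out scoring stand in for the feature selection and validation procedures, which the abstract does not specify. The choice of k = 1 is for simplicity only.

```python
import itertools
import math


def make_synthetic():
    """Build a small SYNTHETIC data set (not the study's data).

    Columns: 0 = log10 tracer count, 1 = log10 universal indicator,
    2 and 3 = uninformative filler variables. The class depends on the
    difference between columns 0 and 1, so neither column separates
    the classes on its own.
    """
    X, y = [], []
    for i in range(8):
        indicator = 4.0 + 0.5 * i
        for label, offset in (("human", 1.5), ("animal", 3.5)):
            tracer = indicator - offset
            X.append([tracer, indicator, float(i % 3), float((i * 5) % 4)])
            y.append(label)
    return X, y


def knn_predict(train_X, train_y, x, k=1):
    """Majority vote among the k nearest training points (Euclidean)."""
    nearest = sorted(
        (math.dist(row, x), label) for row, label in zip(train_X, train_y)
    )[:k]
    votes = {}
    for _, label in nearest:
        votes[label] = votes.get(label, 0) + 1
    return max(votes, key=votes.get)


def loo_accuracy(X, y, cols, k=1):
    """Leave-one-out accuracy using only the columns listed in cols."""
    proj = [[row[c] for c in cols] for row in X]
    hits = 0
    for i in range(len(proj)):
        train_X = proj[:i] + proj[i + 1:]
        train_y = y[:i] + y[i + 1:]
        if knn_predict(train_X, train_y, proj[i], k) == y[i]:
            hits += 1
    return hits / len(proj)


def best_pair(X, y, k=1):
    """Score every two-variable subset; ties keep the first pair found."""
    scored = (
        (loo_accuracy(X, y, pair, k), pair)
        for pair in itertools.combinations(range(len(X[0])), 2)
    )
    return max(scored, key=lambda item: item[0])


if __name__ == "__main__":
    X, y = make_synthetic()
    acc, pair = best_pair(X, y)
    print("best pair of columns:", pair, "leave-one-out accuracy:", acc)
    print("tracer alone:", loo_accuracy(X, y, (0,)))
```

On this toy data the tracer column alone misclassifies some observations, whereas the tracer combined with the universal indicator separates the two classes. This loosely mirrors the abstract's finding that a two-variable model suffices. An exhaustive search over pairs is feasible for 38 variables (703 pairs), but it is only one of many possible selection strategies.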