Incremental linear model trees on massive datasets: keep it simple, keep it fast

  • Authors:
  • Andreas Hapfelmeier; Jana Schmidt; Stefan Kramer

  • Affiliations:
  • Technische Universität München, Garching, Germany; Technische Universität München, Garching, Germany; Johannes Gutenberg-Universität Mainz, Mainz, Germany

  • Venue:
  • Proceedings of the 28th Annual ACM Symposium on Applied Computing
  • Year:
  • 2013


Abstract

The existence of massive datasets raises the need for algorithms that make efficient use of resources such as memory and computation time. Besides well-known approaches such as sampling, online algorithms are being recognized as good alternatives, as they often process datasets faster while using much less memory. The important class of algorithms that learn linear model trees online (incremental linear model trees, or ILMTs, in the following) offers interesting options for regression tasks in this sense. However, surprisingly little is known about their performance, as there exists no large-scale evaluation on massive stationary datasets under equal conditions. This paper therefore shows their applicability to massive stationary datasets under various parameter settings. To reduce biases arising from the choice of programming language or from programming skills, all algorithms were reimplemented within the same framework and tested under the same conditions. Results on real-world datasets indicate that, for massive stationary datasets, parameter settings leading to complex models do not pay off: they yield at most a small accuracy gain at a much higher running time. The experimental evidence suggests that simple and fast algorithms perform best.
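
To make the idea concrete, here is a minimal sketch of an incremental linear model tree processing a stream one example at a time. This is an illustrative assumption, not the authors' code or any of the evaluated algorithms: each leaf maintains a linear model updated by recursive least squares, and a leaf splits after a fixed number of examples on the feature with the highest observed variance. All class names, parameters (`split_after`, `max_depth`, `ridge`), and the split heuristic are hypothetical simplifications.

```python
import numpy as np

class Leaf:
    """Leaf holding a linear model fitted online by recursive least squares."""

    def __init__(self, n_features, ridge=1.0):
        d = n_features + 1          # +1 for the intercept (constant feature)
        self.P = np.eye(d) / ridge  # inverse-covariance estimate for RLS
        self.w = np.zeros(d)        # model weights
        self.n = 0
        self.sum_x = np.zeros(n_features)   # running sums for split heuristic
        self.sum_x2 = np.zeros(n_features)

    def update(self, x, y):
        z = np.append(x, 1.0)
        Pz = self.P @ z
        gain = Pz / (1.0 + z @ Pz)          # standard RLS gain vector
        self.w += gain * (y - z @ self.w)
        self.P -= np.outer(gain, Pz)
        self.n += 1
        self.sum_x += x
        self.sum_x2 += x * x

    def predict(self, x):
        return np.append(x, 1.0) @ self.w


class ILMT:
    """Toy incremental linear model tree (a sketch, not a published algorithm)."""

    def __init__(self, n_features, split_after=200, max_depth=4):
        self.n_features = n_features
        self.split_after = split_after  # naive fixed-count split trigger
        self.max_depth = max_depth
        self.root = {"leaf": Leaf(n_features), "depth": 0}

    def _descend(self, node, x):
        while "leaf" not in node:
            node = node["left"] if x[node["feat"]] <= node["thresh"] else node["right"]
        return node

    def learn_one(self, x, y):
        node = self._descend(self.root, x)
        leaf = node["leaf"]
        leaf.update(x, y)
        if leaf.n >= self.split_after and node["depth"] < self.max_depth:
            # Split on the most varying feature, at its observed mean.
            var = leaf.sum_x2 / leaf.n - (leaf.sum_x / leaf.n) ** 2
            feat = int(np.argmax(var))
            node["feat"] = feat
            node["thresh"] = leaf.sum_x[feat] / leaf.n
            d = node.pop("depth") + 1
            node.pop("leaf")
            # Children start fresh; a real implementation might pass the
            # parent's model down instead of discarding it.
            node["left"] = {"leaf": Leaf(self.n_features), "depth": d}
            node["right"] = {"leaf": Leaf(self.n_features), "depth": d}

    def predict_one(self, x):
        return self._descend(self.root, x)["leaf"].predict(x)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    tree = ILMT(n_features=2)
    for _ in range(5000):
        x = rng.uniform(-1, 1, size=2)
        y = (2 * x[0] if x[0] > 0 else -x[1]) + rng.normal(scale=0.1)
        tree.learn_one(x, y)
    print(tree.predict_one(np.array([0.5, 0.0])))  # roughly 1.0
```

The RLS update costs O(d²) per example and stores no raw data, which illustrates the memory and running-time appeal of online learners discussed in the abstract. Published ILMT variants use statistically grounded split tests rather than this fixed-count trigger; the sketch only conveys the overall tree-plus-linear-leaves structure.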