Incrementally optimized decision tree for noisy big data

  • Authors:
  • Hang Yang;Simon Fong

  • Affiliations:
  • University of Macau, Av. Padre Tomás Pereira Taipa, Macau, China;University of Macau, Av. Padre Tomás Pereira Taipa, Macau, China

  • Venue:
  • Proceedings of the 1st International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications
  • Year:
  • 2012

Quantified Score

Hi-index 0.00

Abstract

Extracting meaningful information from big data remains a popular open problem. Decision trees, which yield highly interpretable models, are favored in many real-world applications. However, noisy values are common in high-speed data streams, e.g. real-time online data feeds that are prone to interference. When processing big data, it is hard to apply pre-processing and sampling in full batches. To address this difficulty, this paper proposes a new incremental decision tree algorithm called the incrementally optimized very fast decision tree (iOVFDT). The experiments compare the proposed algorithm with existing methods on noisy data streams. The results show that iOVFDT achieves higher accuracy and a smaller model size.