An instance-window based classification algorithm for handling gradual concept drifts

  • Authors:
  • Vahida Attar;Prashant Chaudhary;Sonali Rahagude;Gaurish Chaudhari;Pradeep Sinha

  • Affiliations:
  • College of Engineering, Pune (CoEP), Pune, India;College of Engineering, Pune (CoEP), Pune, India;College of Engineering, Pune (CoEP), Pune, India;College of Engineering, Pune (CoEP), Pune, India;Centre for Development of Advanced Computing (C-DAC), Pune, India

  • Venue:
  • ADMI'11: Proceedings of the 7th International Conference on Agents and Data Mining Interaction
  • Year:
  • 2011

Abstract

Mining concept-drifting data streams is a challenging area of data mining research. In the real world, data streams are not stable but change over time. Such changes, termed concept drifts, are categorized as gradual or abrupt according to the drifting time, i.e. the number of time steps taken for the new concept to completely replace the old one. Traditional online learning systems have not exploited this categorization to develop different approaches for handling different types of drift. Handling concept drifts according to their type can help improve the performance of the classification system, so the issue merits further exploration. Ensemble learning is among the most popular and effective approaches to handling concept drift: a set of models built over different time periods is maintained, and their predictions are combined, usually according to each model's level of expertise on the current concept. If early instances of the new concept are stored and used for ensemble learning once the drift is detected, the overall accuracy after the drift may increase. Moreover, if the ensemble learns with zero diversity on instances of the new concept during the drifting period, it may learn the new concept faster, thus speeding up recovery. This paper presents this approach for the effective handling of gradual concept drifts in data streams.
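The general idea of keeping an instance window and replaying it into an ensemble after a drift can be illustrated with a minimal sketch. This is not the authors' algorithm: the toy learner `MajorityClassModel`, the class `WindowedEnsemble`, the weight penalty `beta`, and the externally triggered `on_drift_detected` hook are all hypothetical names and choices made for illustration, and the zero-diversity learning step mentioned in the abstract is not modeled.

```python
from collections import Counter, deque


class MajorityClassModel:
    """Toy incremental learner (assumption): predicts the most frequent label seen."""

    def __init__(self):
        self.counts = Counter()

    def learn(self, x, y):
        self.counts[y] += 1

    def predict(self, x):
        # Returns None until at least one instance has been learned.
        return self.counts.most_common(1)[0][0] if self.counts else None


class WindowedEnsemble:
    """Weighted-vote ensemble that buffers the most recent instances."""

    def __init__(self, window_size=50):
        # Instance window: only the latest `window_size` instances are kept.
        self.window = deque(maxlen=window_size)
        # Each member is a [model, weight] pair; weight stands in for expertise.
        self.members = [[MajorityClassModel(), 1.0]]

    def predict(self, x):
        votes = Counter()
        for model, weight in self.members:
            label = model.predict(x)
            if label is not None:
                votes[label] += weight
        return votes.most_common(1)[0][0] if votes else None

    def learn(self, x, y, beta=0.5):
        self.window.append((x, y))
        for member in self.members:
            model = member[0]
            prediction = model.predict(x)
            # Penalize members that misclassify the current instance.
            if prediction is not None and prediction != y:
                member[1] *= beta
            model.learn(x, y)

    def on_drift_detected(self):
        # Train a fresh member on the buffered early instances of the new concept.
        new_model = MajorityClassModel()
        for x, y in self.window:
            new_model.learn(x, y)
        self.members.append([new_model, 1.0])
```

In this sketch the drift detector is left out entirely; a caller would invoke `on_drift_detected()` whenever its own detector fires. Members trained on the old concept lose voting weight as they err, while the member built from the window starts with full weight, which is one simple way to combine predictions by expertise on the current concept.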