An improved CART decision tree for datasets with irrelevant feature

  • Authors:
  • Ali Mirza Mahmood;Mohammad Imran;Naganjaneyulu Satuluri;Mrithyumjaya Rao Kuppa;Vemulakonda Rajesh

  • Affiliations:
  • Acharya Nagarjuna University, Guntur, Andhra Pradesh, India;Rayalaseema University, Kurnool, Andhra Pradesh, India;Acharya Nagarjuna University, Guntur, Andhra Pradesh, India;Vaagdevi College of Engineering, Warangal, Andhra Pradesh, India;Pursuing M.Tech, MIST, Sathupalli, Khammam District, Andhra Pradesh, India

  • Venue:
  • SEMCCO'11 Proceedings of the Second international conference on Swarm, Evolutionary, and Memetic Computing - Volume Part I
  • Year:
  • 2011

Abstract

The results of data mining tasks are usually improved by reducing the dimensionality of the data. However, this improvement is harder to achieve when the data size is moderate or large. Although numerous algorithms for improving accuracy have been proposed, all of them assume that inducing a compact and highly generalized model is difficult. To address this issue, we introduce the Randomized Gini Index (RGI), a novel heuristic function for dimensionality reduction that is particularly applicable to large-scale databases. Apart from removing irrelevant attributes, our algorithm can reduce the level of noise in the data to a great extent, which is a very attractive property for data mining problems. We extensively evaluate its performance through experiments on both artificial and real-world datasets. The results show that our approach is suitable and viable for knowledge discovery in moderate-sized and large datasets.
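The abstract does not define RGI itself, so the sketch below shows only the standard, non-randomized Gini index that CART uses as its splitting criterion, applied as a simple per-attribute filter score. Each numeric attribute is scored by the largest impurity decrease of any CART-style binary split x <= t, and attributes scoring below a cut-off are dropped as irrelevant. The function names, the midpoint thresholds, and the `min_gain` cut-off are illustrative assumptions, not the authors' method; the randomized component of RGI is not reproduced.

```python
from typing import Dict, List, Sequence, Tuple


def gini(labels: Sequence[int]) -> float:
    """Gini impurity 1 - sum_k p_k^2 over the class proportions p_k."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts: Dict[int, int] = {}
    for y in labels:
        counts[y] = counts.get(y, 0) + 1
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split_gain(x: Sequence[float], y: Sequence[int]) -> float:
    """Largest Gini decrease over all binary splits of the form x <= t."""
    n = len(y)
    parent = gini(y)
    values = sorted(set(x))
    best = 0.0
    # Candidate thresholds lie midway between consecutive distinct values;
    # a constant attribute has no candidates and therefore scores 0.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [yi for xi, yi in zip(x, y) if xi <= t]
        right = [yi for xi, yi in zip(x, y) if xi > t]
        weighted = len(left) / n * gini(left) + len(right) / n * gini(right)
        best = max(best, parent - weighted)
    return best


def rank_features(
    columns: Dict[str, Sequence[float]], y: Sequence[int]
) -> List[Tuple[str, float]]:
    """Score every attribute by its best Gini decrease, highest first."""
    scores = [(name, best_split_gain(col, y)) for name, col in columns.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


def select_features(
    columns: Dict[str, Sequence[float]], y: Sequence[int], min_gain: float
) -> List[str]:
    """Keep attributes whose best split lowers impurity by at least min_gain."""
    return [name for name, score in rank_features(columns, y) if score >= min_gain]


if __name__ == "__main__":
    # Toy data (hypothetical): one informative, one noisy, one constant attribute.
    y = [0, 0, 0, 1, 1, 1]
    columns = {
        "relevant": [1, 2, 3, 10, 11, 12],
        "noisy": [1, 2, 1, 2, 1, 2],
        "constant": [5, 5, 5, 5, 5, 5],
    }
    print(rank_features(columns, y))
    print(select_features(columns, y, min_gain=0.1))  # ['relevant']
```

On this toy data the informative attribute separates the classes perfectly (decrease 0.5), the noisy one barely helps (about 0.056), and the constant one scores 0, so only `relevant` survives the cut-off. An exhaustive filter like this rescans every attribute at every threshold, which is one reason a cheaper or randomized heuristic becomes attractive on large databases.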