Predicting student failure at school is difficult for two reasons: many factors can contribute to low student performance, and the resulting datasets are imbalanced. This paper proposes a genetic programming algorithm and several data mining approaches to address these problems, using real data on 670 high school students from Zacatecas, Mexico. First, we select the best attributes to reduce the high dimensionality of the data. Next, we apply data rebalancing and cost-sensitive classification to address the class imbalance. We also compare a genetic programming model against several white-box techniques in order to obtain classification rules that are both more comprehensible and more accurate. The results of each approach are reported and compared in order to select the one that most improves classification accuracy, particularly in identifying students who might fail.
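
To make the rebalancing step concrete, the following is a minimal sketch of simple random oversampling, one common way to rebalance an imbalanced dataset. It is an illustration only: the abstract does not say which rebalancing method was used, and the function name `random_oversample`, the toy attributes, and the labels are all hypothetical.

```python
import random
from collections import Counter


def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many examples as the largest class.

    Illustrative sketch only; not the method used in the paper.
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class.
        pool = [r for r, y in zip(rows, labels) if y == cls]
        # Add copies until this class reaches the majority-class size.
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels


# Hypothetical toy data: (absences, prior_grade) per student.
rows = [(2, 8.1), (0, 9.0), (1, 7.5), (3, 6.9), (9, 5.2), (1, 8.8)]
labels = ["pass", "pass", "pass", "pass", "fail", "pass"]

balanced_rows, balanced_labels = random_oversample(rows, labels)
print(Counter(balanced_labels))
```

Oversampling should be applied to the training split only, so that duplicated rows do not leak into the data used for evaluation.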