Student dropout occurs quite often in universities providing distance education, and the dropout rates are markedly higher than those in conventional universities. Limiting dropout is essential in university-level distance learning, so the ability to predict student dropout could be useful in a number of ways. Data sets from this domain typically exhibit skewed class distributions: most cases belong to the normal class (students who continue their studies) and far fewer to the dropout class, which is the class of real interest. A classifier induced from such an imbalanced data set typically has a low error rate on the majority class but an unacceptably high error rate on the minority class. This paper first provides a systematic study of the various methodologies that have been proposed to handle this problem. It then presents an experimental comparison of these methodologies against a proposed local cost-sensitive technique, and concludes that such a framework can be a more effective solution to the problem.
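The imbalance effect described above can be illustrated with a minimal sketch (all data and names here are hypothetical, not taken from the paper): on a synthetic 95%/5% "continue"/"dropout" set, a threshold rule tuned under plain zero-one loss sits high and misses most dropouts, while a cost-sensitive variant that penalizes a missed dropout more heavily shifts the threshold and recovers minority recall.

```python
import random

random.seed(0)

# Hypothetical synthetic data: one feature (say, an inactivity score),
# 950 "continue" cases (label 0) vs 50 "dropout" cases (label 1).
data = [(random.gauss(0.0, 1.0), 0) for _ in range(950)] + \
       [(random.gauss(1.5, 1.0), 1) for _ in range(50)]

def weighted_errors(threshold, cost_fn=1.0, cost_fp=1.0):
    """Cost of the rule 'predict dropout if x > threshold'."""
    cost = 0.0
    for x, y in data:
        pred = 1 if x > threshold else 0
        if y == 1 and pred == 0:
            cost += cost_fn   # missed dropout (false negative)
        elif y == 0 and pred == 1:
            cost += cost_fp   # false alarm (false positive)
    return cost

thresholds = [t / 100 for t in range(-300, 500)]

# Zero-one loss: both error types cost the same, so the threshold
# drifts toward the majority class.
t_plain = min(thresholds, key=lambda t: weighted_errors(t))
# Cost-sensitive: a missed dropout is (illustratively) 10x worse.
t_cost = min(thresholds, key=lambda t: weighted_errors(t, cost_fn=10.0))

def minority_recall(threshold):
    hits = sum(1 for x, y in data if y == 1 and x > threshold)
    return hits / sum(1 for _, y in data if y == 1)

print("plain threshold:", t_plain, "recall:", minority_recall(t_plain))
print("cost-sensitive :", t_cost, "recall:", minority_recall(t_cost))
```

The cost-sensitive threshold lands well below the zero-one one, trading extra false alarms for a much higher dropout-detection rate; the same trade-off underlies the cost-sensitive methodologies the paper compares.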