Lung cancer survival prediction using ensemble data mining on SEER data
Scientific Programming - Biological Knowledge Discovery and Data Mining
We analyze the lung cancer data available from the Surveillance, Epidemiology, and End Results (SEER) program with the aim of developing accurate survival prediction models for lung cancer using data mining techniques. Carefully designed preprocessing steps resulted in the removal, modification, or splitting of several attributes, and 2 of the 11 derived attributes were found to have significant predictive power. Several data mining classification techniques were applied to the preprocessed data, along with various data mining optimizations and validations. In our experiments, ensemble voting of five decision-tree-based classifiers and meta-classifiers gave the best prediction performance in terms of both accuracy and area under the ROC curve. Further, we have developed an online lung cancer outcome calculator that estimates the risk of mortality 6 months, 9 months, 1 year, 2 years, and 5 years after diagnosis. For this calculator, a smaller non-redundant subset of 13 attributes was carefully selected using attribute selection techniques, while trying to retain the predictive power of the original set of attributes. The online lung cancer outcome calculator developed as a result of this study is available at http://info.eecs.northwestern.edu:8080/LungCancerOutcome-Calculator/
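The ensemble-voting and evaluation step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's implementation. The three stump classifiers, the two numeric features, and the toy records are hypothetical. Averaging the members' class probabilities (soft voting) is an assumed combination rule, since the abstract does not say which voting rule was used. The abstract reports five ensemble members; three toy ones keep the example short. Accuracy uses an assumed 0.5 decision threshold, and the area under the ROC curve is computed as the probability that a randomly chosen positive record scores above a randomly chosen negative one.

```python
from typing import Callable, List, Sequence

# A base classifier maps a feature record to an estimated probability of the
# positive class (e.g., "died within the horizon"). Hypothetical interface.
Classifier = Callable[[Sequence[float]], float]


# Three hypothetical decision stumps over two toy numeric features
# (x[0] and x[1]); they stand in for trained decision-tree-based models.
def stump_a(x: Sequence[float]) -> float:
    return 0.8 if x[0] > 60 else 0.3


def stump_b(x: Sequence[float]) -> float:
    return 0.7 if x[1] >= 3 else 0.2


def stump_c(x: Sequence[float]) -> float:
    return 0.6 if x[0] > 70 or x[1] >= 4 else 0.4


def vote(classifiers: Sequence[Classifier], record: Sequence[float]) -> float:
    """Soft voting (assumed rule): average the members' probability estimates."""
    return sum(clf(record) for clf in classifiers) / len(classifiers)


def accuracy(scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Fraction of records whose thresholded score agrees with the 0/1 label."""
    hits = sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels))
    return hits / len(labels)


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the chance a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy records and labels (1 = died within the horizon); not study data.
CLASSIFIERS: List[Classifier] = [stump_a, stump_b, stump_c]
records = [(72, 4), (55, 1), (65, 3), (48, 2), (80, 2), (75, 1)]
labels = [1, 0, 1, 0, 1, 0]

scores = [vote(CLASSIFIERS, r) for r in records]
print(f"accuracy={accuracy(scores, labels):.3f} auc={roc_auc(scores, labels):.3f}")
# prints: accuracy=0.833 auc=0.944
```

For a calculator that reports several horizons (6 months through 5 years), one natural arrangement, again an assumption rather than a detail given in the abstract, is a separate binary label, and therefore a separate ensemble, for each horizon.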