Feature selection and classification model construction on type 2 diabetic patients' data

Authors:
Yue Huang; Paul McCullagh; Norman Black; Roy Harper
Affiliations:
Department of Computing, Faculty of Engineering, Imperial College London, South Kensington, London SW7 2AZ, UK; School of Computing and Mathematics, Faculty of Engineering, University of Ulster, Jordanstown BT37 0QB, UK; School of Computing and Mathematics, Faculty of Engineering, University of Ulster, Jordanstown BT37 0QB, UK; The Ulster Hospital, Dundonald, Belfast BT16 0RH, UK
Venue:
Artificial Intelligence in Medicine
Year:
2007

Abstract

Objective: Diabetes affects between 2% and 4% of the global population (up to 10% in the over-65 age group), and its avoidance and effective treatment are crucial public health and health economics issues in the 21st century. The aim of this research was to identify significant factors influencing diabetes control by applying feature selection to a working patient management system, to assist with ranking, classification and knowledge discovery. The classification models can be used to identify individuals in the population with poor diabetes control status based on physiological and examination factors.

Methods: Diabetic patients' data were collected by the Ulster Community and Hospitals Trust (UCHT) from 2000 to 2004 as part of clinical management. Data mining techniques were applied to discover key predictors and latent knowledge. To improve computational efficiency, a feature selection technique, feature selection via supervised model construction (FSSMC), an optimisation of ReliefF, was used to rank the important attributes affecting diabetic control. After suitable features had been selected, three complementary classification techniques (Naive Bayes, IB1 and C4.5) were applied to the data to predict how well the patients' condition was controlled.

Results: FSSMC identified patients' 'age', 'diagnosis duration', the need for 'insulin treatment', 'random blood glucose' measurement and 'diet treatment' as the most important factors influencing blood glucose control. Using the reduced features, a best predictive accuracy of 95% and a sensitivity of 98% were achieved. The influence of factors such as the 'type of care' delivered, the use of 'home monitoring', and the importance of 'smoking' on outcome can contribute to domain knowledge in diabetes control.

Conclusion: In the care of patients with diabetes, the more important factors identified (patients' 'age', 'diagnosis duration' and 'family history') are beyond the control of physicians. Treatment methods such as 'insulin', 'diet' and 'tablets' (a variety of oral medicines) may be controlled. However, lifestyle indicators such as 'body mass index' and 'smoking status' are also important and may be controlled by the patient. This further underlines the need for public health education to aid awareness and prevention. More subtle data interactions need to be better understood, and data mining can contribute to the clinical evidence base. The research confirms, and to a lesser extent challenges, current thinking. Whilst fully appreciating the requirement for clinical verification and interpretation, this work supports the use of data mining as an exploratory tool, particularly as the domain is suffering from a data explosion due to enhanced monitoring and the (potential) storage of this data in the electronic health record. FSSMC has proved a useful feature estimator for large data sets, where processing efficiency is an important factor.
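
The pipeline described in the abstract (rank attributes with a Relief-style estimator, keep the top-ranked ones, then classify and report accuracy and sensitivity) can be illustrated with a minimal sketch. The abstract does not specify how FSSMC works, so the code below is not FSSMC or ReliefF: it uses the basic two-class Relief estimator as a stand-in, and a leave-one-out 1-nearest-neighbour classifier as a rough analogue of IB1. All function names, the toy data and the class label meaning are invented for illustration and are not taken from the UCHT data set.

```python
import random


def relief_weights(X, y, n_iter=None, seed=0):
    """Basic two-class Relief (a stand-in, not FSSMC): score each feature by
    how much it differs at the nearest miss versus the nearest hit."""
    n, d = len(X), len(X[0])
    # Scale per-feature differences to [0, 1] so no attribute dominates.
    span = [(max(r[j] for r in X) - min(r[j] for r in X)) or 1.0 for j in range(d)]

    def diff(a, b, j):
        return abs(a[j] - b[j]) / span[j]

    def dist(a, b):
        return sum(diff(a, b, j) for j in range(d))

    # Visit every instance once by default, or draw n_iter random instances.
    if n_iter is None:
        picks = list(range(n))
    else:
        rng = random.Random(seed)
        picks = [rng.randrange(n) for _ in range(n_iter)]

    w = [0.0] * d
    for i in picks:
        hits = [k for k in range(n) if k != i and y[k] == y[i]]
        misses = [k for k in range(n) if y[k] != y[i]]
        if not hits or not misses:
            continue
        hit = min(hits, key=lambda k: dist(X[i], X[k]))
        miss = min(misses, key=lambda k: dist(X[i], X[k]))
        for j in range(d):
            w[j] += (diff(X[i], X[miss], j) - diff(X[i], X[hit], j)) / len(picks)
    return w


def loo_1nn(X, y, features):
    """Leave-one-out predictions from a 1-nearest-neighbour classifier
    restricted to the selected feature indices (unscaled L1 distance)."""
    preds = []
    for i, xi in enumerate(X):
        others = [k for k in range(len(X)) if k != i]
        nearest = min(others, key=lambda k: sum(abs(xi[j] - X[k][j]) for j in features))
        preds.append(y[nearest])
    return preds


def accuracy_and_sensitivity(y_true, y_pred, positive=1):
    """Accuracy = correct / total; sensitivity = TP / (TP + FN)."""
    pairs = list(zip(y_true, y_pred))
    acc = sum(t == p for t, p in pairs) / len(pairs)
    tp = sum(t == positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    return acc, tp / (tp + fn)


# Toy data (invented): column 0 tracks the class, column 1 is noise.
X = [[0.1, 0.5], [0.2, 0.1], [0.15, 0.9], [0.9, 0.4], [0.8, 0.8], [0.85, 0.2]]
y = [0, 0, 0, 1, 1, 1]  # 1 = poorly controlled (hypothetical label)

weights = relief_weights(X, y)
ranking = sorted(range(len(weights)), key=lambda j: weights[j], reverse=True)
acc, sens = accuracy_and_sensitivity(y, loo_1nn(X, y, ranking[:1]))
print(ranking, acc, sens)
```

On real clinical data the nearest-neighbour searches would dominate the cost, which is presumably why the abstract stresses processing efficiency; this sketch makes no attempt to optimise them.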